WO2021138783A1 - Data processing method and apparatus, and computer readable storage medium - Google Patents

Data processing method and apparatus, and computer readable storage medium

Info

Publication number
WO2021138783A1
Authority
WO
WIPO (PCT)
Prior art keywords
data set
training
training model
model
data processing
Prior art date
Application number
PCT/CN2020/070549
Other languages
French (fr)
Chinese (zh)
Inventor
薛冰
徐升
张永耿
Original Assignee
深圳市微蓝智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市微蓝智能科技有限公司
Priority to CN202080005484.2A (CN112805725A)
Priority to PCT/CN2020/070549 (WO2021138783A1)
Publication of WO2021138783A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • This application relates to the field of artificial intelligence technology, in particular to a data processing method and device, and a computer-readable storage medium.
  • Artificial Intelligence is a new technological science that studies and develops theories, methods, technologies and application systems for simulating, extending and expanding human intelligence.
  • The research fields of artificial intelligence include robotics, language recognition, image recognition, natural language processing, and expert systems.
  • the realization of artificial intelligence mostly relies on training models, which are models generated through independent learning on a large amount of training data.
  • the type of data may include, but is not limited to, images, videos, audios, objects, texts, and so on.
  • the embodiments of the present application disclose a data processing method and device, and a computer-readable storage medium, which can efficiently and conveniently generate and deploy a training model.
  • an embodiment of the present application provides a data processing method, which includes:
  • an embodiment of the present application provides a data processing device, the data processing device includes: an input unit, a processing unit, and an output unit;
  • the input unit is used to obtain a first data set for the target item
  • the processing unit is configured to generate a first labeled data set corresponding to the first data set according to the identification purpose of the target item; perform model training on the first labeled data set to generate a first training model;
  • the output unit is configured to output the first training model when the processing unit determines that the accuracy information of the first training model satisfies a deployment condition.
  • an embodiment of the present application provides a data processing device, including a processor and a memory, where the memory is used to store computer instructions, and when the processor executes the computer instructions, the data processing device performs the method described in the first aspect above.
  • an embodiment of the present application provides a computer-readable storage medium that stores a computer program that can implement the method described in the first aspect when the computer program is executed by a processor.
  • Data collection can be achieved by acquiring the first data set for the target project; data labeling can be realized by generating the first labeled data set; model generation can be realized by performing model training on the first labeled data set; and model deployment can be realized by judging whether the training model meets the deployment conditions. In this way, the training model can be generated and deployed efficiently, conveniently and flexibly.
  • Managing the series of processes of data collection, data labeling, model generation and model deployment in the form of projects can improve the convenience and feasibility of training model applications.
  • FIG. 1 is a schematic diagram of a network architecture provided by an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of a data processing method provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a scene of a data processing method provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of an interface for creating a project provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of an interface of a certain plan provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a data processing device provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a terminal device disclosed in an embodiment of the present application.
  • The training model refers to the model obtained by adaptively adjusting the selected algorithm (also called autonomous learning) using a large amount of training data.
  • the training model can be applied to machine learning, language recognition, image recognition and other fields.
  • the training model is applied to the field of image recognition, which can realize the recognition of cats or dogs in the image.
  • a data set refers to an unlabeled data set, which can include one or more unlabeled data.
  • Unlabeled data means that there is no trace of labeling on the data. For example, there is no mark on the image.
  • An annotated data set refers to a data set that has been labeled, which can include one or more pieces of labeled data.
  • Labeled data means that there are traces of labeling on the data. For example, there are marking traces on the image.
  • the data is introduced by taking an image as an example.
  • FIG. 1 is a schematic diagram of a network architecture provided by an embodiment of this application.
  • the network architecture scene may include a user, a data processing device 101, and an application device 102.
  • the data processing device 101 may be a terminal device, or a device matched with the terminal device, such as a processor.
  • Terminal devices may include, but are not limited to, personal computers (PC), notebook computers, tablet computers, smart phones (such as Android phones, etc.), mobile Internet devices (Mobile Internet Devices, MID), and so on.
  • the data processing device 101 has the ability to generate a training model.
  • the application device 102 may include, but is not limited to, robots, aircraft, automobiles, smart home appliances, wearable devices, virtual reality (VR) devices, surveillance camera devices, smart phones, tablet computers, MIDs, PCs and other devices.
  • the application device 102 has the ability to deploy a training model.
  • the data processing device 101 shown in FIG. 1 takes a PC as an example, and the application equipment 102 takes a smart robot, a car, and an aircraft as examples.
  • the shape and quantity of each device are for example, and do not constitute a limitation to the embodiment of the present application.
  • the data processing device 101 can create a project according to the project creation instruction input by the user.
  • the project can be an artificial intelligence (AI) project.
  • The AI project can be used to achieve a certain purpose, for example, to recognize cats or dogs in an image.
  • the data processing device 101 may generate a labeled data set corresponding to the data set according to the identification purpose of the item, perform model training on the labeled data set, and generate a training model.
  • The data processing device 101 can output the training model to the application device 102 and deploy the training model on the application device 102, so that the application device 102 can achieve the identification purpose.
  • the data processing device 101 may also deploy the training model by itself, that is, deploy the training model on the data processing device 101, so that the data processing device 101 can achieve the identification purpose.
  • the data processing device 101 may also output the training model to a third-party platform, so as to deploy the training model on the third-party platform.
  • the third-party platform can be an image recognition platform or an AI visual recognition platform, etc.
  • For example, when the data processing device 101 receives a project creation instruction to create AI project 1, it creates AI project 1, which is used to recognize puppy A.
  • When the data processing device 101 receives an image set for AI project 1 (for example, including 20 images), it generates an annotated image set corresponding to the image set according to the purpose of recognizing puppy A. In the annotated image set, puppy A is marked in each image (for example, puppy A in a certain image is circled by a dotted line).
  • The data processing device 101 performs model training on the labeled image set and generates a training model. If the training model meets the deployment conditions, the data processing device 101 can output the training model to the surveillance camera equipment, so that the surveillance camera equipment can recognize puppy A.
  • the embodiments of the present application can be applied to research and development scenarios, test scenarios, and usage scenarios, so that the embodiments of the present application have a wide range of applications. Even users who do not know the training model can also use the embodiments of the present application to control the application device 102 or the data processing device 101 to realize the identification of the designated item.
  • FIG. 2 is a schematic flowchart of a data processing method provided by an embodiment of this application. As shown in Figure 2, the data processing method includes but is not limited to the following steps:
  • Step 201 The data processing device obtains the first data set of the target item.
  • the target item can be any AI item, for example, an AI item for recognizing human faces, an AI item for recognizing specific creatures, or an AI item for users to recognize parts defects. Different items can have different identification purposes.
  • the user can input a project creation instruction for the user display interface of the data processing device, and the data processing device can create a project according to the project creation instruction.
  • the first data set of the target item can be understood as a data set used to achieve the identification purpose of the target item.
  • the specific number is not limited in the embodiment of the present application, for example, 20 images.
  • the first data set may be an initial data set, that is, a data set initially obtained, or a data set after subsequent adjustments.
  • The data processing device may receive a data upload instruction input by the user for the target project. The data upload instruction may carry the first data set and is used to upload the first data set to the data processing device, so that the data processing device obtains the first data set.
  • the data processing device may obtain the first data set for the target item through the data collection device.
  • the data collection device can be a camera of a terminal device or a surveillance camera device. For example, after taking a series of photos of puppy A, the camera transmits these photos of puppy A to the data processing device.
  • one project can correspond to multiple plans, one plan can correspond to one data set, and one training model can be obtained by executing one plan.
  • For example, data set 1 is the data set of plan 1 of the target project, and executing plan 1 yields training model 1; data set 2 is the data set of plan 2 of the target project, and executing plan 2 yields training model 2.
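  • The project, plan, data set and training model relationship described above can be sketched as a small data structure. This is only an illustration under stated assumptions: the class names `Project` and `Plan`, the `execute_plan` helper and the file names are hypothetical and do not appear in the application, and the training step is replaced by a placeholder string.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Plan:
    """One plan of a project: one data set in, at most one training model out."""
    name: str
    data_set: List[str]                    # e.g. image file names (hypothetical)
    training_model: Optional[str] = None   # identifier of the model this plan produced


@dataclass
class Project:
    """A project has one identification purpose and may hold several plans."""
    name: str
    identification_purpose: str
    plans: List[Plan] = field(default_factory=list)

    def add_plan(self, name: str, data_set: List[str]) -> Plan:
        plan = Plan(name, list(data_set))
        self.plans.append(plan)
        return plan


def execute_plan(plan: Plan) -> str:
    # Placeholder for labeling and model training; it only records a model name.
    plan.training_model = f"training model for {plan.name}"
    return plan.training_model


project = Project("target project", "detect puppy A")
project.add_plan("plan 1", ["img_001.jpg", "img_002.jpg"])
project.add_plan("plan 2", ["img_101.jpg"])
models = [execute_plan(plan) for plan in project.plans]
print(models)
```

  • Keeping the data set and the resulting model on the same `Plan` record is one way to make it easy to trace which data produced which model.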
  • Step 202 The data processing device generates a first annotation data set corresponding to the first data set according to the identification purpose of the target item.
  • the identification purpose of the target item may include item type and label type.
  • Item types can include, but are not limited to, target detection, object segmentation, etc.
  • Target detection is used to detect targets, such as detecting pets or human faces; object segmentation is used to segment or separate objects, such as segmenting the people and the scene in an image.
  • Marking types can include, but are not limited to, internal chamfering, foreign matter, plane repair, etc. The marking types can be added or deleted or modified by the user, or they can be provided by the system.
  • the first labeled data set refers to a data set in which the first data in the first data set is labeled.
  • The amount of data in the first annotated data set can be the same as the amount of data in the first data set.
  • For example, the first data set includes 100 photos, and the first annotated data set also includes 100 photos, in which puppy A is marked wherever it appears.
  • the data processing device may generate the first annotation data set by calling the annotation model. Specifically, according to the identification purpose of the target item, the data processing device calls the labeling model to perform labeling processing on the first data in the first data set to generate the first labeling data set.
  • the labeling model is a training model that has been trained, meets the requirements, and can be used directly to implement labeling of data.
  • For example, if the recognition purpose of the target item is to detect puppy A, the data processing device calls the labeling model to label each of 20 images (that is, the first data set) according to the purpose of detecting puppy A, marking puppy A in each image, and thus obtains 20 annotated images (that is, the first annotated data set).
  • Generating the first annotation data set by calling the annotation model can save workload and improve annotation efficiency.
  • the data processing device may generate the first annotation data set through an annotation instruction. Specifically, the data processing device generates the first annotation data set according to the identification purpose of the target item and according to the annotation instruction for the first data in the first data set. That is, the user can input a labeling instruction for each first data in the first data set.
  • The labeling instruction is related to the recognition purpose. If the recognition purpose is target detection, the labeling instruction can be to circle the object to be recognized; if the recognition purpose is object segmentation, the labeling instruction may be a dividing line for dividing the object. After the user confirms the labeling instruction for a certain piece of data, the data processing device may save the labeled data and then generate the first labeled data set.
  • For example, if the purpose of the target item is to detect puppy A, the data processing device saves the 20 annotated images (that is, the first annotated data set) according to the purpose of detecting puppy A and the user's annotation instructions for each of the 20 images (that is, the first data set).
  • A first annotation data set generated from labeling instructions is more accurate than one generated by calling the labeling model.
  • the data processing device may also generate the first annotation data set by calling the annotation model and combining the annotation instructions. For example, according to the identification purpose of the target item, the data processing device first calls the labeling model to label the first data in the first data set, and then adjusts according to the labeling instructions input by the user, and finally generates the first labeling data set. This combination method can further improve the accuracy of the first annotation data set.
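  • The three labeling modes above (labeling model only, labeling instructions only, or both combined) can be sketched as follows. This is a minimal illustration, not the application's implementation: `label_data_set`, `toy_labeling_model`, the file names and the rule that a user annotation replaces the model's proposal are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Optional


def label_data_set(
    data_set: List[str],
    labeling_model: Optional[Callable[[str], List[str]]] = None,
    user_annotations: Optional[Dict[str, List[str]]] = None,
) -> Dict[str, List[str]]:
    """Return a mapping from each data item to its list of labels.

    The labeling model (if given) proposes labels first; a user annotation
    for the same item (if given) then replaces the proposal.
    """
    user_annotations = user_annotations or {}
    labeled: Dict[str, List[str]] = {}
    for item in data_set:
        labels = labeling_model(item) if labeling_model else []
        if item in user_annotations:
            labels = user_annotations[item]   # user adjustment wins (assumed rule)
        labeled[item] = list(labels)
    return labeled


def toy_labeling_model(name: str) -> List[str]:
    # Hypothetical stand-in for a trained labeling model: it "detects" puppy A
    # only when the file name mentions a puppy.
    return ["puppy A"] if "puppy" in name else []


first_data_set = ["puppy_1.jpg", "blurry.jpg", "street.jpg"]
first_labeled = label_data_set(
    first_data_set,
    labeling_model=toy_labeling_model,
    user_annotations={"blurry.jpg": ["puppy A"]},   # the user corrects a miss
)
print(first_labeled)
```

  • Passing only `labeling_model` or only `user_annotations` reproduces the two single-mode variants.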
  • the user can check the first labeled data set to ensure the accuracy of the first labeled data set.
  • the user can input at least one confirmation instruction for the first annotated data set. If the data processing device receives at least one confirmation instruction, it can be determined that the first annotated data set meets the training condition, and step 203 can be executed.
  • Each of the at least one confirmation instruction can come from a user with a different authority. For example, of two confirmation instructions, one may come from the user operating the terminal device and one from the system inspector or administrator; or the first may come from a user with the first authority and the second from a user with the second authority, where the level of the second authority is higher than that of the first authority.
  • the data processing device may count the number of marked data, the number of invalid data, and the number of marked objects in the first marked data set.
  • the number of labeled data refers to how many data are labeled, for example, how many images out of 100 images are labeled.
  • The amount of invalid data refers to the amount of data that is not related to the purpose of recognition. For example, if the purpose of recognition is to detect puppy A but an image shows a building and has nothing to do with puppy A, that image can be considered invalid data.
  • the number of labeled objects refers to how many objects are labeled in the data. For example, if the recognition purpose is object segmentation, then the number of labeled objects is at least two. The data processing device can also output these statistical results for the user to make corresponding adjustments.
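  • The three statistics described above can be computed in a few lines. The sketch below assumes a hypothetical record type `AnnotatedItem` in which each marked object is one entry in `labels` and invalid data is flagged explicitly; neither the type nor the field names come from the application.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AnnotatedItem:
    name: str
    labels: List[str] = field(default_factory=list)  # one entry per marked object
    invalid: bool = False                            # unrelated to the recognition purpose


def summarize(items: List[AnnotatedItem]) -> Dict[str, int]:
    """Count labeled data, invalid data and labeled objects in a labeled data set."""
    return {
        "labeled_data": sum(1 for it in items if it.labels),
        "invalid_data": sum(1 for it in items if it.invalid),
        "labeled_objects": sum(len(it.labels) for it in items),
    }


items = [
    AnnotatedItem("img_1.jpg", ["puppy A"]),
    AnnotatedItem("img_2.jpg", ["puppy A", "puppy A"]),
    AnnotatedItem("building.jpg", invalid=True),
    AnnotatedItem("img_4.jpg"),                      # not yet labeled
]
stats = summarize(items)
print(stats)
```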
  • Step 203 The data processing device performs model training on the first labeled data set to generate a first training model.
  • The data processing device calls the target training model to perform model training on the first labeled data set and generate the first training model.
  • The target training model can be understood as an initial model into which no training data has yet been input, and which is trained on the input training data.
  • the target training model can also be described as an artificial intelligence model or a deterministic model.
  • the target training model can be a linear model, a convolutional neural network model, or a recurrent neural network model.
  • the data processing device inputs the first labeled data set as training data into the target training model to perform model training, thereby generating the first training model.
  • For example, if the target training model is a convolutional neural network model, the data processing device inputs 20 images in which puppy A is labeled into the convolutional neural network model.
  • The convolutional neural network model is trained on these images to obtain the first training model. Once the first training model meets the deployment conditions, deploying it makes it possible to recognize puppy A.
  • the data processing device may perform model training based on the existing training model to generate the first training model.
  • For example, if the existing training model is the training model BB, the data processing device can obtain the first data set based on the accuracy information of the training model BB, generate the first labeled data set, and input the first labeled data set into the training model BB to generate the first training model. Since the first data set is acquired based on the accuracy information of the training model BB, the accuracy of the first training model is higher than that of the training model BB.
  • the data processing device may obtain a data processing server in an idle state, and call the data processing server to perform model training to generate the first training model.
  • The data processing server may include, but is not limited to, a graphics processing unit (GPU) server, a text processing server, a central processing unit (CPU) server, an application-specific integrated circuit (ASIC) server, a tensor processing unit (TPU) server, a neural network processing unit (NPU) server, a field-programmable gate array (FPGA) server, etc.
  • the data processing server may be a GPU server, and the embodiment of the present application takes the GPU server as an example for description.
  • The GPU server provides a computing service based on graphics applications. It offers real-time, high-speed parallel computing and floating-point computing capabilities, and is suitable for application scenarios such as 3D graphics applications, video decoding, deep learning, and scientific computing.
  • The GPU server can be located in the Internet cloud, or it may be mounted in the data processing device in the form of a GPU processor.
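  • Selecting a data processing server in an idle state, as described above, might look like the following sketch. The server records, the `state` and `kind` fields and the function name `pick_idle_server` are hypothetical; the application does not specify how server state is tracked.

```python
from typing import Dict, List, Optional


def pick_idle_server(servers: List[Dict[str, str]], kind: str = "GPU") -> Optional[str]:
    """Return the name of the first idle server of the requested kind, or None."""
    for server in servers:
        if server["kind"] == kind and server["state"] == "idle":
            return server["name"]
    return None


servers = [
    {"name": "gpu-0", "kind": "GPU", "state": "busy"},
    {"name": "cpu-0", "kind": "CPU", "state": "idle"},
    {"name": "gpu-1", "kind": "GPU", "state": "idle"},
]
chosen = pick_idle_server(servers)
print(chosen)
```

  • When no idle server of the requested kind exists the function returns `None`, leaving the caller free to wait or to fall back to another kind of server.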
  • Step 204 The data processing device outputs the first training model when the accuracy information of the first training model meets the deployment condition.
  • When the data processing device obtains the first training model, it can use a reference data set to judge whether the accuracy information of the first training model meets the deployment condition. Specifically, the data processing device calls the first training model to test the reference data set and obtain the first test result, and compares the first test result with the reference labeling result corresponding to the reference data set to obtain the accuracy information of the first training model. Calling the first training model to test the reference data set means inputting the reference data set into the first training model for target detection or object segmentation.
  • the reference data set can also be described as a test data set, etc.
  • the reference data set is a data set carefully selected by the user to meet the identification purpose of the target item, and is an unlabeled data set.
  • The first test result (assumed to be R1) refers to the labeling result output by the first training model after the reference data set is input to the first training model.
  • The number of data items included in the first test result can be the same as the number of data items included in the reference data set.
  • the reference labeling result (assuming R0) can also be described as a test labeling data set, etc.
  • The reference labeling result is a labeled data set that the user produces by manually annotating the reference data set and that meets the identification purpose of the target item.
  • The reference labeling result has been confirmed many times, so its labeling accuracy is high, and it is used to check whether the generated training model meets the deployment conditions. The reference data set can be understood as a test paper, and the reference labeling result as the answer key for that test paper.
  • the data processing device compares the first test result (R1) with the reference labeling result (R0) one by one to obtain accuracy information of the first training model.
  • the accuracy information may include the overall accuracy and the accuracy of the judgment item.
  • the overall accuracy refers to the accuracy of the judgment of the first training model from the overall perspective.
  • the accuracy of the evaluation item refers to the accuracy of the first training model from the perspective of a specific index.
  • the specific index can be one or more, and a specific index corresponds to the accuracy of a judgment item.
  • Specific indicators may include, but are not limited to: recognition type, misrecognition, missed recognition, wrong recognition, multiple recognition, etc.
  • The accuracy of the judgment item corresponding to a recognition type represents the recognition accuracy for that type.
  • For example, if the purpose of the recognition is to identify cats and dogs, there are two recognition types, with one judgment item accuracy corresponding to the recognition of cats and another corresponding to the recognition of dogs.
  • Misrecognition means that, compared with R0, R1 contains a label that should not be there. For example, the purpose of recognition is to identify kitten B, and kitten B is labeled in R0; if an image of kitten C is labeled as kitten B in R1, that image in R1 is considered misrecognized.
  • Missed recognition means that, compared with R0, something that should be labeled is left unlabeled in R1. For example, the purpose of recognition is to identify kitten B, and kitten B is labeled in R0; if a certain image in R1 includes kitten B but kitten B in that image is not labeled, that image in R1 is considered to have a missed recognition.
  • Wrong recognition means that, compared with R0, R1 contains a wrong recognition type. For example, the purpose of recognition is to recognize kitten B, and kitten B is labeled in R0; if the image of kitten B is labeled as puppy A in R1, that image in R1 is considered wrongly recognized.
  • Multiple recognition means that, for a certain image, R1 contains more recognition results of a given type than R0 does. For example, the purpose of recognition is to recognize puppy A; if one puppy A is labeled in the image in R0 but two are labeled in R1, R1 is considered to contain a multiple recognition.
  • the higher the accuracy of the evaluation item corresponding to multiple recognition the lower the probability of multiple recognition.
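  • The four error indicators can be illustrated by comparing, image by image, the labels in R0 with the labels in R1. The sketch below is a simplification under stated assumptions: each image is reduced to a list of type labels with no positions, and the decision rules (for instance, treating "something missing plus something extra" as a wrong recognition) are one possible reading of the definitions above, not the application's algorithm. The function name `classify` is hypothetical.

```python
from collections import Counter
from typing import List, Set


def classify(r0_labels: List[str], r1_labels: List[str]) -> Set[str]:
    """Return the error indicators found for one image (R0 = reference, R1 = model output)."""
    ref, out = Counter(r0_labels), Counter(r1_labels)
    missing = [t for t in ref if out[t] == 0]   # labeled in R0, absent from R1
    extra = [t for t in out if ref[t] == 0]     # labeled in R1, absent from R0

    errors: Set[str] = set()
    if missing and extra:
        errors.add("wrong recognition")         # e.g. kitten B labeled as puppy A
    elif missing:
        errors.add("missed recognition")        # kitten B present but not labeled
    elif extra:
        errors.add("misrecognition")            # e.g. kitten C labeled as kitten B
    if any(out[t] > ref[t] > 0 for t in ref):
        errors.add("multiple recognition")      # more results of a type than in R0
    return errors


print(classify([], ["kitten B"]))
print(classify(["kitten B"], []))
print(classify(["kitten B"], ["puppy A"]))
print(classify(["puppy A"], ["puppy A", "puppy A"]))
```

  • A per-indicator accuracy could then be derived, for example, as the share of reference images free of that indicator, although the application does not fix a formula.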
  • the overall accuracy is related to the accuracy of the judgment item.
  • the overall accuracy is the weighted average of the accuracy of each evaluation item, or the average of the accuracy of each evaluation item.
  • the specific indicators include misrecognition and missed recognition, and the corresponding evaluation item accuracy is 80% and 90%. If the overall accuracy is the average of the accuracy of each evaluation item, then the overall accuracy is 85%.
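  • The averaging described above can be checked with a short calculation. The function name `overall_accuracy` and the example weights (3 and 1) are hypothetical; only the 80%, 90% and 85% figures come from the text.

```python
import math
from typing import Dict, Optional


def overall_accuracy(
    item_accuracy: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted average of the evaluation item accuracies (plain average if no weights)."""
    if weights is None:
        weights = {name: 1.0 for name in item_accuracy}
    total_weight = sum(weights[name] for name in item_accuracy)
    return sum(item_accuracy[name] * weights[name] for name in item_accuracy) / total_weight


accuracy = {"misrecognition": 0.80, "missed recognition": 0.90}
plain = overall_accuracy(accuracy)
weighted = overall_accuracy(accuracy, {"misrecognition": 3, "missed recognition": 1})
print(round(plain, 4), round(weighted, 4))
assert math.isclose(plain, 0.85)   # matches the 85% in the example above
```

  • With the assumed weights of 3 and 1 the weighted result is lower than the plain average, because the weaker item counts for more.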
  • Alternatively, the overall accuracy may be independent of the accuracy of the judgment items.
  • In this case, the overall accuracy is used to describe what proportion of the data is labeled.
  • For example, the reference data set consists of 100 images, each of which includes puppy A. These 100 images are input into the first training model to obtain R1, in which puppy A is labeled in 87 images. In the reference labeling result R0 corresponding to the reference data set, puppy A is labeled in all 100 images, so the overall accuracy of the first training model is 87%.
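  • The 87% figure in the example above follows from counting, among the images labeled in R0, those that are also labeled in R1. A minimal sketch, assuming each image is represented by a list of labels and using the hypothetical function name `labeled_fraction`:

```python
import math
from typing import List


def labeled_fraction(r0: List[List[str]], r1: List[List[str]], target: str) -> float:
    """Share of images labeled with `target` in R0 that are also labeled with it in R1."""
    expected = [i for i, labels in enumerate(r0) if target in labels]
    hits = sum(1 for i in expected if target in r1[i])
    return hits / len(expected)


# 100 reference images, all labeled with puppy A in R0; the model labels 87 of them.
r0 = [["puppy A"] for _ in range(100)]
r1 = [["puppy A"] for _ in range(87)] + [[] for _ in range(13)]
overall = labeled_fraction(r0, r1, "puppy A")
print(overall)
assert math.isclose(overall, 0.87)
```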
  • the data processing device may output the accuracy information of the first training model so that the user can learn the accuracy information of the first training model and collect the second data set in a targeted manner.
  • the accuracy information of the first training model that satisfies the deployment conditions may include:
  • The overall accuracy of the first training model is greater than the first threshold, where the first threshold is a percentage in the range (0, 100]. For example, assuming that the first threshold is 85%, when the overall accuracy of the first training model is greater than 85%, the accuracy information of the first training model meets the deployment conditions.
  • The accuracy of each evaluation item of the first training model is greater than the second threshold, where the second threshold is a percentage in the range (0, 100].
  • For example, the evaluation items include evaluation item 1 and evaluation item 2. Assuming that the second threshold is 80%, when the accuracy of each evaluation item of the first training model is greater than 80%, the accuracy information of the first training model meets the deployment conditions.
  • Each evaluation item can correspond to the same second threshold or to a different second threshold. For example, the second threshold corresponding to evaluation item 1 is 80%, and the second threshold corresponding to evaluation item 2 is 85%.
  • The overall accuracy of the first training model is greater than the first threshold, and the accuracy of each judgment item of the first training model is greater than the second threshold. For example, assuming that the first threshold is 85%, the evaluation items include evaluation item 1 and evaluation item 2, and the second threshold corresponding to both evaluation items is 80%, then when the overall accuracy of the first training model is greater than 85% and the accuracies of evaluation item 1 and evaluation item 2 are both greater than 80%, the accuracy information of the first training model meets the deployment conditions.
  • the specific numerical values of the above-mentioned first threshold and the second threshold may be set by the user or set by default by the system, and the specific numerical values are not limited in the embodiment of the present application.
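  • The three variants of the deployment condition can be expressed as a single check in which either threshold may be omitted. The function and parameter names below are hypothetical; the threshold values are the ones used in the examples above, and the accuracies are made up for illustration.

```python
from typing import Dict, Optional


def meets_deployment_condition(
    overall: float,
    item_accuracy: Dict[str, float],
    first_threshold: Optional[float] = None,
    second_thresholds: Optional[Dict[str, float]] = None,
) -> bool:
    """True if every configured threshold is strictly exceeded."""
    if first_threshold is not None and not overall > first_threshold:
        return False
    if second_thresholds is not None:
        for name, threshold in second_thresholds.items():
            if not item_accuracy[name] > threshold:
                return False
    return True


items = {"evaluation item 1": 0.82, "evaluation item 2": 0.84}

# Overall accuracy only (first threshold 85%).
print(meets_deployment_condition(0.87, items, first_threshold=0.85))
# Evaluation items only, with per-item second thresholds of 80% and 85%.
print(meets_deployment_condition(
    0.87, items,
    second_thresholds={"evaluation item 1": 0.80, "evaluation item 2": 0.85},
))
# Both conditions, with a shared second threshold of 80%.
print(meets_deployment_condition(
    0.87, items, first_threshold=0.85,
    second_thresholds={"evaluation item 1": 0.80, "evaluation item 2": 0.80},
))
```

  • The second call returns `False` because evaluation item 2 (84%) does not exceed its 85% threshold, even though the overall accuracy is high.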
  • the data processing device outputs the first training model when the accuracy information of the first training model meets the deployment condition.
  • the first training model is transmitted to the application device, so that the application device can deploy the first training model to achieve the target project.
  • the first training model is deployed on the data processing device, so that the data processing device realizes the target project.
  • step 201 can implement data collection
  • step 202 can implement data labeling
  • step 203 can implement model generation
  • step 204 can implement model deployment, so that training models can be generated and deployed efficiently, conveniently and flexibly.
  • Through steps 201 to 204, the series of processes of data collection, data labeling, model generation, and model deployment is managed in the form of projects, which can improve the convenience and feasibility of training model applications.
  • The data processing device can obtain the second data set for the target project according to the accuracy information of the first training model. That is, data is collected in a targeted manner according to the accuracy information of the first training model, and the data set so obtained is the second data set.
  • For example, suppose the accuracy of evaluation item 1 (a face recognition accuracy) is 30% and the accuracy of evaluation item 2 (also a face recognition accuracy) is 90%. Evaluation item 1 does not reach the second threshold (80%), so face profile pictures can be collected in a targeted manner to obtain a second data set.
  • when the second data set is obtained, the data processing device generates a second labeled data set corresponding to the second data set according to the identification purpose of the target item, performs model training on the second labeled data set, and generates a second training model. When the accuracy information of the second training model meets the deployment condition, the second training model is output.
  • the data processing device may call the first training model, and perform model training on the second labeled data set to generate the second training model. That is, the second labeled data set is input into the first training model for model training, and the second training model is obtained.
  • the data processing device invokes the second training model to test the reference data set, obtains the second test result, compares the second test result with the reference labeling result, and obtains accuracy information of the second training model.
  • the data processing device may obtain a third data set for the target item according to the accuracy information of the second training model, and generate a third labeled data set corresponding to the third data set
  • model training is performed on the third labeled data set to generate a third training model
  • the third training model is output. It can be understood that when the i-th training model does not meet the deployment condition, the (i+1)-th training model is generated from the i-th training model, and the reference labeling result is used to test whether the (i+1)-th training model meets the deployment condition; this is repeated until a training model that satisfies the deployment condition is obtained.
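The repetition described here can be sketched as a loop. This is an illustration under stated assumptions, not the applicant's implementation: the four callables (`collect`, `label`, `train`, `evaluate`) are hypothetical stand-ins for the device's steps, a single overall-accuracy threshold is used as the deployment condition, and `max_rounds` is an added safety limit that the embodiment does not mention.

```python
from typing import Any, Callable, List, Tuple


def iterate_until_deployable(
    collect: Callable[[int, float], List[str]],
    label: Callable[[List[str]], List[Tuple[str, str]]],
    train: Callable[[Any, List[Tuple[str, str]]], Any],
    evaluate: Callable[[Any], float],
    threshold: float = 0.85,
    max_rounds: int = 10,
) -> Tuple[Any, float, int]:
    """Generate the (i+1)-th model from the i-th one until the condition holds."""
    model, accuracy = None, 0.0
    for round_index in range(1, max_rounds + 1):
        data_set = collect(round_index, accuracy)  # targeted collection uses the last accuracy
        labeled = label(data_set)                  # labeled data set for this round
        model = train(model, labeled)              # continue training from the previous model
        accuracy = evaluate(model)                 # test against the reference labeling result
        if accuracy > threshold:                   # deployment condition met: output the model
            return model, accuracy, round_index
    return None, accuracy, max_rounds              # no deployable model within the limit


# Toy stand-ins: the "model" is just the number of labeled samples seen so far.
def toy_collect(round_index: int, previous_accuracy: float) -> List[str]:
    return [f"image_{round_index}_{k}.jpg" for k in range(30)]

def toy_label(data_set: List[str]) -> List[Tuple[str, str]]:
    return [(name, "face") for name in data_set]

def toy_train(previous_model: Any, labeled: List[Tuple[str, str]]) -> int:
    return (previous_model or 0) + len(labeled)

def toy_evaluate(model: int) -> float:
    return min(model / 100, 1.0)

model, accuracy, rounds = iterate_until_deployable(toy_collect, toy_label, toy_train, toy_evaluate)
print(model, accuracy, rounds)  # 90 0.9 3
```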
  • the data processing device can compare the accuracy information of the multiple training models and output the better training model; or the user can compare the accuracy information of the multiple training models and choose the better training model to deploy.
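One simple way to make this comparison, shown only as a sketch, is to pick the model with the highest overall accuracy. Treating "better" as "highest overall accuracy" is an assumption, and `pick_better_model` is a hypothetical helper name; the embodiment leaves the comparison rule open.

```python
from typing import Dict, Optional


def pick_better_model(overall_accuracy: Dict[str, float]) -> Optional[str]:
    """Return the name of the model with the highest overall accuracy, or None if empty."""
    if not overall_accuracy:
        return None
    return max(overall_accuracy, key=lambda name: overall_accuracy[name])


candidates = {"first training model": 0.88, "second training model": 0.93}
print(pick_better_model(candidates))  # second training model
```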
  • FIG. 3 is a schematic diagram of a scene of a data processing method provided by an embodiment of this application.
  • FIG. 3 is described from the perspective of the interaction between the user and the data processing device, and the method may include but is not limited to the following steps:
  • Step 301: The first user inputs a project creation instruction.
  • the first user is a user who operates the data processing device, and the second user is a system inspector or an administrator; or the authority level of the second user is higher than the authority level of the first user.
  • the first user can enter the project name, select the project type, upload the template picture, enter the height and width of the picture, and enter or select the label category (also called the annotation type). After completing these operations, the first user can confirm them.
  • the first user's confirmation of these operations can be understood as inputting a project creation instruction to the data processing device.
  • the schematic diagram of the interface for creating a project shown in FIG. 4 is used as an example, and does not constitute a limitation to the embodiment of the present application.
  • Step 302: The data processing device creates a project.
  • the data processing device can create the project according to the information carried in the project creation instruction.
  • the information carried in the project creation instruction may include the information input or selected by the first user in the interface schematic shown in FIG. 4, for example, including the project name, project type, label category, and so on.
  • Step 303: The first user inputs a plan creation instruction.
  • the first user can input a plan creation instruction for the project. If the data processing device has not created a plan for the project, the plan creation instruction is used to create the first plan; if the data processing device has created a plan for the project, the plan creation instruction is used to create a new plan.
  • the plan creation instruction can carry the plan name, and the first user can name each plan independently. For example, in FIG. 5, the plan name is "demo".
  • refer to the schematic interface diagram of a certain plan shown in FIG. 5, where the first user can click Edit Plan Category under Label Category to edit, modify, or delete the label categories of the plan. It should be noted that the schematic interface diagram shown in FIG. 5 is used as an example, and does not constitute a limitation to the embodiment of the present application.
  • Step 304: The data processing device creates a first plan.
  • the first user can execute the plan steps in sequence according to the plan steps in the interface diagram shown in FIG. 5.
  • Step 305: The first user inputs a data upload instruction.
  • the first user performs the step of uploading pictures, clicks on upload pictures, and selects pictures to be uploaded.
  • the pictures to be uploaded can be uploaded in the form of a compressed package.
  • the data processing device can receive the compressed package.
  • the pictures to be uploaded are pictures related to the purpose of the project.
  • Step 306: The data processing device obtains the first data set.
  • the data processing device decompresses the compressed package to obtain the first data set.
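The decompression step can be illustrated with Python's standard `zipfile` module. This is a sketch under assumptions: the embodiment does not name the archive format, so ZIP is assumed here, and `extract_data_set` and `IMAGE_SUFFIXES` are hypothetical names introduced for the example.

```python
import io
import tempfile
import zipfile
from pathlib import Path
from typing import List

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}  # assumed set of accepted picture types


def extract_data_set(package: bytes, target_dir: Path) -> List[Path]:
    """Decompress an uploaded ZIP package and return the extracted picture paths."""
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        archive.extractall(target_dir)
    # Keep only picture files; anything else in the package is ignored.
    return sorted(p for p in target_dir.rglob("*")
                  if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES)


# Build a small package in memory to stand in for the user's upload.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("dog_001.jpg", b"fake image bytes")
    archive.writestr("notes.txt", b"not a picture")

with tempfile.TemporaryDirectory() as tmp:
    first_data_set = extract_data_set(buffer.getvalue(), Path(tmp))
    names = [p.name for p in first_data_set]
print(names)  # ['dog_001.jpg']
```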
  • the set of pictures uploaded by the first user for the first plan is the first data set
  • the set of pictures uploaded for the second plan is the second data set.
  • the data processing device may sequentially output each picture in the first data set, and the first user executes the labeling step to manually label each output picture.
  • the data processing device can call the labeling model to label each picture.
  • manual labeling and the labeling model can also be combined to label each picture.
  • Step 307: The data processing device generates a first annotation data set corresponding to the first data set.
  • the data processing device may generate the first annotation data set.
  • Step 308a: The first user inputs a confirmation instruction.
  • Step 308b: The second user inputs a confirmation instruction.
  • the first user performs a check step to check whether the labeled images in the first annotation data set match the annotation category, whether the annotation positions are correct, and so on. If the first user finds an error, the first user can correct it and update the first annotation data set. If the first user finds that all the labels are correct, the first user inputs a confirmation instruction.
  • the second user may check the first annotation data set confirmed by the first user again after the first user has checked and confirmed, and input a confirmation instruction after the check is correct.
  • the data processing apparatus may perform step 309.
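The confirmation flow of steps 308a and 308b can be sketched as a small state object that gates step 309. The sketch assumes that both confirmations are required before training starts and that the second user confirms only after the first; `LabeledDataSetReview` and its methods are hypothetical names, not part of the application.

```python
from dataclasses import dataclass


@dataclass
class LabeledDataSetReview:
    """Tracks who has confirmed a labeled data set (illustrative only)."""
    confirmed_by_first_user: bool = False
    confirmed_by_second_user: bool = False

    def confirm(self, user: str) -> None:
        if user == "first":
            self.confirmed_by_first_user = True
        elif user == "second":
            # The second user re-checks a data set the first user has already confirmed.
            if not self.confirmed_by_first_user:
                raise ValueError("the first user has not confirmed yet")
            self.confirmed_by_second_user = True
        else:
            raise ValueError(f"unknown user role: {user}")

    def ready_for_training(self) -> bool:
        """Assumed gate for step 309: both confirmation instructions were received."""
        return self.confirmed_by_first_user and self.confirmed_by_second_user


review = LabeledDataSetReview()
review.confirm("first")
review.confirm("second")
print(review.ready_for_training())  # True
```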
  • Step 309: The data processing device performs model training on the first labeled data set to generate a first training model.
  • Step 310: When the accuracy information of the first training model meets the deployment condition, the data processing device outputs the first training model.
  • for the specific implementation process of step 307, step 309, and step 310, please refer to the specific description of step 202 to step 204 in the embodiment shown in FIG. 2, which will not be repeated here.
  • the first user performs step 303 again to input a plan creation instruction, which is used to create a second plan.
  • the first user can upload the second data set based on the accuracy information of the first training model, and the data processing device generates the second labeled data set. After the first user and the second user confirm the second labeled data set, the data processing device inputs the second labeled data set into the first training model for model training to generate a second training model, and outputs the second training model when its accuracy information meets the deployment condition.
  • this embodiment is described from the perspective of the interaction between the user and the data processing device. Even if the first user is not familiar with training models, the first user can control the data processing device to output a training model that meets the deployment condition, so that the first user can autonomously manage the training model to achieve the purpose of the project.
  • FIG. 6 is a schematic structural diagram of a data processing device provided by an embodiment of this application.
  • the data processing device 60 includes an input unit 601, a processing unit 602, and an output unit 603.
  • the input unit 601 is used to obtain the first data set for the target item.
  • the processing unit 602 is configured to generate a first labeled data set corresponding to the first data set according to the identification purpose of the target item; perform model training on the first labeled data set to generate a first training model.
  • the output unit 603 is configured to output the first training model when the processing unit 602 determines that the accuracy information of the first training model satisfies the deployment condition.
  • the processing unit 602 is specifically configured to generate, according to the identification purpose of the target item and a labeling instruction for the first data in the first data set, the first labeled data set corresponding to the first data set.
  • the processing unit 602 is specifically configured to call the annotation model to perform annotation processing on the first data in the first data set according to the identification purpose of the target item, and generate the first annotation data set corresponding to the first data set.
  • the processing unit 602 is further configured to count the number of labeled data items, the number of invalid data items, and the number of labeled objects in the first labeled data set.
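These three counts can be illustrated as follows. The sketch assumes that each labeled image carries a list of labeled objects and an `invalid` flag; what counts as invalid data is not defined in the text, so that flag, `LabeledImage`, and `summarize` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LabeledImage:
    name: str
    objects: List[str] = field(default_factory=list)  # label category of each labeled object
    invalid: bool = False                              # assumed marker for unusable data


def summarize(labeled_data_set: List[LabeledImage]) -> Dict[str, int]:
    """Count labeled data items, invalid data items, and labeled objects."""
    return {
        "labeled_data": sum(1 for img in labeled_data_set if img.objects and not img.invalid),
        "invalid_data": sum(1 for img in labeled_data_set if img.invalid),
        "labeled_objects": sum(len(img.objects) for img in labeled_data_set if not img.invalid),
    }


first_labeled_data_set = [
    LabeledImage("a.jpg", objects=["dog", "dog"]),
    LabeledImage("b.jpg", objects=["dog"]),
    LabeledImage("c.jpg", invalid=True),
]
print(summarize(first_labeled_data_set))
# {'labeled_data': 2, 'invalid_data': 1, 'labeled_objects': 3}
```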
  • the processing unit 602 is specifically configured to call the target training model to perform model training on the first labeled data set, and generate the first training model.
  • the processing unit 602 is specifically configured to obtain a data processing server in an idle state, call the data processing server to perform model training on the first labeled data set, and generate a first training model.
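Selecting an idle data processing server could look like the sketch below. The `TrainingServer` record and `acquire_idle_server` helper are hypothetical; how the device discovers servers and tracks their state is not described in the text.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class TrainingServer:
    name: str
    busy: bool = False


def acquire_idle_server(servers: Iterable[TrainingServer]) -> Optional[TrainingServer]:
    """Return the first idle server and mark it busy, or None if all are occupied."""
    for server in servers:
        if not server.busy:
            server.busy = True  # occupied for the duration of the training job
            return server
    return None


pool = [TrainingServer("server-a", busy=True), TrainingServer("server-b")]
chosen = acquire_idle_server(pool)
print(chosen.name if chosen else None)  # server-b
```

Once training finishes, the caller would set `busy` back to `False` so the server can be reused.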
  • the processing unit 602 is further configured to call the first training model to test the reference data set to obtain the test result; compare the test result with the reference annotation result corresponding to the reference data set to obtain the accuracy information of the first training model;
  • the output unit 603 is also used to output accuracy information of the first training model.
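A minimal sketch of deriving accuracy information from a test result and a reference annotation result is given below. It assumes that every reference sample belongs to exactly one evaluation item and that a prediction is correct when it equals the reference label; the function name and data layout are illustrative, not taken from the application.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def accuracy_information(
    test_result: List[str],
    reference: List[Tuple[str, str]],  # (evaluation item, reference label) per sample
) -> Tuple[float, Dict[str, float]]:
    """Return (overall accuracy, accuracy per evaluation item)."""
    if not reference or len(test_result) != len(reference):
        raise ValueError("test result and reference must be non-empty and of equal length")
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for predicted, (item, expected) in zip(test_result, reference):
        total[item] += 1
        correct[item] += int(predicted == expected)
    overall = sum(correct.values()) / sum(total.values())
    per_item = {item: correct[item] / total[item] for item in total}
    return overall, per_item


reference = [("evaluation item 1", "A"), ("evaluation item 1", "A"),
             ("evaluation item 2", "A"), ("evaluation item 2", "B")]
test_result = ["A", "B", "A", "B"]
overall, per_item = accuracy_information(test_result, reference)
print(overall, per_item)
# 0.75 {'evaluation item 1': 0.5, 'evaluation item 2': 1.0}
```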
  • the accuracy information includes the overall accuracy and the accuracy of the evaluation item; the accuracy information satisfying the deployment condition includes: the overall accuracy is greater than the first threshold; or, the accuracy of the evaluation item is greater than the second threshold; or, the overall accuracy is greater than the first threshold and the accuracy of the evaluation item is greater than the second threshold.
  • the data processing device 60 further includes a storage unit for storing the first training model and the accuracy information of the first training model.
  • the input unit 601 is further configured to obtain a second data set for the target project according to the accuracy information of the first training model when the processing unit 602 determines that the accuracy information of the first training model does not meet the deployment conditions.
  • the processing unit 602 is further configured to generate a second labeled data set corresponding to the second data set according to the identification purpose of the target item; perform model training on the second labeled data set to generate a second training model;
  • the output unit 603 is further configured to output the second training model when the processing unit 602 determines that the accuracy information of the second training model satisfies the deployment condition.
  • the processing unit 602 is specifically configured to call the first training model, perform model training on the second labeled data set, and generate the second training model.
  • FIG. 7 is a schematic structural diagram of a terminal device provided by an embodiment of this application.
  • the terminal device described in the embodiment of the present application includes: a processor 701, a communication interface 702, and a memory 703.
  • the processor 701, the communication interface 702, and the memory 703 may be connected through a bus or in other ways.
  • the embodiment of the present application takes the connection through a bus as an example.
  • the processor 701 may be a central processing unit (CPU), a network processor (NP), or a combination of a CPU and an NP.
  • the processor 701 may also be a multi-core CPU or a core used to implement communication identification binding in a multi-core NP.
  • the processor 701 may be a hardware chip.
  • the hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof.
  • the PLD may be a Complex Programmable Logic Device (CPLD), a Field-Programmable Gate Array (FPGA), a Generic Array Logic (GAL) or any combination thereof.
  • the communication interface 702 can be used for the interaction of sending and receiving information or signaling, as well as the reception and transmission of signals.
  • the memory 703 may mainly include a storage program area and a storage data area.
  • the storage program area may store an operating system and a stored program required by at least one function (such as a text storage function, a location storage function, etc.); the storage data area may store data (such as image data and text data) created according to the use of the device, and may include application storage programs, etc.
  • the memory 703 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices.
  • the terminal device shown in FIG. 7 also includes input and output devices, which are used to receive input instructions or selection instructions from the user, and are also used to output accuracy information on the user interface.
  • the memory 703 is also used to store program instructions.
  • the processor 701 is configured to execute program instructions stored in the memory 703, and when the program instructions are executed, the processor 701 is configured to:
  • the processor 701, the communication interface 702, and the memory 703 described in the embodiment of the present application can execute the implementation manner described in the data processing method provided in the embodiment of the present application, and details are not described herein again.
  • An embodiment of the present application also provides a computer-readable storage medium, in which a computer program is stored, and the computer program is executed by a processor to implement the method described in the foregoing method embodiment.
  • the embodiments of the present application also provide a computer program product containing instructions, which when run on a computer, cause the computer to execute the method described in the above method embodiment.
  • the program can be stored in a computer-readable storage medium, and the storage medium can include: a flash disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disk, etc.

Abstract

A data processing method and apparatus, and a computer readable storage medium. The method comprises: obtaining a first data set for a target item (201); generating, according to a recognition purpose of the target item, a first annotated data set corresponding to the first data set (202); performing model training on the first annotated data set to generate a first training model (203); and in the case that accuracy information of the first training model satisfies a deployment condition, outputting the first training model (204). The method can efficiently, conveniently, and flexibly generate and deploy a training model.

Description

Data processing method and device, and computer-readable storage medium
Technical field
This application relates to the field of artificial intelligence technology, and in particular to a data processing method and device, and a computer-readable storage medium.
Background
Artificial intelligence (AI) is a new technical science that studies and develops theories, methods, technologies, and application systems for simulating, extending, and expanding human intelligence. The research fields of artificial intelligence include robotics, language recognition, image recognition, natural speech processing, and expert systems. The realization of artificial intelligence mostly relies on training models; a training model is a model generated through autonomous learning on a large amount of training data. The types of data may include, but are not limited to, images, videos, audio, objects, and text.
How to generate and deploy training models efficiently and conveniently is a technical problem that urgently needs to be solved.
Summary of the invention
The embodiments of the present application disclose a data processing method and device, and a computer-readable storage medium, which can generate and deploy a training model efficiently and conveniently.
In a first aspect, an embodiment of the present application provides a data processing method, the method including:
obtaining a first data set for a target project;
generating, according to the identification purpose of the target project, a first labeled data set corresponding to the first data set;
performing model training on the first labeled data set to generate a first training model;
outputting the first training model when the accuracy information of the first training model satisfies a deployment condition.
In a second aspect, an embodiment of the present application provides a data processing device, the data processing device including an input unit, a processing unit, and an output unit;
the input unit is configured to obtain a first data set for a target project;
the processing unit is configured to generate, according to the identification purpose of the target project, a first labeled data set corresponding to the first data set, and to perform model training on the first labeled data set to generate a first training model;
the output unit is configured to output the first training model when the processing unit determines that the accuracy information of the first training model satisfies a deployment condition.
In a third aspect, an embodiment of the present application provides a data processing device including a processor and a memory, where the memory is used to store computer instructions, and when the processor executes the computer instructions, the data processing device is caused to perform the method described in the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program, where the computer program, when executed by a processor, implements the method described in the first aspect.
In the embodiments of this application, data collection is achieved by obtaining the first data set for the target project; data labeling is achieved by generating the first labeled data set; model generation is achieved by performing model training on the first labeled data set; and model deployment is achieved by judging whether the training model satisfies the deployment condition. In this way, a training model can be generated and deployed efficiently, conveniently, and flexibly. Managing the series of processes of data collection, data labeling, model generation, and model deployment in the form of a project can improve the convenience and feasibility of applying training models.
Description of the drawings
In order to describe the technical solutions in the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a network architecture provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of a data processing method provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of a scene of a data processing method provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of an interface for creating a project provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of an interface of a certain plan provided by an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a data processing device provided by an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a terminal device disclosed in an embodiment of the present application.
Detailed description
Before the technical solutions provided by the embodiments of the present application are introduced, the names and terms involved in the embodiments are introduced first.
(1) Training model
A training model is a model obtained by adaptively adjusting a selected algorithm (also called autonomous learning) using a large amount of training data. Training models can be applied to fields such as machine learning, language recognition, and image recognition. For example, a training model applied to image recognition can recognize a cat or a dog in an image.
(2) Data set, labeled data set
A data set refers to an unlabeled data set, which can include one or more pieces of unlabeled data. Unlabeled data is data that carries no labeling marks, for example, an image with no labeling marks on it.
A labeled data set refers to a data set that has been labeled, which can include one or more pieces of labeled data. Labeled data is data that carries labeling marks, for example, an image with labeling marks on it.
In the embodiments of the present application, images are used as the example of data.
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application.
Refer to FIG. 1, which is a schematic diagram of a network architecture provided by an embodiment of this application. The network architecture may include a user, a data processing device 101, and an application device 102.
The data processing device 101 may be a terminal device, or a device used together with a terminal device, such as a processor. Terminal devices may include, but are not limited to, personal computers (PCs), notebook computers, tablet computers, smartphones (such as Android phones), and mobile Internet devices (MIDs). In the embodiments of the present application, the data processing device 101 has the ability to generate a training model.
The application device 102 may include, but is not limited to, robots, aircraft, automobiles, smart home appliances, wearable devices, virtual reality (VR) devices, surveillance camera devices, smartphones, tablet computers, MIDs, PCs, and other devices. In the embodiments of the present application, the application device 102 has the ability to deploy a training model.
In FIG. 1, the data processing device 101 is shown as a PC, and the application device 102 is shown as an intelligent robot, a car, and an aircraft. The form and number of the devices are examples and do not constitute a limitation to the embodiments of the present application.
In the embodiments of the present application, the data processing device 101 can create a project according to a project creation instruction input by the user. The project can be an artificial intelligence (AI) project, and an AI project can be used to achieve a certain purpose, for example, recognizing a cat or a dog in an image. When the data processing device 101 receives a data set for the project, it can generate a labeled data set corresponding to the data set according to the identification purpose of the project, perform model training on the labeled data set, and generate a training model.
If the training model meets the deployment condition, the data processing device 101 can output the training model. It can output the training model to the application device 102 so that the training model is deployed on the application device 102 and the application device 102 can achieve the identification purpose. The data processing device 101 can also deploy the training model itself, that is, the training model is deployed on the data processing device 101 so that the data processing device 101 can achieve the identification purpose. The data processing device 101 can also output the training model to a third-party platform so that the training model is deployed on the third-party platform. The third-party platform can be an image recognition platform, an AI visual recognition platform, or the like.
For example, suppose the user needs to train a surveillance camera device to recognize the user's own puppy A. When the data processing device 101 receives a project creation instruction for creating AI project 1, it creates AI project 1, which is used to recognize puppy A. When the data processing device 101 receives an image set for AI project 1 (for example, 20 images), it generates, for the purpose of recognizing puppy A, a labeled image set corresponding to the image set, in which the images containing puppy A are labeled (for example, puppy A in an image is circled with a dotted line). The data processing device 101 performs model training on the labeled image set and generates a training model. If the training model meets the deployment condition, the data processing device 101 can output the training model to the surveillance camera device so that the surveillance camera device can recognize puppy A.
The embodiments of the present application can be applied to research and development scenarios, test scenarios, and usage scenarios, so they have a wide range of application. Even a user who is not familiar with training models can use the embodiments of the present application to control the application device 102 or the data processing device 101 to realize recognition for a specified project.
下面对本申请实施例提供的数据处理方法进行详细介绍。The data processing method provided in the embodiments of the present application will be described in detail below.
请参阅图2,为本申请实施例提供的一种数据处理方法的流程示意图。如图2所示,数据处理方法包括但不限于如下步骤:Please refer to FIG. 2, which is a schematic flowchart of a data processing method provided by an embodiment of this application. As shown in Figure 2, the data processing method includes but is not limited to the following steps:
Step 201: The data processing device obtains a first data set of a target project.
The target project may be any AI project, for example an AI project for recognizing human faces, an AI project for recognizing a specific creature, or an AI project for recognizing part defects. Different projects may have different recognition purposes. A user may enter a project creation instruction through the user display interface of the data processing device, and the data processing device may create the project according to that instruction. The first data set of the target project can be understood as the data set used to achieve the recognition purpose of the target project. Its size is not limited in the embodiments of this application; it may be, for example, 20 images. The first data set may be an initial data set, that is, the data set obtained at the outset, or a data set obtained after subsequent adjustment.
In one implementation, after the target project is created, the data processing device may receive a data upload instruction entered by the user for the target project. The data upload instruction may carry the first data set and is used to upload the first data set to the data processing device, so that the data processing device obtains the first data set.
In another implementation, after the target project is created, the data processing device may obtain the first data set for the target project through a data collection device. The data collection device may be a camera of a terminal device, a surveillance camera, or the like. For example, after taking a series of photos of puppy A, the camera transmits these photos to the data processing device.
Optionally, one project may correspond to multiple plans, one plan may correspond to one data set, and executing one plan yields one training model. For example, data set 1 is the data set of plan 1 of the target project, and executing plan 1 yields training model 1; data set 2 is the data set of plan 2 of the target project, and executing plan 2 yields training model 2.
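The project, plan, data set, and training model relationship described above can be pictured with a small data structure. The following is only an illustrative sketch: the class names `Project` and `Plan`, the `execute_plan` function, and the model identifier format are assumptions made for this example and do not appear in the application.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Plan:
    """One plan: one data set in, one training model out (hypothetical structure)."""
    name: str
    dataset: List[str]               # e.g. paths of the images in the data set
    model_id: Optional[str] = None   # set once the plan has been executed


@dataclass
class Project:
    """One project may hold several plans."""
    name: str
    plans: Dict[str, Plan] = field(default_factory=dict)

    def add_plan(self, name: str, dataset: List[str]) -> Plan:
        plan = Plan(name=name, dataset=list(dataset))
        self.plans[name] = plan
        return plan


def execute_plan(plan: Plan) -> str:
    # Placeholder for real training: only records which model the plan produced.
    plan.model_id = f"model-for-{plan.name}"
    return plan.model_id


project = Project("puppy-detection")
plan1 = project.add_plan("plan1", ["a.jpg", "b.jpg"])
plan2 = project.add_plan("plan2", ["c.jpg"])
print(execute_plan(plan1), execute_plan(plan2))
```

Keeping the model identifier on the plan mirrors the one-plan, one-model pairing in the text; a real system would store the trained model itself rather than a label.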
Step 202: The data processing device generates, according to the recognition purpose of the target project, a first annotated data set corresponding to the first data set.
The recognition purpose of the target project may include a project type and an annotation type. Project types may include, but are not limited to, target detection and object segmentation. Target detection is used to detect a target, such as a pet or a human face. Object segmentation is used to segment or separate objects, such as separating a person from the scenery in an image. Annotation types may include, but are not limited to, inner chamfer, foreign matter, and plane repair. Annotation types may be added, deleted, or modified by the user, or provided by the system.
The first annotated data set is the data set obtained after the first data in the first data set has been annotated. The first annotated data set may contain the same number of items as the first data set. For example, if the first data set includes 100 photos, the first annotated data set also includes 100 photos, and among them the photos that contain puppy A are annotated.
In one implementation, the data processing device may generate the first annotated data set by calling an annotation model. Specifically, according to the recognition purpose of the target project, the data processing device calls the annotation model to annotate the first data in the first data set and thereby generates the first annotated data set. The annotation model is a training model that has already been trained, meets the requirements, and can be used directly to annotate data.
For example, if the recognition purpose of the target project is to detect puppy A, the data processing device calls the annotation model to annotate each of the 20 images (the first data set), marking puppy A in each image, and thereby obtains 20 annotated images (the first annotated data set).
Generating the first annotated data set by calling the annotation model reduces workload and improves annotation efficiency.
In another implementation, the data processing device may generate the first annotated data set through annotation instructions. Specifically, according to the recognition purpose of the target project, the data processing device generates the first annotated data set from the annotation instructions entered for the first data in the first data set. That is, the user may enter an annotation instruction for each item of first data in the first data set, and the annotation instruction is related to the recognition purpose. If the recognition purpose is target detection, the annotation instruction may be to circle the object to be recognized; if the recognition purpose is object segmentation, the annotation instruction may be to draw the dividing line that separates the objects. After the user confirms the annotation instruction for an item of data, the data processing device may save the annotated data and thereby generate the first annotated data set.
For example, if the recognition purpose of the target project is to detect puppy A, the data processing device saves the 20 annotated images (the first annotated data set) according to the annotation instructions that the user enters for each of the 20 images (the first data set).
Compared with calling the annotation model, generating the first annotated data set through annotation instructions yields a more accurate first annotated data set.
The data processing device may also generate the first annotated data set by combining the two approaches. For example, according to the recognition purpose of the target project, the data processing device first calls the annotation model to annotate the first data in the first data set, then adjusts the result according to annotation instructions entered by the user, and finally generates the first annotated data set. This combination can further improve the accuracy of the first annotated data set.
Optionally, the user may check the first annotated data set to ensure its accuracy. The user may enter at least one confirmation instruction for the first annotated data set. If the data processing device receives at least one confirmation instruction, it may determine that the first annotated data set satisfies the training condition, and step 203 may be performed. The confirmation instructions may come from users with different permissions. For example, of two confirmation instructions, one may come from the user operating the terminal device and one from a system inspector or administrator; or the first may come from a user with a first permission and the second from a user with a second permission, where the second permission is of a higher level than the first.
Optionally, the data processing device may count, for the first annotated data set, the number of annotated data items, the number of invalid data items, and the number of annotated objects. The number of annotated data items is how many data items have been annotated, for example how many of 100 images have been annotated. The number of invalid data items is the number of data items unrelated to the recognition purpose. For example, if the recognition purpose is to detect puppy A but an image shows a building and has nothing to do with puppy A, that image can be considered invalid data. The number of annotated objects is how many objects are annotated in the data; for example, if the recognition purpose is object segmentation, at least two objects are annotated. The data processing device may also output these statistics so that the user can make corresponding adjustments.
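The three statistics described above can be computed in a single pass over the annotated data set. The sketch below is only one possible reading: the `AnnotatedItem` record, its `labels` and `invalid` fields, and the `summarize` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AnnotatedItem:
    """One data item (e.g. an image) after annotation; hypothetical structure."""
    path: str
    labels: List[str] = field(default_factory=list)  # one entry per annotated object
    invalid: bool = False  # True when the item is unrelated to the recognition purpose


def summarize(items: List[AnnotatedItem]) -> Dict[str, int]:
    """Count annotated items, invalid items, and annotated objects."""
    return {
        "annotated_items": sum(1 for item in items if item.labels),
        "invalid_items": sum(1 for item in items if item.invalid),
        "annotated_objects": sum(len(item.labels) for item in items),
    }


items = [
    AnnotatedItem("dog1.jpg", ["puppy_A"]),
    AnnotatedItem("dog2.jpg", ["puppy_A", "puppy_A"]),
    AnnotatedItem("building.jpg", invalid=True),  # unrelated to detecting puppy A
]
stats = summarize(items)
print(stats)
```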
Step 203: The data processing device performs model training on the first annotated data set to generate a first training model.
In one implementation, no training model for the target project exists in the data processing device, that is, the data processing device has not yet performed the process of generating a training model for the target project. In this case, the data processing device calls a target training model to perform model training on the first annotated data set and generates the first training model. The target training model can be understood as an initial model into which no training data has yet been input, and which is used to train on the input training data. The target training model may also be described as an artificial intelligence model, a deterministic model, or the like. It may be a linear model, a convolutional neural network model, a recurrent neural network model, or the like. The data processing device inputs the first annotated data set into the target training model as training data to perform model training, thereby generating the first training model. For example, if the target training model is a convolutional neural network model, the data processing device inputs the 20 images in which puppy A is annotated into the convolutional neural network model, and model training yields the first training model. If the first training model satisfies the deployment condition, deploying it enables recognition of puppy A.
In another implementation, a training model for the target project already exists in the data processing device, that is, the data processing device has already performed the process of generating a training model for the target project and obtained one or more existing training models, but these models may not satisfy the deployment condition. In this case, the data processing device may perform model training based on an existing training model to generate the first training model. Assuming the existing training model is training model BB, the data processing device may obtain the first data set based on the accuracy information of training model BB, generate the first annotated data set, and input the first annotated data set into training model BB to generate the first training model. Because the first data set is collected in a targeted manner based on training model BB, the accuracy of the first training model is higher than that of training model BB.
In both of the above ways of generating the first training model, the data processing device may obtain a data processing server that is in an idle state and call that server to perform model training, so as to generate the first training model. The data processing server may include, but is not limited to, a graphics processing unit (GPU) server, a text processing server, a central processing unit (CPU) server, an application-specific integrated circuit (ASIC) server, a tensor processing unit (TPU) server, a neural network processing unit (NPU) server, or a field-programmable gate array (FPGA) server. If the data is image data, the data processing server may be a GPU server, and the embodiments of this application are described using a GPU server as an example. A GPU server is a computing service oriented to graphics applications. It offers real-time, high-speed parallel computing and floating-point computing capability and is suited to application scenarios such as 3D graphics applications, video decoding, deep learning, and scientific computing. The GPU server may be located in the Internet cloud, or it may be carried in the data processing device in the form of a GPU processor.
Step 204: The data processing device outputs the first training model when the accuracy information of the first training model satisfies the deployment condition.
After obtaining the first training model, the data processing device may use a reference data set to determine whether the accuracy information of the first training model satisfies the deployment condition. Specifically, the data processing device calls the first training model to test the reference data set and obtains a first test result, then compares the first test result with the reference annotation result corresponding to the reference data set to obtain the accuracy information of the first training model. Calling the first training model to test the reference data set means inputting the reference data set into the first training model to perform target detection, object segmentation, or the like.
The reference data set may also be described as a test data set. It is an unannotated data set that the user has carefully selected to match the recognition purpose of the target project. The first test result (denoted R1) is the annotation result that the first training model outputs after the reference data set is input into it. The first test result may contain the same number of data items as the reference data set. The reference annotation result (denoted R0) may also be described as a test annotation data set. It is the annotated data set that the user produces by manually annotating the reference data set in accordance with the recognition purpose of the target project. The reference annotation result has been confirmed multiple times, so its annotations are highly accurate, and it is used to check whether a generated training model satisfies the deployment condition. In other words, the reference data set is the examination paper, and the reference annotation result is its answer key.
The data processing device compares the first test result (R1) with the reference annotation result (R0) item by item to obtain the accuracy information of the first training model. The accuracy information may include an overall accuracy and evaluation-item accuracies. The overall accuracy is the accuracy of the first training model judged as a whole. An evaluation-item accuracy is the accuracy of the first training model judged with respect to a specific indicator. There may be one or more specific indicators, and each specific indicator corresponds to one evaluation-item accuracy. Specific indicators may include, but are not limited to, recognition type, misrecognition, missed recognition, wrong recognition, and multiple recognition.
The evaluation-item accuracy for a recognition type indicates how accurately that type is recognized. For example, if the recognition purpose is to recognize cats and dogs, there are two recognition types, giving one evaluation-item accuracy for recognizing cats and one for recognizing dogs. Misrecognition means that, compared with R0, R1 annotates something that should not have been annotated. For example, the recognition purpose is to recognize kitten B and R0 annotates kitten B, but R1 annotates an image of kitten C as if it were kitten B; that image in R1 is then considered misrecognized. The higher the evaluation-item accuracy for misrecognition, the lower the probability of misrecognition. Missed recognition means that, compared with R0, R1 fails to annotate something that should have been annotated. For example, the recognition purpose is to recognize kitten B and R0 annotates kitten B, but in R1 an image that contains kitten B has no annotation on kitten B; that image in R1 is then considered a missed recognition. The higher the evaluation-item accuracy for missed recognition, the lower the probability of missed recognition. Wrong recognition means that, compared with R0, R1 assigns the wrong recognition type. For example, the recognition purpose is to recognize kitten B and R0 annotates kitten B, but R1 annotates an image of kitten B as puppy A; that image in R1 is then considered wrongly recognized. The higher the evaluation-item accuracy for wrong recognition, the lower the probability of wrong recognition. Multiple recognition means that, for a given image, R1 contains more recognition results of a given type than R0 does. For example, the recognition purpose is to recognize puppy A; for a given image, R0 annotates one puppy A but R1 annotates two, so R1 is considered to contain a multiple recognition. The higher the evaluation-item accuracy for multiple recognition, the lower the probability of multiple recognition.
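The four error indicators above can be illustrated by comparing, image by image, the labels in R1 against the labels in R0. The following sketch rests on several assumptions that the application does not state: annotations are reduced to per-image lists of labels (positions are ignored), each image falls into a single category, and an evaluation-item accuracy is taken as one minus the fraction of images showing that error. The function names `classify` and `item_accuracies` are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def classify(r0_labels: List[str], r1_labels: List[str]) -> str:
    """Assign one image to a single category by comparing R1 with R0."""
    r0, r1 = Counter(r0_labels), Counter(r1_labels)
    missing = r0 - r1   # labels expected by R0 but absent from R1
    surplus = r1 - r0   # labels produced in R1 but not expected by R0
    if not missing and not surplus:
        return "correct"
    if missing and surplus:
        return "wrong"           # an expected object received a different type
    if missing:
        return "missed"          # an expected object was not annotated
    if all(label in r0 for label in surplus):
        return "multiple"        # more results of a type than R0 contains
    return "misrecognition"      # something annotated that should not be


def item_accuracies(r0: Dict[str, List[str]], r1: Dict[str, List[str]]) -> Dict[str, float]:
    """Assumed definition: accuracy = 1 - (images with that error / all images)."""
    if not r0:
        raise ValueError("reference annotation result is empty")
    counts = Counter(classify(r0[key], r1.get(key, [])) for key in r0)
    kinds = ("misrecognition", "missed", "wrong", "multiple")
    return {kind: 1 - counts[kind] / len(r0) for kind in kinds}


R0 = {"a": ["kitten_B"], "b": ["kitten_B"], "c": [], "d": ["puppy_A"], "e": ["puppy_A"]}
R1 = {"a": [], "b": ["puppy_A"], "c": ["kitten_B"], "d": ["puppy_A", "puppy_A"], "e": ["puppy_A"]}
print(item_accuracies(R0, R1))
```

In this toy data, images a, b, c, and d each show one kind of error and image e is correct, so each of the four accuracies comes out near 80%. A position-aware comparison (for example, matching bounding boxes) would be needed for real detection results.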
In one implementation, the overall accuracy depends on the evaluation-item accuracies. For example, the overall accuracy is the weighted average of the evaluation-item accuracies, or their arithmetic mean. For instance, if the specific indicators are misrecognition and missed recognition, with evaluation-item accuracies of 80% and 90%, and the overall accuracy is the mean of the evaluation-item accuracies, then the overall accuracy is 85%.
In another implementation, the overall accuracy is independent of the evaluation-item accuracies and instead describes how much of the data is annotated. For example, the reference data set consists of 100 images, each of which contains puppy A. These 100 images are input into the first training model to obtain R1. If puppy A is annotated in 87 of the images in R1, whereas puppy A is annotated in all 100 images of the reference annotation result R0 corresponding to the reference data set, then the overall accuracy of the first training model is 87%.
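Both ways of obtaining the overall accuracy reduce to short calculations. The sketch below reproduces the two worked figures from the text (85% and 87%); the function names and the optional `weights` argument are assumptions introduced for illustration.

```python
from typing import Dict, Optional


def overall_from_items(item_acc: Dict[str, float],
                       weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted average of evaluation-item accuracies (plain mean if no weights)."""
    if weights is None:
        weights = {name: 1.0 for name in item_acc}
    total_weight = sum(weights[name] for name in item_acc)
    return sum(item_acc[name] * weights[name] for name in item_acc) / total_weight


def overall_from_counts(annotated_in_r1: int, annotated_in_r0: int) -> float:
    """Share of the reference annotations that the model also produced."""
    return annotated_in_r1 / annotated_in_r0


mean_based = overall_from_items({"misrecognition": 0.80, "missed": 0.90})
count_based = overall_from_counts(87, 100)
print(round(mean_based, 2), round(count_based, 2))
```

With unequal weights, for example giving missed recognition three times the weight of misrecognition, the first function returns a weighted average instead of the plain mean.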
The above two implementations are examples and do not limit the embodiments of this application; the overall accuracy and the evaluation-item accuracies may also be measured in other ways. The accuracy information of the training model makes it easier for the user to carry out a comprehensive analysis, make targeted optimizations and adjustments, and judge the recognition performance of the training model more objectively.
After obtaining the accuracy information of the first training model, the data processing device may output it, so that the user learns the accuracy information of the first training model and can collect a second data set in a targeted manner.
The accuracy information of the first training model satisfying the deployment condition may include:
1) The overall accuracy of the first training model is greater than a first threshold, where the first threshold, expressed as a percentage, lies in the interval (0, 100]. For example, if the first threshold is 85%, the accuracy information of the first training model satisfies the deployment condition when the overall accuracy of the first training model is greater than 85%.
2) The evaluation-item accuracy of the first training model is greater than a second threshold, where the second threshold, expressed as a percentage, lies in the interval (0, 100]. For example, if the evaluation items are evaluation item 1 and evaluation item 2 and the second threshold is 80%, the accuracy information of the first training model satisfies the deployment condition when the evaluation-item accuracies for evaluation item 1 and evaluation item 2 are both greater than 80%. The evaluation items may share the same second threshold or have different second thresholds; for example, the second threshold for evaluation item 1 may be 80% and the second threshold for evaluation item 2 may be 85%.
3) The overall accuracy of the first training model is greater than the first threshold, and the evaluation-item accuracy of the first training model is greater than the second threshold. For example, if the first threshold is 85%, the evaluation items are evaluation item 1 and evaluation item 2, and the second threshold for both is 80%, the accuracy information of the first training model satisfies the deployment condition when the overall accuracy is greater than 85% and the evaluation-item accuracies for evaluation item 1 and evaluation item 2 are both greater than 80%.
The specific values of the first threshold and the second threshold may be set by the user or set by the system by default, and are not limited in the embodiments of this application.
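The three variants of the deployment condition listed above can be covered by one check in which either threshold may be omitted. The following is a minimal sketch under that assumption; the function name `meets_deployment_condition` and its parameters are hypothetical, and accuracies are expressed as fractions rather than percentages.

```python
from typing import Dict, Optional, Union


def meets_deployment_condition(
    overall: float,
    item_acc: Dict[str, float],
    first_threshold: Optional[float] = None,
    second_threshold: Optional[Union[float, Dict[str, float]]] = None,
) -> bool:
    """Return True when every threshold that was supplied is strictly exceeded.

    second_threshold may be one value shared by all evaluation items or a
    dict mapping each evaluation item to its own threshold.
    """
    if first_threshold is not None and not overall > first_threshold:
        return False
    if second_threshold is not None:
        for name, accuracy in item_acc.items():
            if isinstance(second_threshold, dict):
                limit = second_threshold[name]
            else:
                limit = second_threshold
            if not accuracy > limit:
                return False
    return True


# Variant 3: overall accuracy above 85% and both evaluation items above 80%.
ok = meets_deployment_condition(0.87, {"item1": 0.81, "item2": 0.90}, 0.85, 0.80)
# Variant 2 with per-item thresholds: item2 equals its threshold, so it fails.
not_ok = meets_deployment_condition(
    0.87, {"item1": 0.81, "item2": 0.85}, None, {"item1": 0.80, "item2": 0.85}
)
print(ok, not_ok)
```

The comparison is strict because the text says "greater than"; a value exactly equal to a threshold does not satisfy the condition in this sketch.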
The data processing device outputs the first training model when the accuracy information of the first training model satisfies the deployment condition. For example, the first training model is transmitted to the application device, so that the application device can deploy the first training model and implement the target project. As another example, the first training model is deployed on the data processing device, so that the data processing device implements the target project.
In the method described in FIG. 2, step 201 implements data collection, step 202 implements data annotation, step 203 implements model generation, and step 204 implements model deployment, so that a training model can be generated and deployed efficiently, conveniently, and flexibly. Steps 201 to 204 manage the whole sequence of data collection, data annotation, model generation, and model deployment in the form of a project, which makes applying training models more convenient and more practicable.
When the overall accuracy of the first training model is less than the first threshold, or any evaluation-item accuracy is less than the second threshold, it can be determined that the accuracy information of the first training model does not satisfy the deployment condition. In that case, the data processing device may obtain a second data set for the target project according to the accuracy information of the first training model. That is, data is collected in a targeted manner according to the accuracy information of the first training model, and the resulting data set is the second data set.
For example, in the accuracy information of the first training model, evaluation-item accuracy 1 is a face-profile recognition accuracy of 30%, and evaluation-item accuracy 2 is a frontal-face recognition accuracy of 90%. Evaluation-item accuracy 1 does not reach the second threshold (80%), so the accuracy information of the first training model does not satisfy the deployment condition. Face-profile pictures can then be collected in a targeted manner according to this accuracy information to obtain the second data set.
After obtaining the second data set, the data processing device generates, according to the recognition purpose of the target project, a second annotated data set corresponding to the second data set, performs model training on the second annotated data set to generate a second training model, and outputs the second training model when its accuracy information satisfies the deployment condition.
For the process by which the data processing device generates the second annotated data set, refer to the process of generating the first annotated data set in step 202. When generating the second training model, the data processing device may call the first training model and perform model training on the second annotated data set. That is, the second annotated data set is input into the first training model for model training, yielding the second training model. The data processing device then calls the second training model to test the reference data set and obtains a second test result, and compares the second test result with the reference annotation result to obtain the accuracy information of the second training model.
If the accuracy information of the second training model still does not satisfy the deployment condition, the data processing device may obtain a third data set for the target project according to the accuracy information of the second training model, generate a third annotated data set corresponding to the third data set, perform model training on the third annotated data set to generate a third training model, and output the third training model when its accuracy information satisfies the deployment condition. In general, when the i-th training model does not satisfy the deployment condition, the (i+1)-th training model is generated from the i-th training model, and the reference annotation result is used to check whether the (i+1)-th training model satisfies the deployment condition. This is repeated until a training model that satisfies the deployment condition is obtained.
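The repeat-until-deployable procedure can be written as a loop over collection, annotation, training, and evaluation. The sketch below passes each stage in as a callable, so nothing about the actual stages is assumed. The function name `train_until_deployable`, the callables, and the `max_rounds` safeguard are additions for illustration; the application itself describes repeating until the condition is met without a round limit.

```python
from typing import Any, Callable, Optional, Tuple


def train_until_deployable(
    collect: Callable[[Optional[Any]], Any],    # accuracy info -> next data set
    annotate: Callable[[Any], Any],             # data set -> annotated data set
    train: Callable[[Optional[Any], Any], Any], # (previous model, annotated set) -> model
    evaluate: Callable[[Any], Any],             # model -> accuracy info (via R0)
    is_deployable: Callable[[Any], bool],       # accuracy info -> deployment decision
    max_rounds: int = 10,                       # safeguard added for this sketch
) -> Tuple[Optional[Any], Any, int]:
    """Generate model i+1 from model i until the deployment condition is met."""
    model, accuracy = None, None
    for round_index in range(1, max_rounds + 1):
        dataset = collect(accuracy)        # targeted collection; None in round 1
        annotated = annotate(dataset)
        model = train(model, annotated)    # continues from the previous model
        accuracy = evaluate(model)         # test against the reference data set
        if is_deployable(accuracy):
            return model, accuracy, round_index
    return None, accuracy, max_rounds


# Toy stand-ins: the "model" is a round counter and accuracy grows by 10 per round.
result = train_until_deployable(
    collect=lambda accuracy: ["more data"],
    annotate=lambda dataset: dataset,
    train=lambda model, annotated: (model or 0) + 1,
    evaluate=lambda model: 70 + 10 * model,
    is_deployable=lambda accuracy: accuracy > 85,
)
print(result)
```

Here the first model scores 80 and fails the condition, and the second scores 90 and is returned, which matches the first-model, second-model progression described in the text.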
If multiple training models that satisfy the deployment condition are obtained for the target project, the data processing device may compare the accuracy information of these models and output the better one; alternatively, the user may select the better training model according to the accuracy information of the models, so that the better training model is deployed.
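The application does not define how "better" is decided when several training models qualify. As one possible illustration, the sketch below ranks candidates by overall accuracy and breaks ties by their weakest evaluation-item accuracy. This ranking rule, the `Candidate` tuple, and the `pick_best` function are all assumptions.

```python
from typing import Dict, List, NamedTuple


class Candidate(NamedTuple):
    """A deployable training model with its accuracy information (hypothetical)."""
    name: str
    overall: float
    item_acc: Dict[str, float]


def pick_best(candidates: List[Candidate]) -> Candidate:
    """Assumed rule: highest overall accuracy, then highest worst-case item accuracy."""
    return max(
        candidates,
        key=lambda c: (c.overall, min(c.item_acc.values(), default=0.0)),
    )


candidates = [
    Candidate("model_1", 0.88, {"missed": 0.82, "wrong": 0.90}),
    Candidate("model_2", 0.91, {"missed": 0.85, "wrong": 0.86}),
    Candidate("model_3", 0.91, {"missed": 0.90, "wrong": 0.88}),
]
best = pick_best(candidates)
print(best.name)
```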
Refer to FIG. 3, which is a schematic diagram of a scenario of a data processing method provided in an embodiment of this application. FIG. 3 is described from the perspective of the interaction between users and the data processing device, and may include, but is not limited to, the following steps:
Step 301: The first user enters a project creation instruction.
In FIG. 3, the first user is the user who operates the data processing device, and the second user is a system inspector or administrator; alternatively, the permission level of the second user is higher than that of the first user.
Refer to the schematic diagram of the project creation interface shown in FIG. 4. In this interface, the first user can enter a project name, select a project type, upload a template picture, enter the height and width of the picture, and enter or select an annotation category (also called an annotation type). After completing these operations, the first user can confirm them. The first user's confirmation of these operations can be understood as entering a project creation instruction into the data processing device.
It should be noted that the schematic diagram of the project creation interface shown in FIG. 4 is an example and does not limit the embodiments of this application.
步骤302,数据处理装置创建项目。Step 302: The data processing device creates a project.
数据处理装置在接收到该项目创建指令的情况下,可根据该项目创建指令所携带的信息创建项目。该项目创建指令所携带的信息可包括第一用户在图4所示的界面示意图中输入或选择的信息,例如包括项目名称、项目类型、标注类别等。Upon receiving the project creation instruction, the data processing device can create the project according to the information carried in the project creation instruction. The information carried in the project creation instruction may include the information input or selected by the first user in the interface schematic shown in FIG. 4, for example, including the project name, project type, label category, and so on.
步骤303,第一用户输入计划创建指令。Step 303: The first user inputs a plan creation instruction.
在数据处理装置创建项目之后,第一用户可针对该项目输入计划创建指令。若数据处理装置未创建针对该项目的计划,则该计划创建指令用于创建第一计划;若数据处理装置创建过针对该项目的计划,则该计划创建指令用于创建新的计划。计划创建指令可携带计划名称,第一用户可自主命名各个计划的名称,例如图5中,计划名称为“demo”。After the data processing device creates the project, the first user can input a plan creation instruction for the project. If the data processing device has not created a plan for the project, the plan creation instruction is used to create the first plan; if the data processing device has created a plan for the project, the plan creation instruction is used to create a new plan. The plan creation instruction can carry the plan name, and the first user can name each plan independently. For example, in Figure 5, the plan name is "demo".
可参见图5所示的某个计划的界面示意图,第一用户可在该界面示意图中点击标注类别下的编辑计划类别,编辑、修改或删除该计划的标注类别。需要说明的是,图5所示的界面示意图用于举例,并不构成对本申请实施例的限定。Refer to the schematic interface diagram of a certain plan shown in FIG. 5, where the first user can click Edit Plan Category under Label Category to edit, modify or delete the label category of the plan. It should be noted that the schematic interface diagram shown in FIG. 5 is used as an example, and does not constitute a limitation to the embodiment of the present application.
步骤304,数据处理装置创建第一计划。Step 304: The data processing device creates a first plan.
数据处理装置在根据计划创建指令创建第一计划之后,第一用户可根据图5所示界面示意图中的计划步骤,依次执行。After the data processing device creates the first plan according to the plan creation instruction, the first user can execute the plan steps in sequence according to the plan steps in the interface diagram shown in FIG. 5.
步骤305,第一用户输入数据上传指令。Step 305: The first user inputs a data upload instruction.
第一用户执行上传图片的步骤,点击上传图片,选择待上传图片,待上传图片可以压缩包的形式进行上传,待第一用户点击确认上传指令后,数据处理装置可接收到该压缩包。其中,待上传图片是与项目的目的相关的图片。The first user performs the step of uploading pictures, clicks on upload pictures, and selects pictures to be uploaded. The pictures to be uploaded can be uploaded in the form of a compressed package. After the first user clicks to confirm the upload instruction, the data processing device can receive the compressed package. Among them, the pictures to be uploaded are pictures related to the purpose of the project.
步骤306,数据处理装置获取第一数据集。Step 306: The data processing device obtains the first data set.
数据处理装置对该压缩包进行解压缩,可获取第一数据集。在图3所示的实施例中,假设第一用户针对第一计划上传的图片集合为第一数据集,针对第二计划上传的图片集合为第二数据集。The data processing device decompresses the compressed package to obtain the first data set. In the embodiment shown in FIG. 3, it is assumed that the set of pictures uploaded by the first user for the first plan is the first data set, and the set of pictures uploaded for the second plan is the second data set.
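The decompression in step 306 might look like the following Python sketch. It assumes the compressed package is a ZIP archive and that pictures are recognized by file extension; neither detail is specified in the text, and the function name `extract_data_set` is hypothetical.

```python
import zipfile
from pathlib import Path
from typing import List

# Assumed set of picture extensions; the text does not restrict the format.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def extract_data_set(archive_path: str, target_dir: str) -> List[Path]:
    """Unpack an uploaded ZIP archive and return the pictures it contained."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
    # Keep only picture files, in a stable order, as the data set.
    return sorted(
        p for p in target.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```

Files that are not pictures, such as notes bundled into the archive, are unpacked but left out of the returned data set.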
Having obtained the first data set, the data processing device may output the pictures in the first data set one by one, and the first user performs the labeling step, manually labeling each output picture. Alternatively, the data processing device may call a labeling model to label the pictures, or manual labeling and the labeling model may be combined to label the pictures.
Step 307: The data processing device generates the first labeled data set corresponding to the first data set.
Once labeling is complete, the data processing device may generate the first labeled data set.
Step 308a: The first user inputs a confirmation instruction.
Step 308b: The second user inputs a confirmation instruction.
Once the first labeled data set has been generated, the first user performs the checking step, checking whether the labeled pictures in the first labeled data set match their label categories, whether the label positions are correct, and so on. If the first user finds errors, the first user may correct them and update the first labeled data set. If the first user finds that all labels are correct, the first user inputs a confirmation instruction.
After the first user has checked and confirmed, the second user may check the first labeled data set confirmed by the first user again and, if no errors are found, input a confirmation instruction.
After receiving the confirmation instruction of the first user and the confirmation instruction of the second user, the data processing device may perform step 309.
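The two-stage confirmation that gates step 309 can be sketched in Python as follows. The class, the role strings, and the rule that any later change to the labels clears earlier confirmations are illustrative assumptions; the text only states that the second user checks after the first user and that training follows both confirmations.

```python
REQUIRED_CONFIRMERS = frozenset({"first_user", "second_user"})


class AnnotationReview:
    """Tracks who has confirmed a labeled data set (hypothetical helper)."""

    def __init__(self) -> None:
        self._confirmed = set()

    def confirm(self, role: str) -> None:
        if role not in REQUIRED_CONFIRMERS:
            raise ValueError(f"unknown role: {role}")
        # Assumed ordering: the second user reviews what the first user confirmed.
        if role == "second_user" and "first_user" not in self._confirmed:
            raise RuntimeError("the first user must confirm before the second user")
        self._confirmed.add(role)

    def invalidate(self) -> None:
        """Call when labels are modified; earlier confirmations no longer apply (assumption)."""
        self._confirmed.clear()

    @property
    def ready_for_training(self) -> bool:
        return self._confirmed == REQUIRED_CONFIRMERS
```

A caller would start model training only once `ready_for_training` is true.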
Step 309: The data processing device performs model training on the first labeled data set to generate a first training model.
Step 310: When the accuracy information of the first training model satisfies the deployment condition, the data processing device outputs the first training model.
For the specific implementation of step 307, step 309, and step 310, refer to the description of steps 202 to 204 in the embodiment shown in FIG. 2; details are not repeated here.
If the accuracy information of the first training model does not satisfy the deployment condition, the first user performs step 303 again and inputs a plan creation instruction, which is used to create a second plan. The first user may upload a second data set based on the accuracy information of the first training model, and the data processing device generates a second labeled data set. After the first user and the second user confirm the second labeled data set, the data processing device inputs the second data set into the first training model for model training to generate a second training model, and outputs the second training model when its accuracy information satisfies the deployment condition.
The embodiment shown in FIG. 3 is described from the perspective of the interaction between the user and the data processing device. Even if the first user is unfamiliar with training models, the first user can control the data processing device to output a training model that satisfies the deployment condition, so that the first user can manage training models independently and achieve the purpose of the project.
The data processing method provided by the embodiments of this application has been described above. The data processing device provided by the embodiments of this application is described below.
Please refer to FIG. 6, which is a schematic structural diagram of a data processing device provided by an embodiment of this application. The data processing device 60 includes an input unit 601, a processing unit 602, and an output unit 603.
The input unit 601 is configured to obtain a first data set for a target project.
The processing unit 602 is configured to generate, according to the identification purpose of the target project, a first labeled data set corresponding to the first data set, and to perform model training on the first labeled data set to generate a first training model.
The output unit 603 is configured to output the first training model when the processing unit 602 determines that the accuracy information of the first training model satisfies the deployment condition.
In one embodiment, the processing unit 602 is specifically configured to generate, according to the identification purpose of the target project and a labeling instruction for first data in the first data set, the first labeled data set corresponding to the first data set.
In one embodiment, the processing unit 602 is specifically configured to call, according to the identification purpose of the target project, a labeling model to label the first data in the first data set, generating the first labeled data set corresponding to the first data set.
In one embodiment, the processing unit 602 is further configured to count the number of labeled data items, the number of invalid data items, and the number of labeled objects in the first labeled data set.
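The three counts can be illustrated with a short Python sketch. The `LabeledImage` record and its `invalid` flag are hypothetical; the text does not describe how labeled and invalid data are represented, and the rule that invalid items contribute no labeled objects is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class LabeledImage:
    """Hypothetical record for one picture in a labeled data set."""
    name: str
    objects: List[str] = field(default_factory=list)  # label category of each labeled object
    invalid: bool = False  # picture marked as unusable during labeling


def summarize(data_set: Iterable[LabeledImage]) -> Dict[str, int]:
    """Count labeled pictures, invalid pictures, and labeled objects."""
    items = list(data_set)
    return {
        "labeled": sum(1 for img in items if img.objects and not img.invalid),
        "invalid": sum(1 for img in items if img.invalid),
        "objects": sum(len(img.objects) for img in items if not img.invalid),
    }
```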
In one embodiment, the processing unit 602 is specifically configured to call a target training model to perform model training on the first labeled data set, generating the first training model.
In one embodiment, the processing unit 602 is specifically configured to obtain a data processing server in an idle state and call that data processing server to perform model training on the first labeled data set, generating the first training model.
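Selecting an idle data processing server could be sketched as below. The `Server` record, its `busy` flag, and the first-found selection rule are assumptions made for illustration; the text does not say how idleness is tracked or which idle server is preferred.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Server:
    """Hypothetical description of one data processing server."""
    name: str
    busy: bool = False


def acquire_idle_server(servers: Iterable[Server]) -> Optional[Server]:
    """Return the first idle server and mark it busy, or None if all are busy."""
    for server in servers:
        if not server.busy:
            server.busy = True  # reserve it for the training job
            return server
    return None
```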
In one embodiment, the processing unit 602 is further configured to call the first training model to test a reference data set to obtain a test result, and to compare the test result with the reference labeling result corresponding to the reference data set to obtain the accuracy information of the first training model;
the output unit 603 is further configured to output the accuracy information of the first training model.
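Comparing a test result with the reference labeling result can be sketched in Python as follows. The sketch treats both as equal-length lists of category labels and reports accuracy per reference category; this representation is an assumption, since the text does not fix the form of the test result.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple


def accuracy_info(
    predicted: Sequence[str], reference: Sequence[str]
) -> Tuple[float, Dict[str, float]]:
    """Return (overall accuracy, accuracy per reference category)."""
    if not reference or len(predicted) != len(reference):
        raise ValueError("predicted and reference must be non-empty and equally long")
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for pred, ref in zip(predicted, reference):
        totals[ref] += 1
        if pred == ref:
            hits[ref] += 1
    overall = sum(hits.values()) / len(reference)
    per_category = {label: hits[label] / totals[label] for label in totals}
    return overall, per_category
```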
In one embodiment, the accuracy information includes an overall accuracy and an evaluation item accuracy. The accuracy information satisfying the deployment condition includes: the overall accuracy being greater than a first threshold; or the evaluation item accuracy being greater than a second threshold; or the overall accuracy being greater than the first threshold and the evaluation item accuracy being greater than the second threshold.
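The three alternative forms of the deployment condition can be written as a small Python sketch. The `Policy` enum and the function name are hypothetical; the strict "greater than" comparisons follow the wording of this paragraph.

```python
from enum import Enum


class Policy(Enum):
    """Which of the three alternative conditions is in force (hypothetical names)."""
    OVERALL = "overall"
    ITEM = "item"
    BOTH = "both"


def meets_deployment_condition(
    overall_accuracy: float,
    item_accuracy: float,
    first_threshold: float,
    second_threshold: float,
    policy: Policy,
) -> bool:
    """Check the accuracy information against the selected deployment condition."""
    overall_ok = overall_accuracy > first_threshold
    item_ok = item_accuracy > second_threshold
    if policy is Policy.OVERALL:
        return overall_ok
    if policy is Policy.ITEM:
        return item_ok
    return overall_ok and item_ok
```

Because the comparison is strict, an accuracy exactly equal to its threshold does not satisfy the condition.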
In one embodiment, the data processing device 60 further includes a storage unit configured to store the first training model and the accuracy information of the first training model.
In one embodiment, the input unit 601 is further configured to obtain, according to the accuracy information of the first training model, a second data set for the target project when the processing unit 602 determines that the accuracy information of the first training model does not satisfy the deployment condition;
the processing unit 602 is further configured to generate, according to the identification purpose of the target project, a second labeled data set corresponding to the second data set, and to perform model training on the second labeled data set to generate a second training model;
the output unit 603 is further configured to output the second training model when the processing unit 602 determines that the accuracy information of the second training model satisfies the deployment condition.
In one embodiment, the processing unit 602 is specifically configured to call the first training model to perform model training on the second labeled data set, generating the second training model.
Please refer to FIG. 7, which is a schematic structural diagram of a terminal device provided by an embodiment of this application. The terminal device described in the embodiments of this application includes a processor 701, a communication interface 702, and a memory 703. The processor 701, the communication interface 702, and the memory 703 may be connected by a bus or in other ways; the embodiments of this application take connection by a bus as an example.
The processor 701 may be a central processing unit (CPU), a network processor (NP), or a combination of a CPU and an NP. The processor 701 may also be a core, in a multi-core CPU or a multi-core NP, that is used to implement communication identifier binding.
The processor 701 may be a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
The communication interface 702 may be used for exchanging information or signaling, and for receiving and transmitting signals. The memory 703 may mainly include a program storage area and a data storage area. The program storage area may store an operating system and the programs required by at least one function (such as a text storage function or a location storage function); the data storage area may store data created through use of the device (such as image data and text data), and may include application storage programs. In addition, the memory 703 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices.
The terminal device shown in FIG. 7 further includes an input/output device, which is used to receive input instructions or selection instructions from the user and to output accuracy information and the like on the user interface.
The memory 703 is further configured to store program instructions. The processor 701 is configured to execute the program instructions stored in the memory 703; when the program instructions are executed, the processor 701 is configured to:
obtain a first data set for a target project;
generate, according to the identification purpose of the target project, a first labeled data set corresponding to the first data set; perform model training on the first labeled data set to generate a first training model;
when the accuracy information of the first training model satisfies the deployment condition, control the input/output device to output the first training model.
The methods executed by the processor in the embodiments of this application are described from the perspective of the processor. It can be understood that, to execute the above methods, the processor requires the cooperation of other hardware structures. The embodiments of this application do not describe or limit the specific implementation process in detail.
In a specific implementation, the processor 701, the communication interface 702, and the memory 703 described in the embodiments of this application can carry out the implementations described in the data processing method provided by the embodiments of this application; details are not repeated here.
An embodiment of this application further provides a computer-readable storage medium in which a computer program is stored; when the computer program is executed by a processor, the method described in the foregoing method embodiments is implemented.
An embodiment of this application further provides a computer program product containing instructions which, when run on a computer, cause the computer to execute the method described in the foregoing method embodiments.
It should be noted that, for simplicity of description, each of the foregoing method embodiments is expressed as a series of combined actions. Those skilled in the art should understand, however, that this application is not limited by the described order of actions, because according to this application some steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and that the actions and modules involved are not necessarily required by this application.
Those of ordinary skill in the art can understand that all or some of the steps in the methods of the foregoing embodiments can be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium may include a flash drive, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, and the like.
The data processing method and device provided by the embodiments of this application have been described in detail above. Specific examples are used herein to explain the principles and implementations of this application, and the description of the above embodiments is intended only to help in understanding the method of this application and its core idea. For those of ordinary skill in the art, the specific implementations and the scope of application may change according to the idea of this application. In summary, the content of this specification should not be construed as a limitation on this application.

Claims (16)

  1. A data processing method, characterized in that the method comprises:
    obtaining a first data set for a target project;
    generating, according to the identification purpose of the target project, a first labeled data set corresponding to the first data set;
    performing model training on the first labeled data set to generate a first training model;
    outputting the first training model when the accuracy information of the first training model satisfies a deployment condition.
  2. The method according to claim 1, characterized in that the generating, according to the identification purpose of the target project, a first labeled data set corresponding to the first data set comprises:
    generating, according to the identification purpose of the target project and a labeling instruction for first data in the first data set, the first labeled data set corresponding to the first data set.
  3. The method according to claim 1, characterized in that the generating, according to the identification purpose of the target project, a first labeled data set corresponding to the first data set comprises:
    calling, according to the identification purpose of the target project, a labeling model to label first data in the first data set, generating the first labeled data set corresponding to the first data set.
  4. The method according to any one of claims 1-3, characterized in that the method further comprises:
    counting the number of labeled data items, the number of invalid data items, and the number of labeled objects in the first labeled data set.
  5. The method according to any one of claims 1-3, characterized in that the performing model training on the first labeled data set to generate a first training model comprises:
    calling a target training model to perform model training on the first labeled data set, generating the first training model.
  6. The method according to any one of claims 1-3, characterized in that the performing model training on the first labeled data set to generate a first training model comprises:
    obtaining a data processing server in an idle state, and calling the data processing server to perform model training on the first labeled data set, generating the first training model.
  7. The method according to claim 1, characterized in that the method further comprises:
    calling the first training model to test a reference data set to obtain a test result;
    comparing the test result with the reference labeling result corresponding to the reference data set to obtain and output the accuracy information of the first training model.
  8. The method according to claim 7, characterized in that the accuracy information comprises an overall accuracy and an evaluation item accuracy; the accuracy information satisfying the deployment condition comprises: the overall accuracy being greater than a first threshold; or the evaluation item accuracy being greater than a second threshold; or the overall accuracy being greater than the first threshold and the evaluation item accuracy being greater than the second threshold.
  9. The method according to claim 7 or 8, characterized in that the method further comprises:
    controlling the saving of the first training model and the accuracy information of the first training model.
  10. The method according to claim 1 or 7, characterized in that the method further comprises:
    when the accuracy information of the first training model does not satisfy the deployment condition, obtaining, according to the accuracy information of the first training model, a second data set for the target project;
    generating, according to the identification purpose of the target project, a second labeled data set corresponding to the second data set;
    performing model training on the second labeled data set to generate a second training model;
    outputting the second training model when the accuracy information of the second training model satisfies the deployment condition.
  11. The method according to claim 10, characterized in that the performing model training on the second labeled data set to generate a second training model comprises:
    calling the first training model to perform model training on the second labeled data set, generating the second training model.
  12. The method according to claim 1, characterized in that the method further comprises:
    when the first labeled data set satisfies a training condition, performing the step of performing model training on the first labeled data set to generate a first training model.
  13. The method according to claim 12, characterized in that the method further comprises:
    when at least one confirmation instruction for the first labeled data set is received, determining that the first labeled data set satisfies the training condition.
  14. A data processing device, characterized in that the data processing device comprises units for performing the steps of the method according to any one of claims 1-13.
  15. A data processing device, characterized by comprising a memory and a processor, wherein
    the memory is configured to store computer instructions;
    when the processor executes the computer instructions, the data processing device is caused to perform the method according to any one of claims 1-12.
  16. A computer-readable storage medium in which a computer program is stored, characterized in that, when the computer program is executed by a processor, the steps of the method according to any one of claims 1 to 12 are implemented.
PCT/CN2020/070549 2020-01-06 2020-01-06 Data processing method and apparatus, and computer readable storage medium WO2021138783A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202080005484.2A CN112805725A (en) 2020-01-06 2020-01-06 Data processing method and device and computer readable storage medium
PCT/CN2020/070549 WO2021138783A1 (en) 2020-01-06 2020-01-06 Data processing method and apparatus, and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/070549 WO2021138783A1 (en) 2020-01-06 2020-01-06 Data processing method and apparatus, and computer readable storage medium

Publications (1)

Publication Number Publication Date
WO2021138783A1 true WO2021138783A1 (en) 2021-07-15

Family

ID=75809347

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/070549 WO2021138783A1 (en) 2020-01-06 2020-01-06 Data processing method and apparatus, and computer readable storage medium

Country Status (2)

Country Link
CN (1) CN112805725A (en)
WO (1) WO2021138783A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108875963A (en) * 2018-06-28 2018-11-23 北京字节跳动网络技术有限公司 Optimization method, device, terminal device and the storage medium of machine learning model
CN109034188A (en) * 2018-06-15 2018-12-18 北京金山云网络技术有限公司 Acquisition methods, acquisition device, equipment and the storage medium of machine learning model
CN109242013A (en) * 2018-08-28 2019-01-18 北京九狐时代智能科技有限公司 A kind of data mask method, device, electronic equipment and storage medium
CN109583325A (en) * 2018-11-12 2019-04-05 平安科技(深圳)有限公司 Face samples pictures mask method, device, computer equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11423324B2 (en) * 2017-02-23 2022-08-23 International Business Machines Corporation Training and estimation of selection behavior of target

Also Published As

Publication number Publication date
CN112805725A (en) 2021-05-14

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application; Ref document number: 20911566; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase; Ref country code: DE
122 EP: PCT application non-entry in European phase; Ref document number: 20911566; Country of ref document: EP; Kind code of ref document: A1
32PN EP: public notification in the EP bulletin as address of the addressee cannot be established; Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 15/03/2023)
122 EP: PCT application non-entry in European phase; Ref document number: 20911566; Country of ref document: EP; Kind code of ref document: A1