CN113204614A

CN113204614A - Model training method, method and device for optimizing training data set

Info

Publication number: CN113204614A
Application number: CN202110476915.XA
Authority: CN
Inventors: 王述; 冯知凡; 柴春光; 朱勇
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2021-04-29
Filing date: 2021-04-29
Publication date: 2021-08-03
Anticipated expiration: 2041-04-29
Also published as: CN113204614B

Abstract

The disclosure provides a model training method, a method for optimizing a training data set and a device thereof, and relates to the field of artificial intelligence, in particular to the fields of deep learning and knowledge maps. The specific implementation scheme is as follows: training the model based on a first training data set containing labeling information; determining a prediction of training data in the first training data set using the trained model; determining the training data as at least a first portion of a second training data set if the prediction results are different from corresponding labeling information of the training data, the second training data set being different from the first training data set; training the model based on the second training data set. In this way, the technical scheme of the present disclosure can optimize the sample data of the next round of model training according to the problems occurring in model prediction, thereby improving the model effect.

Description

Model training method, method and device for optimizing training data set

Technical Field

The present disclosure relates to the field of artificial intelligence, in particular to the field of machine learning, in particular to a model training method, a method of optimizing a training data set, and apparatuses, electronic devices, computer-readable storage media and computer program products thereof.

Background

In the process of model training, effective training data needs to be selected from a large amount of training data for model training, so as to avoid situations such as entity class imbalance, and thus improve the performance of the model. However, the amount of training data in the training data set is too large, and the quality of the training data is uneven, so that objective labor cost is required to optimize the selection of the training data, and a worker is required to have a professional knowledge level.

Disclosure of Invention

The present disclosure provides a model training method, a method of optimizing a training data set, and apparatuses, electronic devices, computer-readable storage media, and computer program products thereof.

According to a first aspect of the present disclosure, a model training method is provided. The method may include training the model based on a first training data set containing annotation information. Further, a prediction of the training data in the first set of training data may be determined using the trained model. The method may further include determining the training data as at least a first portion of a second training data set if the prediction results are different from corresponding labeling information for the training data, the second training data set being different from the first training data set. Moreover, the method may further comprise training the model based on a second training data set.

According to a second aspect of the present disclosure, a method of optimizing a training data set is provided, which may comprise determining a prediction result for training data in a first training data set of a training model using a trained model. Furthermore, the method may further comprise determining the training data as at least a first part of a second training data set if the prediction result is different from the corresponding annotation information of the training data, the second training data set being different from the first training data set and being used for further training the model.

In a third aspect of the present disclosure, there is provided a model training apparatus comprising: a first model training module configured to train the model based on a first training data set containing labeling information; a prediction result determination module configured to determine a prediction result of training data in the first training data set using the trained model; a first training data set determination module configured to determine the training data as at least a first portion of a second training data set if the prediction result is different from corresponding labeling information of the training data, the second training data set being different from the first training data set; a second model training module configured to train the model based on the second training data set.

In a fourth aspect of the present disclosure, there is provided an apparatus for optimizing a training data set, comprising: a prediction result determination module configured to determine a prediction result of training data in a first training data set used to train the model using the trained model; and a first training data set determination module configured to determine the training data as at least a first part of a second training data set if the prediction result is different from the corresponding labeling information of the training data, the second training data set being different from the first training data set and being used for further training the model.

In a fifth aspect of the present disclosure, there is provided an electronic device comprising one or more processors; and storage means for storing the one or more programs which, when executed by the one or more processors, cause the one or more processors to carry out the method according to the first or second aspect of the disclosure.

In a sixth aspect of the present disclosure, a computer readable storage medium is provided, on which a computer program is stored, which program, when executed by a processor, implements the method according to the first, second aspect of the present disclosure.

In a seventh aspect of the present disclosure, a computer program product is provided, which computer program, when executed by a processor, implements the method according to the first, second aspect of the present disclosure.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:

FIG. 1 illustrates a schematic diagram of an example environment in which embodiments of the present disclosure can be implemented;

FIG. 2 shows a flow diagram of a process of model training according to an embodiment of the present disclosure;

FIG. 3 shows a flow diagram of a detailed process of model training according to an embodiment of the present disclosure;

FIG. 4 shows a schematic block diagram of a main architecture for training an entity recognition model according to an embodiment of the present disclosure.

FIG. 5 shows a flow diagram of a process of optimizing a training data set according to an embodiment of the present disclosure;

FIG. 6 shows a block diagram of a model training apparatus according to an embodiment of the present disclosure;

FIG. 7 shows a block diagram of an apparatus for optimizing a training data set according to an embodiment of the present disclosure; and

FIG. 8 illustrates a block diagram of a computing device capable of implementing various embodiments of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

In describing embodiments of the present disclosure, the terms "include" and its derivatives should be interpreted as being inclusive, i.e., "including but not limited to. The term "based on" should be understood as "based at least in part on". The term "one embodiment" or "the embodiment" should be understood as "at least one embodiment". The terms "first," "second," and the like may refer to different or the same object. Other explicit and implicit definitions are also possible below.

It will be appreciated that model training is typically performed with problems that result in the trained model not performing as well as due to the low quality of the training data set (such as the presence of noise). In order to solve this problem, the traditional approach is to perform data cleaning and labeling on the training data set by a pure manual labeling approach. For example, when training an entity recognition model, if it is found that the trained model cannot meet the performance requirement, a worker needs to rely on professional knowledge to clean, screen, or even re-label the training data set, so as to adjust the entity class distribution. The optimized data set may continue to be used to train the entity recognition model if the performance of the model still fails to meet the performance requirements. The sample data needs to be manually adjusted by continuously executing the operations until the model effect reaches the standard. Therefore, considerable labor cost is consumed in the model training process, and the corresponding professional knowledge is needed in each field, so that the model training process cannot perform field migration.

Therefore, the model training method can optimize the training data set in the training process, and the training data set is used for final model training, so that the effect of model training can be improved on the premise of not depending on manual labeling. In addition, the present disclosure also provides a method of optimizing a training data set.

According to an embodiment of the present disclosure, a model training scheme is presented. In this approach, model training may be performed based on a labeled training data set, and in the event that it is determined that the performance of the trained model is not met, the model is utilized to determine a prediction result for each training data of the training data set. If training data exists whose prediction results are different from the corresponding annotation information, the training data is collected into an enhanced training data set. The enhanced training data set may further include training data with a prediction result at a threshold boundary and a small amount of training data with a good prediction result. The model may be further trained using the enhanced training data set by forming the enhanced training data set. In this way, efficient and accurate model training is achieved.

Corresponding to the model training method, the disclosure also provides a method for optimizing the training data set. For example, a trained model may be utilized to determine a prediction of training data in a training data set used to train the model. If the prediction results are different from the corresponding annotation information for the training data, the training data may be collected into an enhanced training data set. The enhanced training data set may be used to further train the model. In this way, optimization of the training data set can be achieved in a manner that does not rely on manual labeling.

Embodiments of the present disclosure will be described below in detail with reference to the accompanying drawings. Fig. 1 illustrates a schematic diagram of an example environment 100 in which various embodiments of the present disclosure can be implemented. As shown in FIG. 1, the present disclosure illustrates the manner in which models are trained and applied in the context of entity recognition models. The example environment 100 includes input text to be recognized 110, a computing device 120, and a recognition result 130 determined via the computing device 120.

In some embodiments, the text to be recognized 110 entered by the user may be any text string. By identifying the named entities in the text string, a plurality of natural language processing tasks such as information extraction, question-answering system, syntactic analysis, machine translation and the like can be further realized. Based on the processing of the computing device 120, the recognition result 130 of the text to be recognized 110 may be determined, for example, the recognition result of the text "zhangsan singing AAA" is "zhangsan (name of person)" and "AAA (name of song)" (zhangsan may be the name of a singer, and AAA may be the name of a song).

In some embodiments, computing device 120 may include, but is not limited to, a personal computer, a server computer, a handheld or laptop device, a mobile device (such as a mobile phone, a personal digital assistant, PDA, a media player, etc.), a consumer electronics product, a minicomputer, a mainframe computer, a cloud computing resource, and the like.

The training and use of the model in the computing device 120 will be described below in terms of a machine learning model. As shown in FIG. 1, example environment 100 may generally include a model training system 160 and a model application system 170. By way of example, model training system 160 and/or model application system 170 may be implemented in computing device 120 as shown in FIG. 1. It should be understood that the description of the structure and functionality of the example environment 100 is for illustrative purposes only and is not intended to limit the scope of the subject matter described herein. The subject matter described herein may be implemented in various structures and/or functions.

As described above, the process of determining the recognition result 130 of the text to be recognized 110 may be divided into two stages: a model training phase and a model application phase. By way of example, in a model training phase, model training system 160 may utilize training data set 150 to train recognition model 140 for implementing named entity recognition. It should be understood that the training data set 150 may be a combination of a plurality of reference feature data (as inputs to the model 140) and corresponding reference annotation information (as outputs to the model 140). In the model application phase, the model application system 170 may receive the trained recognition model 140, such that the recognition result 130 is determined by the recognition model 140 based on the text to be recognized 110.

In other embodiments, the recognition model 140 may be constructed as a learning network. In some embodiments, the learning network may include a plurality of networks, where each network may be a multi-layer neural network, which may be composed of a large number of neurons. Through the training process, respective parameters of the neurons in each network can be determined. The parameters of the neurons in these networks are collectively referred to as parameters of the recognition model 140.

The training process of the recognition model 140 may be performed in an iterative manner until at least part of the parameters of the recognition model 140 converge or until a predetermined number of iterations is reached, thereby obtaining final model parameters.

The technical solutions described above are only used for illustration and do not limit the invention. It should be understood that the various networks may also be arranged in other ways and connections. To more clearly explain the principles of the disclosed solution, the process of model training will be described in more detail below with reference to fig. 2.

FIG. 2 shows a flow diagram of a process 200 of model training according to an embodiment of the present disclosure. In some embodiments, process 200 may be implemented in computing device 120 of FIG. 1. Referring now to FIG. 2 in conjunction with FIG. 1, a process 200 for training a model according to an embodiment of the present disclosure is described. For ease of understanding, the specific examples set forth in the following description are intended to be illustrative, and are not intended to limit the scope of the disclosure.

At 202, the computing device 120 may train the model based on the first training data set containing the annotation information. As described above, the model may be a recognition model 140 for text entity recognition. In some embodiments, to train the recognition model 140, the computing device 120 may apply a first training data set to the recognition model 140 to be trained to determine the converged parameters of the recognition model 140. In this way, training of the model may be achieved initially. If the model's performance is up to standard, the model may be directly output for text entity recognition.

At 204, the computing device 120 may determine a prediction of the training data in the first training data set using the trained recognition model 140. To describe embodiments of the present disclosure in more detail, the process is now described in conjunction with fig. 3. FIG. 3 shows a flowchart of a detailed process 300 of model training according to an embodiment of the present disclosure. In some embodiments, process 300 may be implemented in computing device 120 of FIG. 1. A detailed process 300 for training a model according to an embodiment of the present disclosure is now described with reference to fig. 3 in conjunction with fig. 1. For ease of understanding, the specific examples set forth in the following description are intended to be illustrative, and are not intended to limit the scope of the disclosure.

At 302, the computing device 120 may determine an effect parameter of the trained recognition model 140, that is, may evaluate the effect of the recognition model 140. In some embodiments, the effectiveness evaluation may be by evaluating the accuracy, recall, or the like of the recognition model 140. As an example, in the labeled training data "singing three AAA", the entity type of "singing three" is labeled as "singer", and the entity type of "AAA" is labeled as "song". In the model training process, after training data "zhangsan singing AAA" is input into the recognition model 140, if the entity type of "zhangsan" is "actor" and the entity type of "AAA" is "song", the entity type of "zhangsan" is a wrong prediction and the entity type "singer" is not predicted, so the prediction result of "zhangsan" is not recalled. Thus, the computing device 120 may determine recall for each training data one by one and count recall as an effectiveness parameter.

At 304, the computing device 120 may compare the determined effect parameter to a predetermined effect. For example, the determined recall rate may be compared to a threshold recall rate. When it is determined that the effect is up to the standard, proceeding to 306, the computing device 120 may output the trained model. When it is determined that the effect does not meet the target, 308 is entered. At 308, the computing device 120 may apply the training data in the first training data set to the trained recognition model 140 to determine the predicted outcome. In this way, it is possible to determine whether to automatically perform the optimization work of the training data by evaluating the model effect.

Returning to 206, the computing device 120 may compare the determined prediction results to corresponding annotation information for the training data. If the prediction results differ from the corresponding annotation information, the computing device 120 may determine the training data as at least a first portion of the second set of training data. It is to be understood that the second training data set is different from the first training data set described above. That is, the computing device 120 may add the mispredicted training data to the second training data set.

In some embodiments, in order to enrich the samples of the second training data set, further training data may be added in the second training data set. As an example, the computing device 120 may determine a portion of the training data in the training data of the first training data set that is the same as the corresponding annotation information as a second portion of the second training data set. It will be appreciated that this second portion is different from the first portion described above. That is, computing device 120 may select a small amount of training data from a large amount of predicted correct training data and add these training data to the second set of training data.

As another example, where the corresponding annotation information of the training data is used to indicate a range within which the prediction should fall, the computing device 120 may determine whether the prediction of the training data is at a boundary of the range, or, stated differently, whether the prediction is equal to a threshold of the range. If the prediction result is equal to the threshold value of the range indicated by the corresponding annotation information of the training data, the computing device 120 may determine the training data as a third portion of the second training data set. It will be appreciated that the third portion is different from the first portion and different from the second portion. That is, the computing device 120 may add training data with "hard" predictions to the second training data set for targeted model training.

By recreating the second training data set described above, at 208, the computing device 120 may train the recognition model 140 based on the second training data set.

Through the embodiment, the model training method is provided, and the step of manual labeling is removed in the model training process through improving the learning framework. In this way, this disclosure can optimize the sample data of next round of model training according to the problem that the model prediction appears to the model effect has been promoted. Moreover, because the step of manual labeling is eliminated, the labor cost can be more effectively reduced, and the method has better expansion and migration capabilities.

In order to more clearly show the technical solution of the present disclosure, a model training architecture according to one embodiment of the present disclosure will be described below with reference to fig. 4. FIG. 4 shows a schematic block diagram of a main architecture 400 for training an entity recognition model according to an embodiment of the present disclosure. It should be understood that the embodiments of the present disclosure are exemplary and that the entity recognition model may be replaced by any other learning model.

As shown in fig. 4, the training data 410 may be a training data set having a large number of training samples and corresponding label information. To complete model training, training data 410 is input into computing device 420. The computing device 420 contains a plurality of units for model training, for example, a model training unit 421, an effect evaluation unit 422, a training data screening unit 423, and optimized training data 424.

In the model training unit 421, a model training process may be performed based on the training data 410, so that a corresponding recognition model may be trained. Thereafter, the effect evaluation unit 422 may perform effect evaluation on the trained model. When the evaluation effect is not met, the training data screening unit 423 may re-input the training data 410 into the trained recognition model, and determine the prediction result of each training data one by one. The training data screening unit 423 picks out training data with a wrong prediction, training data with a prediction result at a boundary, and a small amount of training data with a correct prediction from all the results, and determines these training data as optimized training data 424. Based on the optimized training data 424, the model training unit 421 may further perform the model training process, and the other units may also continue to perform the above process until the model effect reaches the standard. When the model effect is met, the computing device 420 may output the entity recognition model 430 for text entity recognition.

In addition, for the model training mode proposed by the present disclosure, the optimization process of the training data set in the training mode will be described in detail below. Fig. 5 shows a flow diagram of a process 500 of optimizing a training data set according to an embodiment of the present disclosure.

As shown in fig. 5, at 502, computing device 120 may determine a prediction result for training data in a first training data set used to train a model using the trained model. In particular, the computing device 120 may evaluate the effectiveness of the model. Thereafter, at 504, the computing device 120 may compare the determined prediction results with corresponding annotation information of the training data. If the prediction results differ from the corresponding annotation information, the computing device 120 may determine the training data as at least a first portion of the second set of training data. That is, the computing device 120 may add the mispredicted training data to the second training data set. In this way, the training data can be screened and optimized on the premise of not needing manual labeling, and therefore the efficiency of model training is improved.

In some embodiments, in order to enrich the samples of the second training data set, further training data may be added in the second training data set. As an example, the computing device 120 may determine a portion of the training data in the training data of the first training data set that is the same as the corresponding annotation information as a second portion of the second training data set. That is, computing device 120 may select a small amount of training data from a large amount of predicted correct training data and add these training data to the second set of training data.

As another example, where the corresponding annotation information of the training data is used to indicate a range within which the prediction should fall, the computing device 120 may determine whether the prediction of the training data is at a boundary of the range, or, stated differently, whether the prediction is equal to a threshold of the range. If the prediction result is equal to the threshold value of the range indicated by the corresponding annotation information of the training data, the computing device 120 may determine the training data as a third portion of the second training data set. That is, the computing device 120 may add training data with "hard" predictions to the second training data set for targeted model training.

Fig. 6 illustrates a block diagram of an apparatus 600 for model training in accordance with an embodiment of the present disclosure. As shown in fig. 6, the apparatus 600 may include: a first model training module 602 configured to train the model based on a first training data set containing labeling information; a prediction result determination module 604 configured to determine a prediction result of training data in the first training data set using the trained model; a first training data set determination module 606 configured to determine the training data as at least a first portion of a second training data set if the prediction result is different from corresponding labeling information of the training data, the second training data set being different from the first training data set; a second model training module 608 configured to train the model based on the second training data set.

In an embodiment of the present disclosure, the model is an entity recognition model.

In an embodiment of the present disclosure, the prediction result determining module 604 includes: an effect parameter determination module configured to determine an effect parameter of the trained model; and a decision module configured to apply training data of the first set of training data to the trained model to determine the prediction result if the determined effect parameter does not comply with a predetermined effect.

In an embodiment of the present disclosure, the apparatus 600 further includes: a second training data set determination module configured to determine a part of the training data in the training data set having the same prediction result as the corresponding label information as a second part of the second training data set, the second part being different from the first part.

In an embodiment of the present disclosure, the corresponding labeling information of the training data is used to indicate the range that the prediction result should fall into, and the apparatus 600 further comprises: a third training data set determination module configured to determine the training data as a third portion of the second training data set if the prediction result is equal to a critical value of the range indicated by the corresponding label information of the training data, the third portion being different from the first portion.

In an embodiment of the present disclosure, the first model training module 602 is further configured to: applying the first training data set to the model to be trained to determine parameters of convergence of the model.

Fig. 7 shows a block diagram of an apparatus 700 for optimizing a training data set according to an embodiment of the present disclosure. As shown in fig. 7, the apparatus 700 may include: a prediction result determination module 702 configured to determine a prediction result of training data in a first training data set used to train the model using the trained model; and a first training data set determination module 704 configured to determine the training data as at least a first part of a second training data set if the prediction result is different from the corresponding label information of the training data, the second training data set being different from the first training data set and being used for further training the model.

In an embodiment of the present disclosure, the apparatus 700 further comprises: a second training data set determination module configured to determine a part of the training data in the training data set having the same prediction result as the corresponding label information as a second part of the second training data set, the second part being different from the first part.

In an embodiment of the present disclosure, the corresponding labeling information of the training data is used to indicate the range that the prediction result should fall into, and the apparatus 700 further comprises: a third training data set determination module configured to determine the training data as a third portion of the second training data set if the prediction result is equal to a critical value of the range indicated by the corresponding label information of the training data, the third portion being different from the first portion.

The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.

Fig. 8 illustrates a block diagram of a computing device 800 capable of implementing multiple embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 8, the apparatus 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM)802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the device 800 can also be stored. The calculation unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to bus 804.

A number of components in the device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard, a mouse, or the like; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, or the like; and a communication unit 809 such as a network card, modem, wireless communication transceiver, etc. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.

Computing unit 801 may be a variety of general and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and the like. The computing unit 801 performs the various methods and processes described above, such as the

processes

200, 300, 500. For example, in some embodiments, the

processes

200, 300, 500 may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program can be loaded and/or installed onto device 800 via ROM 802 and/or communications unit 809. When loaded into RAM 803 and executed by computing unit 801, may perform one or more of the steps of

processes

200, 300, 500 described above. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the

processes

200, 300, 500 in any other suitable manner (e.g., by way of firmware).

Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), system on a chip (SOCs), load programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.

The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims

1. A model training method, comprising:

training the model based on a first training data set containing labeling information;

determining a prediction of training data in the first training data set using the trained model;

determining the training data as at least a first portion of a second training data set if the prediction results are different from corresponding labeling information of the training data, the second training data set being different from the first training data set;

training the model based on the second training data set.

2. The method of claim 1, wherein the model is an entity recognition model.

3. The method of claim 1, wherein determining the prediction outcome comprises:

determining an effect parameter of the trained model; and

applying training data in the first training data set to the trained model to determine the prediction result if the determined effect parameter does not correspond to a predetermined effect.

4. The method of claim 1, further comprising:

and determining part of the training data in the training data with the same prediction result as the corresponding marking information in the first training data set as a second part of the second training data set, wherein the second part is different from the first part.

5. The method of claim 1, wherein the respective label information of the training data is used to indicate a range within which the prediction should fall, and the method further comprises:

determining the training data as a third portion of the second training data set if the prediction result is equal to the critical value of the range indicated by the corresponding label information of the training data, the third portion being different from the first portion.

6. The method of claim 1, wherein training the model based on the first training data set comprises:

applying the first training data set to the model to be trained to determine parameters of convergence of the model.

7. A method of optimizing a training data set, comprising:

determining, using the trained model, a prediction of training data in a first set of training data used to train the model; and

determining the training data as at least a first portion of a second training data set if the prediction result is different from corresponding labeling information of the training data, the second training data set being different from the first training data set and used to further train the model.

8. The method of claim 7, further comprising:

9. The method of claim 7, wherein the respective label information of the training data is used to indicate a range within which the prediction should fall, and the method further comprises:

10. A model training apparatus comprising:

a first model training module configured to train the model based on a first training data set containing labeling information;

a prediction result determination module configured to determine a prediction result of training data in the first training data set using the trained model;

a first training data set determination module configured to determine the training data as at least a first portion of a second training data set if the prediction result is different from corresponding labeling information of the training data, the second training data set being different from the first training data set;

a second model training module configured to train the model based on the second training data set.

11. The apparatus of claim 10, wherein the model is an entity recognition model.

12. The apparatus of claim 10, wherein the prediction result determination module comprises:

an effect parameter determination module configured to determine an effect parameter of the trained model; and

a decision module configured to apply training data of the first set of training data to the trained model to determine the prediction result if the determined effect parameter does not conform to a predetermined effect.

13. The apparatus of claim 10, further comprising:

a second training data set determination module configured to determine a part of the training data in the training data set having the same prediction result as the corresponding label information as a second part of the second training data set, the second part being different from the first part.

14. The apparatus of claim 10, wherein the respective label information of the training data is used to indicate a range within which the prediction result should fall, and the apparatus further comprises:

a third training data set determination module configured to determine the training data as a third portion of the second training data set if the prediction result is equal to a critical value of the range indicated by the corresponding label information of the training data, the third portion being different from the first portion.

15. The apparatus of claim 10, wherein the first model training module is further configured to:

16. An apparatus for optimizing a training data set, comprising:

a prediction result determination module configured to determine a prediction result of training data in a first training data set used to train the model using the trained model; and

a first training data set determination module configured to determine the training data as at least a first portion of a second training data set if the prediction result is different from corresponding labeling information of the training data, the second training data set being different from the first training data set and being for further training the model.

17. The apparatus of claim 16, further comprising:

18. The apparatus of claim 16, wherein the respective label information of the training data is used to indicate a range within which the prediction should fall, and the apparatus further comprises:

19. An electronic device, the electronic device comprising:

one or more processors; and

storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to carry out the method of any one of claims 1-9.

20. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-9.

21. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-9.