CN111783881A - Scene adaptation learning method and system based on pre-training model

Info

Publication number
CN111783881A
Authority
CN
China
Prior art keywords
data
scene
training
model
picture
Prior art date
2020-07-01
Legal status
Pending
Application number
CN202010621371.7A
Other languages
Chinese (zh)
Inventor
薛贵荣
Current Assignee
Shanghai Tianran Intelligent Technology Co ltd
Original Assignee
Shanghai Tianran Intelligent Technology Co ltd
Priority date
2020-07-01
Filing date
2020-07-01
Publication date
2020-10-16
Application filed by Shanghai Tianran Intelligent Technology Co ltd filed Critical Shanghai Tianran Intelligent Technology Co ltd
Priority to CN202010621371.7A
Publication of CN111783881A
Legal status: Pending (current)

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/214 — Pattern recognition; analysing; design or setup of recognition systems or techniques; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/24 — Pattern recognition; analysing; classification techniques
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/62 — Scenes; scene-specific elements; type of objects; text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/63 — Scene text, e.g. street names
    • G06V20/625 — License plates
    • G06V30/10 — Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Character Discrimination (AREA)

Abstract

The invention provides a scene adaptation learning method and system based on a pre-training model, comprising the following steps: a scene access step: accessing actual scene data according to the actual business requirements of the scene; a scene configuration step: rapidly carrying out template configuration of the scene data based on the accessed scene data; a data generation step: after the template configuration is completed, carrying out data generation, thereby completing data enhancement and preparing training data with label information; a model training step: further training the pre-training model on the generated training data, so that the accuracy of the pre-training model is improved. Through the pre-trained model and the data generated for the current scene, the invention rapidly obtains a high-quality model for the scene; at the same time, because a pre-trained model is used, training time is greatly reduced.

Description

Scene adaptation learning method and system based on pre-training model
Technical Field
The invention relates to the field of computer artificial intelligence, in particular to a scene adaptation learning method and system based on a pre-training model, and more particularly to techniques by which an algorithmic model adapts and is optimized as the application scenario changes.
Background
At present, deep learning techniques have made significant progress in the fields of computer image recognition, speech recognition, natural language processing, etc., and have reached a commercially viable stage.
A deep learning algorithm generally learns as follows: first, a large number of training samples are labeled; then a training system is invoked to train on the labeled data and obtain a model; finally, the learned model is used in the downstream application scene to perform machine learning tasks such as recognition and prediction.
A primary precondition of deep learning is therefore that a sufficiently large amount of labeled data be provided to the training system to learn from.
In real scenes, however, acquiring a large amount of annotated data is often very difficult; because of factors such as privacy and security, only very little annotated data may be obtainable. In such cases training a high-quality model becomes very challenging, yet providing a high-quality machine learning model for the scene remains very important.
The invention aims to solve the following problem: when a scene has scarce data or the existing model performs poorly in it, a large amount of scene-related data is generated by the scene configuration and data generation modules on the basis of the pre-training model and actual scene data; the pre-training model is then further trained to finally obtain a model optimized for the scene. The scene-optimized model can be applied to scenes such as OCR (optical character recognition), image detection and image recognition.
Patent document CN109597943A (application number: 201811539961.4) discloses a scene-based learning content recommendation method and a learning apparatus, the method including: controlling a camera device in the learning equipment to shoot a current scene image of a user; identifying current scene information by analyzing a current scene image; acquiring a scene label corresponding to current scene information; searching a target knowledge point type label which is matched with the scene label and has the highest user evaluation grade from a plurality of preset knowledge point type labels, wherein one knowledge point type label corresponds to one user evaluation grade, and the user evaluation grade is in direct proportion to the evaluation accuracy rate of the user on the learning content corresponding to the knowledge point type label; extracting target learning content matched with the target knowledge point type label from a preset database; and recommending the target learning content to the user.
Disclosure of Invention
Aiming at the defects in the prior art, the invention aims to provide a scene adaptation learning method and system based on a pre-training model.
The invention provides a scene adaptation learning method based on a pre-training model, which is characterized by comprising the following steps:
scene access step: accessing actual scene data according to the actual business requirements of the scene;
scene configuration step: rapidly carrying out template configuration of the scene data based on the accessed actual scene data;
data generation step: after the template configuration is completed, carrying out data generation, thereby completing data enhancement and preparing training data with label information;
model training step: further training the pre-training model on the generated training data, so that the accuracy of the pre-training model is improved.
Preferably, the actual scene data refers to samples of real recognition scene data, including: certificates, pictures and license plates.
Preferably, the scene configuring step includes:
and (3) area selection: according to the picture of the scene, the area needing to be identified is selected in a circle mode, a rectangular space is provided for generating new data in the area, and meanwhile the coordinates of the position space are recorded.
Data configuration: and providing a labeled object in cooperation with the next new training data, defining basic data attributes required to be generated in the selected object, including data content, data styles and data effects, and simultaneously defining the overall effect of the template and supporting the definition of the relevant format and content of the standard answer during output.
Preferably, the data configuration requires completing the data generation configuration in three dimensions:
item: the item is the minimum unit of information processing, and one item can contain one or more annotation layers;
annotation layer: an annotation layer is represented as a box drawn on the picture region;
layer information: the layer information comprises the item type and definitions of auxiliary marking information, and the layer attribute information is further defined according to the layer type.
Preferably, the data generating step includes:
material generation: generating text, numbers and pictures inside the boxes configured in the template, so as to construct the multi-scene text, numbers and pictures that need to be recognized and analyzed;
special effect generation: applying special effect processing to the picture, so that the generated data covers the various conditions of real scenes and the diversity of the model training data samples is ensured.
Preferably, the material generation is supported by a font library, a corpus and a picture library, which enrich the formats of the generated data;
font library: providing basic fonts so that users can adjust the fonts of the template contents according to the actual data style, together with a real-time preview function that helps users select a suitable font;
corpus: providing basic text corpora for the user to fill the template content according to actual needs, while also supporting user-defined upload of required corpora, so as to flexibly satisfy more usage scenes;
picture library: providing basic picture materials for the user to fill the template content according to actual needs, while also supporting user-defined upload of required pictures, so as to flexibly satisfy more usage scenes;
the special effects include: illumination, 3D transformation, emboss, sharpening, median blur and smoothing effects.
Preferably, in the model training step:
training continues through prefabricated OCR and image recognition algorithms;
the pre-training models comprise: a preset YOLO image classification or segmentation model, a text detection model and a text recognition model.
The invention provides a scene adaptive learning system based on a pre-training model, which comprises:
a scene access module: accessing actual scene data according to the actual business requirements of the scene;
a scene configuration module: rapidly carrying out template configuration of the scene data based on the accessed actual scene data;
a data generation module: after the template configuration is completed, carrying out data generation, thereby completing data enhancement and preparing training data with label information;
a model training module: further training the pre-training model on the generated training data, so that the accuracy of the pre-training model is improved.
Preferably, the actual scene data refers to samples of real recognition scene data, including: certificates, pictures and license plates;
the scene configuration module comprises:
area selection: according to a picture of the scene, box-selecting the area that needs to be recognized, providing a rectangular space in the area for generating new data, and recording the coordinates of this spatial position;
data configuration: providing a labeled object for the subsequent generation of new training data, defining the basic attributes of the data to be generated within the selected object, including data content, data style and data effects, and at the same time defining the overall effect of the template and supporting definition of the format and content of the standard answer produced at output time;
the data configuration requires completing the data generation configuration in three dimensions:
item: the item is the minimum unit of information processing, and one item can contain one or more annotation layers;
annotation layer: an annotation layer is represented as a box drawn on the picture region;
layer information: the layer information comprises the item type and definitions of auxiliary marking information, and the layer attribute information is further defined according to the layer type.
Preferably, the data generation module includes:
material generation: generating text, numbers and pictures inside the boxes configured in the template, so as to construct the multi-scene text, numbers and pictures that need to be recognized and analyzed;
special effect generation: applying special effect processing to the picture, so that the generated data covers the various conditions of real scenes and the diversity of the model training data samples is ensured;
the material generation is supported by a font library, a corpus and a picture library, which enrich the formats of the generated data;
font library: providing basic fonts so that users can adjust the fonts of the template contents according to the actual data style, together with a real-time preview function that helps users select a suitable font;
corpus: providing basic text corpora for the user to fill the template content according to actual needs, while also supporting user-defined upload of required corpora, so as to flexibly satisfy more usage scenes;
picture library: providing basic picture materials for the user to fill the template content according to actual needs, while also supporting user-defined upload of required pictures, so as to flexibly satisfy more usage scenes;
the special effects include: illumination, 3D transformation, emboss, sharpening, median blur and smoothing effects;
the model training module:
training continues through prefabricated OCR and image recognition algorithms;
the pre-training models comprise: a preset YOLO image classification or segmentation model, a text detection model and a text recognition model.
Compared with the prior art, the invention has the following beneficial effects:
according to the invention, through the pre-trained model and the generated data of the current scene, the high-quality model under the scene can be rapidly obtained, and meanwhile, based on the pre-trained model, the training time is greatly saved.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
fig. 1 is a schematic flow diagram of a scene adaptive learning system based on a pre-training model according to the present invention.
Fig. 2 is a schematic view of a scene configuration provided by the present invention.
Fig. 3 is a schematic diagram of the generated data provided by the present invention.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit the invention in any way. It should be noted that various changes and modifications will be obvious to those skilled in the art without departing from the spirit of the invention, all of which fall within the scope of the present invention.
The invention provides a scene adaptation learning method based on a pre-training model, which is characterized by comprising the following steps:
scene access step: accessing actual scene data according to the actual business requirements of the scene;
scene configuration step: rapidly carrying out template configuration of the scene data based on the accessed actual scene data;
data generation step: after the template configuration is completed, carrying out data generation, thereby completing data enhancement and preparing training data with label information;
model training step: further training the pre-training model on the generated training data, so that the accuracy of the pre-training model is improved.
Specifically, the actual scene data refers to samples of real recognition scene data, including: certificates, pictures and license plates.
Specifically, the scene configuration step includes:
and (3) area selection: according to the picture of the scene, the area needing to be identified is selected in a circle mode, a rectangular space is provided for generating new data in the area, and meanwhile the coordinates of the position space are recorded.
Data configuration: and providing a labeled object in cooperation with the next new training data, defining basic data attributes required to be generated in the selected object, including data content, data styles and data effects, and simultaneously defining the overall effect of the template and supporting the definition of the relevant format and content of the standard answer during output.
Specifically, the data configuration requires completing the data generation configuration in three dimensions:
item: the item is the minimum unit of information processing, and one item can contain one or more annotation layers;
annotation layer: an annotation layer is represented as a box drawn on the picture region;
layer information: the layer information comprises the item type and definitions of auxiliary marking information, and the layer attribute information is further defined according to the layer type.
Specifically, the data generating step includes:
material generation: generating text, numbers and pictures inside the boxes configured in the template, so as to construct the multi-scene text, numbers and pictures that need to be recognized and analyzed;
special effect generation: applying special effect processing to the picture, so that the generated data covers the various conditions of real scenes and the diversity of the model training data samples is ensured.
Specifically, the material generation is supported by a font library, a corpus and a picture library, which enrich the formats of the generated data;
font library: providing basic fonts so that users can adjust the fonts of the template contents according to the actual data style, together with a real-time preview function that helps users select a suitable font;
corpus: providing basic text corpora for the user to fill the template content according to actual needs, while also supporting user-defined upload of required corpora, so as to flexibly satisfy more usage scenes;
picture library: providing basic picture materials for the user to fill the template content according to actual needs, while also supporting user-defined upload of required pictures, so as to flexibly satisfy more usage scenes;
the special effects include: illumination, 3D transformation, emboss, sharpening, median blur and smoothing effects.
Specifically, in the model training step:
training continues through prefabricated OCR and image recognition algorithms;
the pre-training models comprise: a preset YOLO image classification or segmentation model, a text detection model and a text recognition model.
The scene adaptive learning system based on the pre-training model can be realized through the step flow of the scene adaptive learning method based on the pre-training model. The person skilled in the art may understand the method for learning scene adaptation based on a pre-trained model as a preferred example of the system for learning scene adaptation based on a pre-trained model.
The invention provides a scene adaptive learning system based on a pre-training model, which comprises:
a scene access module: accessing actual scene data according to the actual business requirements of the scene;
a scene configuration module: rapidly carrying out template configuration of the scene data based on the accessed actual scene data;
a data generation module: after the template configuration is completed, carrying out data generation, thereby completing data enhancement and preparing training data with label information;
a model training module: further training the pre-training model on the generated training data, so that the accuracy of the pre-training model is improved.
Specifically, the actual scene data refers to samples of real recognition scene data, including: certificates, pictures and license plates;
the scene configuration module comprises:
area selection: according to a picture of the scene, box-selecting the area that needs to be recognized, providing a rectangular space in the area for generating new data, and recording the coordinates of this spatial position;
data configuration: providing a labeled object for the subsequent generation of new training data, defining the basic attributes of the data to be generated within the selected object, including data content, data style and data effects, and at the same time defining the overall effect of the template and supporting definition of the format and content of the standard answer produced at output time;
the data configuration requires completing the data generation configuration in three dimensions:
item: the item is the minimum unit of information processing, and one item can contain one or more annotation layers;
annotation layer: an annotation layer is represented as a box drawn on the picture region;
layer information: the layer information comprises the item type and definitions of auxiliary marking information, and the layer attribute information is further defined according to the layer type.
Specifically, the data generation module includes:
material generation: generating text, numbers and pictures inside the boxes configured in the template, so as to construct the multi-scene text, numbers and pictures that need to be recognized and analyzed;
special effect generation: applying special effect processing to the picture, so that the generated data covers the various conditions of real scenes and the diversity of the model training data samples is ensured;
the material generation is supported by a font library, a corpus and a picture library, which enrich the formats of the generated data;
font library: providing basic fonts so that users can adjust the fonts of the template contents according to the actual data style, together with a real-time preview function that helps users select a suitable font;
corpus: providing basic text corpora for the user to fill the template content according to actual needs, while also supporting user-defined upload of required corpora, so as to flexibly satisfy more usage scenes;
picture library: providing basic picture materials for the user to fill the template content according to actual needs, while also supporting user-defined upload of required pictures, so as to flexibly satisfy more usage scenes;
the special effects include: illumination, 3D transformation, emboss, sharpening, median blur and smoothing effects;
the model training module:
training continues through prefabricated OCR and image recognition algorithms;
the pre-training models comprise: a preset YOLO image classification or segmentation model, a text detection model and a text recognition model.
The present invention will be described more specifically below with reference to preferred examples.
Preferred example 1:
as shown in fig. 1, the technology includes four main modules, namely, scene access, scene configuration, data generation, and model training.
1. Scene access
According to the actual business requirements of a scene, 10-20 pieces of actual scene data are accessed, such as certificates, pictures, license plates and other real recognition scene data samples.
2. Scene configuration
The user can rapidly perform template configuration of the scene data through the accessed scene data. Scene configuration requires two aspects of work: area selection and data configuration.
The main function of area selection is as follows: according to a picture of the scene, the area that needs to be recognized is box-selected, a rectangular space is provided in the area for generating new data, and the coordinates of this spatial position are recorded, as shown in fig. 2 below.
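By way of illustration only (this sketch is not part of the patent text), the box-selected region and its recorded coordinates could be represented as a small data structure; all names here are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Region:
    """A box-selected area of a scene picture in which new data will be generated.
    Coordinates use the usual image convention: origin at the top-left corner,
    x increasing rightward, y increasing downward, all values in pixels."""
    name: str    # e.g. "id_number" or "address" (hypothetical region names)
    x: int       # left edge of the rectangle
    y: int       # top edge of the rectangle
    width: int
    height: int

# Example: the number region of a hypothetical ID-card template.
id_number_region = Region(name="id_number", x=420, y=310, width=360, height=48)
```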
Data configuration mainly provides a labeled object for the subsequent generation of new training data: the basic attributes of the data to be generated in the selected object are defined, including data content, data style, data effects and the like; at the same time the overall effect of the template can be defined, and the format and content of the standard answer produced at output time can be specified. The data generation configuration is completed in three dimensions.
Item: the item is the minimum unit of information processing, and one item can contain one or more annotation layers.
Annotation layer: directly represented as a box drawn on the picture region.
Layer information: the layer information comprises the item type and definitions of auxiliary marking information; the layer attribute information is further defined according to the layer type.
Through the scene configuration, the system knows which kind of data needs to be generated in which area. For example: an 18-character numeric string is generated in the number region of an ID card, a string with national address information is generated in the address region, and a Chinese-capital amount string is generated in the capitalized-amount region of an invoice.
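The patent does not specify how these per-region rules are implemented; a minimal sketch under that assumption, with hypothetical generator and region names, might look like this:

```python
import random
import string

def gen_id_number() -> str:
    # A Chinese ID number is 18 characters: 17 digits plus a final digit or 'X'.
    return "".join(random.choices(string.digits, k=17)) + random.choice(string.digits + "X")

def gen_address(corpus: list) -> str:
    # Draw one line of address information from a (user-supplied) corpus.
    return random.choice(corpus)

def gen_capital_amount(amounts: list) -> str:
    # Draw one Chinese-capital amount string from a prepared list.
    return random.choice(amounts)

# Binding a generator to each template region (region names are hypothetical).
rules = {
    "id_number": lambda: gen_id_number(),
    "address":   lambda: gen_address(["XX省XX市XX区XX路1号"]),
    "amount":    lambda: gen_capital_amount(["壹仟贰佰叁拾肆元整"]),
}

sample = {region: make() for region, make in rules.items()}
print(sample)  # one generated content string per configured region
```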
After the template configuration is completed, data generation can be carried out, completing data enhancement and preparing training data with label information.
3. Data generation
From a given template, we next generate data.
Data generation includes two parts: material generation and special effect generation.
Material generation mainly fills text, numbers, pictures and other information in an appropriate format into the configured areas; special effect generation applies special effect processing for problems such as picture deformation, folding, illumination and shooting blur.
Material generation: this part generates text, numbers, pictures and the like inside the boxes configured in the previous template. Its purpose is to construct the multi-scene text, numbers and pictures that need to be recognized and analyzed.
Rules can be defined for the data set that fills the selected boxes, and the data set offers text, numbers, pictures and other types of data to choose from. Advanced usages such as compound enumeration are provided so that users can combine and concatenate data.
The materials of the data generation system are supported by a font library, a corpus and a picture library, which enrich the formats of the generated data and bring them close to real-scene data.
Font library: a large number of basic fonts are provided so that users can adjust the fonts of the template contents according to the actual data style; a real-time preview function is also provided to help users select a suitable font.
Corpus: a large amount of basic text corpora are provided for the user to fill the template content according to actual needs; user-defined upload of required corpora is also supported, so as to flexibly satisfy more usage scenes.
Picture library: a large amount of basic picture materials are provided for the user to fill the template content according to actual needs; user-defined upload of required pictures is also supported, so as to flexibly satisfy more usage scenes.
Through the configuration of text, corpora and pictures, together with basic image processing, a large number of pictures related to the current scene can be produced, along with the labeling information corresponding to each picture. A schematic diagram of the generated data is shown in fig. 3 below.
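As an illustrative sketch only (the patent names no libraries), rendering one piece of generated text into a configured box could be done with the Pillow library; the file paths and the label format below are assumptions:

```python
from PIL import Image, ImageDraw, ImageFont

def render_material(template_path, box, text, font_path):
    """Draw generated text into a configured box (x, y, width, height) of a
    template picture, returning the picture and its labeling information."""
    x, y, w, h = box
    img = Image.open(template_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    # Size the font to roughly fit the box height.
    font = ImageFont.truetype(font_path, size=max(h - 8, 10))
    draw.text((x, y), text, font=font, fill=(0, 0, 0))
    label = {"text": text, "box": [x, y, w, h]}  # ground truth for training
    return img, label

# Hypothetical usage: fill the number region of an ID-card template.
# img, label = render_material("idcard_template.png", (420, 310, 360, 48),
#                              "11010519900101123X", "simhei.ttf")
```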
Special effect generation: special effect processing is applied to the picture. The special effects comprise illumination, 3D transformation, emboss, sharpening, median blur, smoothing and other effects. Thanks to these special effects, the generated data can cover the various conditions of real scenes as far as possible, ensuring the diversity of the model training data samples.
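The patent lists the effects only by name; as an illustration, two of them (median blur and sharpening) could be applied with OpenCV as follows, the random mixing policy being an assumption:

```python
import random
import numpy as np
import cv2

def median_blur(img: np.ndarray, ksize: int = 5) -> np.ndarray:
    # ksize must be an odd integer greater than 1.
    return cv2.medianBlur(img, ksize)

def sharpen(img: np.ndarray) -> np.ndarray:
    # A standard 3x3 sharpening kernel.
    kernel = np.array([[ 0, -1,  0],
                       [-1,  5, -1],
                       [ 0, -1,  0]], dtype=np.float32)
    return cv2.filter2D(img, -1, kernel)

def random_effects(img: np.ndarray) -> np.ndarray:
    """Apply effects at random so that each generated sample differs,
    increasing the diversity of the training data."""
    if random.random() < 0.5:
        img = median_blur(img)
    if random.random() < 0.5:
        img = sharpen(img)
    return img
```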
4. Model training system
The model training function continues training through the OCR and image recognition algorithms prefabricated in the platform.
Pre-training model
A number of mature image processing algorithms, proven on a large number of images on the platform, are prefabricated in the system. The user trains further on top of these high-quality mature modules, greatly shortening the time needed to put an algorithm into production. Examples include a preset YOLO image classification/segmentation model, a text detection model and a text recognition model.
Model training system
Starting from the pre-trained model and combining it with the data generated for the scene, the model can be further trained and the accuracy of the pre-trained model improved.
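The patent gives no training code; as a purely illustrative sketch of the usual fine-tuning pattern (PyTorch shown, all names hypothetical), further training on the generated data could proceed as follows:

```python
import torch
from torch.utils.data import DataLoader

def fine_tune(model: torch.nn.Module, dataset, epochs: int = 5, lr: float = 1e-4):
    """Continue training a pre-trained model on the scene-generated data.
    A small learning rate preserves what the pre-trained model already knows."""
    loader = DataLoader(dataset, batch_size=32, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model
```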
In the description of the present application, it is to be understood that the terms "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like indicate orientations or positional relationships based on those shown in the drawings, and are only for convenience in describing the present application and simplifying the description, but do not indicate or imply that the referred device or element must have a specific orientation, be constructed in a specific orientation, and be operated, and thus, should not be construed as limiting the present application.
Those skilled in the art will appreciate that, in addition to implementing the systems, apparatus, and various modules thereof provided by the present invention in purely computer readable program code, the same procedures can be implemented entirely by logically programming method steps such that the systems, apparatus, and various modules thereof are provided in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system, the device and the modules thereof provided by the present invention can be considered as a hardware component, and the modules included in the system, the device and the modules thereof for implementing various programs can also be considered as structures in the hardware component; modules for performing various functions may also be considered to be both software programs for performing the methods and structures within hardware components.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims (10)

1. A scene adaptation learning method based on a pre-training model is characterized by comprising the following steps:
scene access step: accessing actual scene data according to the actual business requirements of the scene;
scene configuration step: rapidly carrying out template configuration of the scene data based on the accessed actual scene data;
data generation step: after the template configuration is completed, carrying out data generation, thereby completing data enhancement and preparing training data with label information;
model training step: further training the pre-training model on the generated training data, so that the accuracy of the pre-training model is improved.
2. The method of claim 1, wherein the actual scene data refers to samples of real recognition scene data, comprising: certificates, pictures and license plates.
3. The method according to claim 1, wherein the scene configuration step comprises:
and (3) area selection: according to the picture of the scene, the area needing to be identified is selected in a circle mode, a rectangular space is provided for generating new data in the area, and meanwhile the coordinates of the position space are recorded.
Data configuration: and providing a labeled object in cooperation with the next new training data, defining basic data attributes required to be generated in the selected object, including data content, data styles and data effects, and simultaneously defining the overall effect of the template and supporting the definition of the relevant format and content of the standard answer during output.
4. The pre-training model-based scene adaptation learning method according to claim 3, wherein the data configuration requires completing the data generation configuration in three dimensions:
item: the item is the minimum unit of information processing, and one item can contain one or more annotation layers;
annotation layer: an annotation layer is represented as a box drawn on the picture region;
layer information: the layer information comprises the item type and definitions of auxiliary marking information, and the layer attribute information is further defined according to the layer type.
5. The method according to claim 1, wherein the data generating step comprises:
material generation: generating text, numbers and pictures inside the boxes configured in the template, so as to construct the multi-scene text, numbers and pictures that need to be recognized and analyzed;
special effect generation: applying special effect processing to the picture, so that the generated data covers the various conditions of real scenes and the diversity of the model training data samples is ensured.
6. The method of claim 5, wherein the material generation is supported by a font library, a corpus and a picture library, which enrich the formats of the generated data;
font library: providing basic fonts so that users can adjust the fonts of the template contents according to the actual data style, together with a real-time preview function that helps users select a suitable font;
corpus: providing basic text corpora for the user to fill the template content according to actual needs, while also supporting user-defined upload of required corpora, so as to flexibly satisfy more usage scenes;
picture library: providing basic picture materials for the user to fill the template content according to actual needs, while also supporting user-defined upload of required pictures, so as to flexibly satisfy more usage scenes;
the special effects include: illumination, 3D transformation, emboss, sharpening, median blur and smoothing effects.
7. The scene adaptive learning method based on the pre-training model as claimed in claim 1, wherein the model training step comprises:
training continues through prefabricated OCR and image recognition algorithms;
the pre-training models comprise: a preset YOLO image classification or segmentation model, a text detection model and a text recognition model.
8. A scene adaptation learning system based on a pre-training model is characterized by comprising:
a scene access module: accessing actual scene data according to the actual business requirements of the scene;
a scene configuration module: rapidly carrying out template configuration of the scene data based on the accessed actual scene data;
a data generation module: after the template configuration is completed, carrying out data generation, thereby completing data enhancement and preparing training data with label information;
a model training module: further training the pre-training model on the generated training data, so that the accuracy of the pre-training model is improved.
9. The pre-trained model based scene adaptation learning system according to claim 8, wherein the actual scene data refers to samples of real recognition scene data, including: certificates, pictures and license plates;
the scene configuration module comprises:
area selection: according to a picture of the scene, box-selecting the area that needs to be recognized, providing a rectangular space in the area for generating new data, and recording the coordinates of this spatial position;
data configuration: providing a labeled object for the subsequent generation of new training data, defining the basic attributes of the data to be generated within the selected object, including data content, data style and data effects, and at the same time defining the overall effect of the template and supporting definition of the format and content of the standard answer produced at output time;
the data configuration requires completing the data generation configuration in three dimensions:
item: the item is the minimum unit of information processing, and one item can contain one or more annotation layers;
annotation layer: an annotation layer is represented as a box drawn on the picture region;
layer information: the layer information comprises the item type and definitions of auxiliary marking information, and the layer attribute information is further defined according to the layer type.
10. The pre-trained model based scene adaptation learning system according to claim 8, wherein the data generation module comprises:
material generation: generating text, numbers and pictures inside the boxes configured in the template, so as to construct the multi-scene text, numbers and pictures that need to be recognized and analyzed;
special effect generation: applying special effect processing to the picture, so that the generated data covers the various conditions of real scenes and the diversity of the model training data samples is ensured;
the material generation is supported by a font library, a corpus and a picture library, which enrich the formats of the generated data;
font library: providing basic fonts so that users can adjust the fonts of the template contents according to the actual data style, together with a real-time preview function that helps users select a suitable font;
corpus: providing basic text corpora for the user to fill the template content according to actual needs, while also supporting user-defined upload of required corpora, so as to flexibly satisfy more usage scenes;
picture library: providing basic picture materials for the user to fill the template content according to actual needs, while also supporting user-defined upload of required pictures, so as to flexibly satisfy more usage scenes;
the special effects include: illumination, 3D transformation, emboss, sharpening, median blur and smoothing effects;
the model training module:
training continues through prefabricated OCR and image recognition algorithms;
the pre-training models comprise: a preset YOLO image classification or segmentation model, a text detection model and a text recognition model.
CN202010621371.7A 2020-07-01 2020-07-01 Scene adaptation learning method and system based on pre-training model Pending CN111783881A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010621371.7A CN111783881A (en) 2020-07-01 2020-07-01 Scene adaptation learning method and system based on pre-training model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010621371.7A CN111783881A (en) 2020-07-01 2020-07-01 Scene adaptation learning method and system based on pre-training model

Publications (1)

Publication Number Publication Date
CN111783881A true CN111783881A (en) 2020-10-16

Family

ID=72760993

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010621371.7A Pending CN111783881A (en) 2020-07-01 2020-07-01 Scene adaptation learning method and system based on pre-training model

Country Status (1)

Country Link
CN (1) CN111783881A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112541363A (en) * 2020-11-24 2021-03-23 支付宝(杭州)信息技术有限公司 Method and device for recognizing text data of target language and server
CN113240088A (en) * 2021-05-17 2021-08-10 上海中通吉网络技术有限公司 Training method of text intention recognition model



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination