CN111783881A - Scene adaptation learning method and system based on pre-training model

Info

Publication number
CN111783881A
Authority
CN
China
Prior art keywords
data
scene
training
model
picture
Prior art date
2020-07-01
Legal status
Pending
Application number
CN202010621371.7A
Other languages
Chinese (zh)
Inventor
薛贵荣
Current Assignee
Shanghai Tianran Intelligent Technology Co ltd
Original Assignee
Shanghai Tianran Intelligent Technology Co ltd
Priority date
2020-07-01
Filing date
2020-07-01
Publication date
2020-10-16
Application filed by Shanghai Tianran Intelligent Technology Co ltd filed Critical Shanghai Tianran Intelligent Technology Co ltd
Priority to CN202010621371.7A
Publication of CN111783881A
Legal status: Pending (current)

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/214 — Pattern recognition; analysing; design or setup of recognition systems or techniques; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/24 — Pattern recognition; analysing; classification techniques
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/62 — Scenes; scene-specific elements; type of objects; text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/63 — Scene text, e.g. street names
    • G06V20/625 — License plates
    • G06V30/10 — Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Character Discrimination (AREA)

Abstract

The invention provides a scene adaptation learning method and system based on a pre-training model, comprising the following steps: a scene access step: accessing actual scene data according to the actual business requirements of the scene; a scene configuration step: rapidly carrying out template configuration of the scene data based on the accessed scene data; a data generation step: after the template configuration is completed, carrying out data generation, thereby completing data enhancement and preparing training data with label information; a model training step: further training the pre-training model on the generated training data, so that the accuracy of the pre-training model is improved. Through the pre-trained model and the data generated for the current scene, the invention rapidly obtains a high-quality model for the scene; at the same time, because a pre-trained model is used, training time is greatly reduced.

Description

Scene adaptation learning method and system based on pre-training model
Technical Field
The invention relates to the field of computer artificial intelligence, in particular to a scene adaptation learning method and system based on a pre-training model, and more particularly to techniques by which an algorithmic model adapts and is optimized as the application scenario changes.
Background
At present, deep learning techniques have made significant progress in the fields of computer image recognition, speech recognition, natural language processing, etc., and have reached a commercially viable stage.
A deep learning algorithm generally learns as follows: first, a large number of training samples are labeled; then a training system is invoked to train on the labeled data and obtain a model; finally, the learned model is used in the downstream application scene to perform machine learning tasks such as recognition and prediction.
A primary precondition of deep learning is therefore that a sufficiently large amount of labeled data be provided to the training system to learn from.
In real scenes, however, acquiring a large amount of annotated data is often very difficult; because of factors such as privacy and security, only very little annotated data may be obtainable. In such cases training a high-quality model becomes very challenging, yet providing a high-quality machine learning model for the scene remains very important.
The invention aims to solve the following problem: when a scene has scarce data or the existing model performs poorly in it, a large amount of scene-related data is generated by the scene configuration and data generation modules on the basis of the pre-training model and actual scene data; the pre-training model is then further trained to finally obtain a model optimized for the scene. The scene-optimized model can be applied to scenes such as OCR (optical character recognition), image detection and image recognition.
Patent document CN109597943A (application number: 201811539961.4) discloses a scene-based learning content recommendation method and a learning apparatus, the method including: controlling a camera device in the learning equipment to shoot a current scene image of a user; identifying current scene information by analyzing a current scene image; acquiring a scene label corresponding to current scene information; searching a target knowledge point type label which is matched with the scene label and has the highest user evaluation grade from a plurality of preset knowledge point type labels, wherein one knowledge point type label corresponds to one user evaluation grade, and the user evaluation grade is in direct proportion to the evaluation accuracy rate of the user on the learning content corresponding to the knowledge point type label; extracting target learning content matched with the target knowledge point type label from a preset database; and recommending the target learning content to the user.
Disclosure of Invention
Aiming at the defects in the prior art, the invention aims to provide a scene adaptation learning method and system based on a pre-training model.
The invention provides a scene adaptation learning method based on a pre-training model, which is characterized by comprising the following steps:
scene access step: accessing actual scene data according to the actual business requirements of the scene;
scene configuration step: rapidly carrying out template configuration of the scene data based on the accessed actual scene data;
data generation step: after the template configuration is completed, carrying out data generation, thereby completing data enhancement and preparing training data with label information;
model training step: further training the pre-training model on the generated training data, so that the accuracy of the pre-training model is improved.
Preferably, the actual scene data refers to samples of real recognition scene data, including: certificates, pictures and license plates.
Preferably, the scene configuring step includes:
and (3) area selection: according to the picture of the scene, the area needing to be identified is selected in a circle mode, a rectangular space is provided for generating new data in the area, and meanwhile the coordinates of the position space are recorded.
Data configuration: and providing a labeled object in cooperation with the next new training data, defining basic data attributes required to be generated in the selected object, including data content, data styles and data effects, and simultaneously defining the overall effect of the template and supporting the definition of the relevant format and content of the standard answer during output.
Preferably, the data configuration requires completing the data generation configuration in three dimensions:
item: the item is the minimum unit of information processing, and one item can contain one or more annotation layers;
annotation layer: an annotation layer is represented as a box drawn on the picture region;
layer information: the layer information comprises the item type and definitions of auxiliary marking information, and the layer attribute information is further defined according to the layer type.
Preferably, the data generating step includes:
material generation: generating text, numbers and pictures inside the boxes configured in the template, so as to construct the multi-scene text, numbers and pictures that need to be recognized and analyzed;
special effect generation: applying special effect processing to the picture, so that the generated data covers the various conditions of real scenes and the diversity of the model training data samples is ensured.
Preferably, the material generation is supported by a font library, a corpus and a picture library, which enrich the formats of the generated data;
font library: providing basic fonts so that users can adjust the fonts of the template contents according to the actual data style, together with a real-time preview function that helps users select a suitable font;
corpus: providing basic text corpora for the user to fill the template content according to actual needs, while also supporting user-defined upload of required corpora, so as to flexibly satisfy more usage scenes;
picture library: providing basic picture materials for the user to fill the template content according to actual needs, while also supporting user-defined upload of required pictures, so as to flexibly satisfy more usage scenes;
the special effects include: illumination, 3D transformation, emboss, sharpening, median blur and smoothing effects.
Preferably, in the model training step:
training continues through prefabricated OCR and image recognition algorithms;
the pre-training models comprise: a preset YOLO image classification or segmentation model, a text detection model and a text recognition model.
The invention provides a scene adaptive learning system based on a pre-training model, which comprises:
a scene access module: accessing actual scene data according to the actual business requirements of the scene;
a scene configuration module: rapidly carrying out template configuration of the scene data based on the accessed actual scene data;
a data generation module: after the template configuration is completed, carrying out data generation, thereby completing data enhancement and preparing training data with label information;
a model training module: further training the pre-training model on the generated training data, so that the accuracy of the pre-training model is improved.
Preferably, the actual scene data refers to samples of real recognition scene data, including: certificates, pictures and license plates;
the scene configuration module comprises:
area selection: according to a picture of the scene, box-selecting the area that needs to be recognized, providing a rectangular space in the area for generating new data, and recording the coordinates of this spatial position;
data configuration: providing a labeled object for the subsequent generation of new training data, defining the basic attributes of the data to be generated within the selected object, including data content, data style and data effects, and at the same time defining the overall effect of the template and supporting definition of the format and content of the standard answer produced at output time;
the data configuration requires completing the data generation configuration in three dimensions:
item: the item is the minimum unit of information processing, and one item can contain one or more annotation layers;
annotation layer: an annotation layer is represented as a box drawn on the picture region;
layer information: the layer information comprises the item type and definitions of auxiliary marking information, and the layer attribute information is further defined according to the layer type.
Preferably, the data generation module includes:
material generation: generating text, numbers and pictures inside the boxes configured in the template, so as to construct the multi-scene text, numbers and pictures that need to be recognized and analyzed;
special effect generation: applying special effect processing to the picture, so that the generated data covers the various conditions of real scenes and the diversity of the model training data samples is ensured;
the material generation is supported by a font library, a corpus and a picture library, which enrich the formats of the generated data;
font library: providing basic fonts so that users can adjust the fonts of the template contents according to the actual data style, together with a real-time preview function that helps users select a suitable font;
corpus: providing basic text corpora for the user to fill the template content according to actual needs, while also supporting user-defined upload of required corpora, so as to flexibly satisfy more usage scenes;
picture library: providing basic picture materials for the user to fill the template content according to actual needs, while also supporting user-defined upload of required pictures, so as to flexibly satisfy more usage scenes;
the special effects include: illumination, 3D transformation, emboss, sharpening, median blur and smoothing effects;
the model training module:
training continues through prefabricated OCR and image recognition algorithms;
the pre-training models comprise: a preset YOLO image classification or segmentation model, a text detection model and a text recognition model.
Compared with the prior art, the invention has the following beneficial effects:
according to the invention, through the pre-trained model and the generated data of the current scene, the high-quality model under the scene can be rapidly obtained, and meanwhile, based on the pre-trained model, the training time is greatly saved.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
fig. 1 is a schematic flow diagram of a scene adaptive learning system based on a pre-training model according to the present invention.
Fig. 2 is a schematic view of a scene configuration provided by the present invention.
Fig. 3 is a schematic diagram of the generated data provided by the present invention.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit the invention in any way. It should be noted that various changes and modifications will be obvious to those skilled in the art without departing from the spirit of the invention, all of which fall within the scope of the present invention.
The invention provides a scene adaptation learning method based on a pre-training model, which is characterized by comprising the following steps:
scene access step: accessing actual scene data according to the actual business requirements of the scene;
scene configuration step: rapidly carrying out template configuration of the scene data based on the accessed actual scene data;
data generation step: after the template configuration is completed, carrying out data generation, thereby completing data enhancement and preparing training data with label information;
model training step: further training the pre-training model on the generated training data, so that the accuracy of the pre-training model is improved.
Specifically, the actual scene data refers to samples of real recognition scene data, including: certificates, pictures and license plates.
Specifically, the scene configuration step includes:
and (3) area selection: according to the picture of the scene, the area needing to be identified is selected in a circle mode, a rectangular space is provided for generating new data in the area, and meanwhile the coordinates of the position space are recorded.
Data configuration: and providing a labeled object in cooperation with the next new training data, defining basic data attributes required to be generated in the selected object, including data content, data styles and data effects, and simultaneously defining the overall effect of the template and supporting the definition of the relevant format and content of the standard answer during output.
Specifically, the data configuration requires completing the data generation configuration in three dimensions:
item: the item is the minimum unit of information processing, and one item can contain one or more annotation layers;
annotation layer: an annotation layer is represented as a box drawn on the picture region;
layer information: the layer information comprises the item type and definitions of auxiliary marking information, and the layer attribute information is further defined according to the layer type.
Specifically, the data generating step includes:
material generation: generating text, numbers and pictures inside the boxes configured in the template, so as to construct the multi-scene text, numbers and pictures that need to be recognized and analyzed;
special effect generation: applying special effect processing to the picture, so that the generated data covers the various conditions of real scenes and the diversity of the model training data samples is ensured.
Specifically, the material generation is supported by a font library, a corpus and a picture library, which enrich the formats of the generated data;
font library: providing basic fonts so that users can adjust the fonts of the template contents according to the actual data style, together with a real-time preview function that helps users select a suitable font;
corpus: providing basic text corpora for the user to fill the template content according to actual needs, while also supporting user-defined upload of required corpora, so as to flexibly satisfy more usage scenes;
picture library: providing basic picture materials for the user to fill the template content according to actual needs, while also supporting user-defined upload of required pictures, so as to flexibly satisfy more usage scenes;
the special effects include: illumination, 3D transformation, emboss, sharpening, median blur and smoothing effects.
Specifically, in the model training step:
training continues through prefabricated OCR and image recognition algorithms;
the pre-training models comprise: a preset YOLO image classification or segmentation model, a text detection model and a text recognition model.
The scene adaptive learning system based on the pre-training model can be realized through the step flow of the scene adaptive learning method based on the pre-training model. The person skilled in the art may understand the method for learning scene adaptation based on a pre-trained model as a preferred example of the system for learning scene adaptation based on a pre-trained model.
The invention provides a scene adaptive learning system based on a pre-training model, which comprises:
a scene access module: accessing actual scene data according to the actual business requirements of the scene;
a scene configuration module: rapidly carrying out template configuration of the scene data based on the accessed actual scene data;
a data generation module: after the template configuration is completed, carrying out data generation, thereby completing data enhancement and preparing training data with label information;
a model training module: further training the pre-training model on the generated training data, so that the accuracy of the pre-training model is improved.
Specifically, the actual scene data refers to samples of real recognition scene data, including: certificates, pictures and license plates;
the scene configuration module comprises:
area selection: according to a picture of the scene, box-selecting the area that needs to be recognized, providing a rectangular space in the area for generating new data, and recording the coordinates of this spatial position;
data configuration: providing a labeled object for the subsequent generation of new training data, defining the basic attributes of the data to be generated within the selected object, including data content, data style and data effects, and at the same time defining the overall effect of the template and supporting definition of the format and content of the standard answer produced at output time;
the data configuration requires completing the data generation configuration in three dimensions:
item: the item is the minimum unit of information processing, and one item can contain one or more annotation layers;
annotation layer: an annotation layer is represented as a box drawn on the picture region;
layer information: the layer information comprises the item type and definitions of auxiliary marking information, and the layer attribute information is further defined according to the layer type.
Specifically, the data generation module includes:
material generation: generating text, numbers and pictures inside the boxes configured in the template, so as to construct the multi-scene text, numbers and pictures that need to be recognized and analyzed;
special effect generation: applying special effect processing to the picture, so that the generated data covers the various conditions of real scenes and the diversity of the model training data samples is ensured;
the material generation is supported by a font library, a corpus and a picture library, which enrich the formats of the generated data;
font library: providing basic fonts so that users can adjust the fonts of the template contents according to the actual data style, together with a real-time preview function that helps users select a suitable font;
corpus: providing basic text corpora for the user to fill the template content according to actual needs, while also supporting user-defined upload of required corpora, so as to flexibly satisfy more usage scenes;
picture library: providing basic picture materials for the user to fill the template content according to actual needs, while also supporting user-defined upload of required pictures, so as to flexibly satisfy more usage scenes;
the special effects include: illumination, 3D transformation, emboss, sharpening, median blur and smoothing effects;
the model training module:
training continues through prefabricated OCR and image recognition algorithms;
the pre-training models comprise: a preset YOLO image classification or segmentation model, a text detection model and a text recognition model.
The present invention will be described more specifically below with reference to preferred examples.
Preferred example 1:
as shown in fig. 1, the technology includes four main modules, namely, scene access, scene configuration, data generation, and model training.
1. Scene access
According to the actual business requirements of a scene, 10-20 pieces of actual scene data are accessed, such as certificates, pictures, license plates and other real recognition scene data samples.
2. Scene configuration
The user can rapidly perform template configuration of the scene data through the accessed scene data. Scene configuration requires two aspects of work: area selection and data configuration.
The main function of area selection is as follows: according to a picture of the scene, the area that needs to be recognized is box-selected, a rectangular space is provided in the area for generating new data, and the coordinates of this spatial position are recorded, as shown in fig. 2 below.
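By way of illustration only (this sketch is not part of the patent text), the box-selected region and its recorded coordinates could be represented as a small data structure; all names here are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Region:
    """A box-selected area of a scene picture in which new data will be generated.
    Coordinates use the usual image convention: origin at the top-left corner,
    x increasing rightward, y increasing downward, all values in pixels."""
    name: str    # e.g. "id_number" or "address" (hypothetical region names)
    x: int       # left edge of the rectangle
    y: int       # top edge of the rectangle
    width: int
    height: int

# Example: the number region of a hypothetical ID-card template.
id_number_region = Region(name="id_number", x=420, y=310, width=360, height=48)
```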
Data configuration mainly provides a labeled object for the subsequent generation of new training data: the basic attributes of the data to be generated in the selected object are defined, including data content, data style, data effects and the like; at the same time the overall effect of the template can be defined, and the format and content of the standard answer produced at output time can be specified. The data generation configuration is completed in three dimensions.
Item: the item is the minimum unit of information processing, and one item can contain one or more annotation layers.
Annotation layer: directly represented as a box drawn on the picture region.
Layer information: the layer information comprises the item type and definitions of auxiliary marking information; the layer attribute information is further defined according to the layer type.
Through the scene configuration, the system knows which kind of data needs to be generated in which area. For example: an 18-character numeric string is generated in the number region of an ID card, a string with national address information is generated in the address region, and a Chinese-capital amount string is generated in the capitalized-amount region of an invoice.
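The patent does not specify how these per-region rules are implemented; a minimal sketch under that assumption, with hypothetical generator and region names, might look like this:

```python
import random
import string

def gen_id_number() -> str:
    # A Chinese ID number is 18 characters: 17 digits plus a final digit or 'X'.
    return "".join(random.choices(string.digits, k=17)) + random.choice(string.digits + "X")

def gen_address(corpus: list) -> str:
    # Draw one line of address information from a (user-supplied) corpus.
    return random.choice(corpus)

def gen_capital_amount(amounts: list) -> str:
    # Draw one Chinese-capital amount string from a prepared list.
    return random.choice(amounts)

# Binding a generator to each template region (region names are hypothetical).
rules = {
    "id_number": lambda: gen_id_number(),
    "address":   lambda: gen_address(["XX省XX市XX区XX路1号"]),
    "amount":    lambda: gen_capital_amount(["壹仟贰佰叁拾肆元整"]),
}

sample = {region: make() for region, make in rules.items()}
print(sample)  # one generated content string per configured region
```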
After the template configuration is completed, data generation can be carried out, completing data enhancement and preparing training data with label information.
3. Data generation
From a given template, we next generate data.
Data generation includes two parts: material generation and special effect generation.
Material generation mainly fills text, numbers, pictures and other information in an appropriate format into the configured areas; special effect generation applies special effect processing for problems such as picture deformation, folding, illumination and shooting blur.
Material generation: this part generates text, numbers, pictures and the like inside the boxes configured in the previous template. Its purpose is to construct the multi-scene text, numbers and pictures that need to be recognized and analyzed.
Rules can be defined for the data set that fills the selected boxes, and the data set offers text, numbers, pictures and other types of data to choose from. Advanced usages such as compound enumeration are provided so that users can combine and concatenate data.
The materials of the data generation system are supported by a font library, a corpus and a picture library, which enrich the formats of the generated data and bring them close to real-scene data.
Font library: a large number of basic fonts are provided so that users can adjust the fonts of the template contents according to the actual data style; a real-time preview function is also provided to help users select a suitable font.
Corpus: a large amount of basic text corpora are provided for the user to fill the template content according to actual needs; user-defined upload of required corpora is also supported, so as to flexibly satisfy more usage scenes.
Picture library: a large amount of basic picture materials are provided for the user to fill the template content according to actual needs; user-defined upload of required pictures is also supported, so as to flexibly satisfy more usage scenes.
Through the configuration of text, corpora and pictures, together with basic image processing, a large number of pictures related to the current scene can be produced, along with the labeling information corresponding to each picture. A schematic diagram of the generated data is shown in fig. 3 below.
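As an illustrative sketch only (the patent names no libraries), rendering one piece of generated text into a configured box could be done with the Pillow library; the file paths and the label format below are assumptions:

```python
from PIL import Image, ImageDraw, ImageFont

def render_material(template_path, box, text, font_path):
    """Draw generated text into a configured box (x, y, width, height) of a
    template picture, returning the picture and its labeling information."""
    x, y, w, h = box
    img = Image.open(template_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    # Size the font to roughly fit the box height.
    font = ImageFont.truetype(font_path, size=max(h - 8, 10))
    draw.text((x, y), text, font=font, fill=(0, 0, 0))
    label = {"text": text, "box": [x, y, w, h]}  # ground truth for training
    return img, label

# Hypothetical usage: fill the number region of an ID-card template.
# img, label = render_material("idcard_template.png", (420, 310, 360, 48),
#                              "11010519900101123X", "simhei.ttf")
```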
Special effect generation: special effect processing is applied to the picture. The special effects comprise illumination, 3D transformation, emboss, sharpening, median blur, smoothing and other effects. Thanks to these special effects, the generated data can cover the various conditions of real scenes as far as possible, ensuring the diversity of the model training data samples.
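The patent lists the effects only by name; as an illustration, two of them (median blur and sharpening) could be applied with OpenCV as follows, the random mixing policy being an assumption:

```python
import random
import numpy as np
import cv2

def median_blur(img: np.ndarray, ksize: int = 5) -> np.ndarray:
    # ksize must be an odd integer greater than 1.
    return cv2.medianBlur(img, ksize)

def sharpen(img: np.ndarray) -> np.ndarray:
    # A standard 3x3 sharpening kernel.
    kernel = np.array([[ 0, -1,  0],
                       [-1,  5, -1],
                       [ 0, -1,  0]], dtype=np.float32)
    return cv2.filter2D(img, -1, kernel)

def random_effects(img: np.ndarray) -> np.ndarray:
    """Apply effects at random so that each generated sample differs,
    increasing the diversity of the training data."""
    if random.random() < 0.5:
        img = median_blur(img)
    if random.random() < 0.5:
        img = sharpen(img)
    return img
```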
4. Model training system
The model training function continues training through the OCR and image recognition algorithms prefabricated in the platform.
Pre-training model
A number of mature image processing algorithms, proven on a large number of images on the platform, are prefabricated in the system. The user trains further on top of these high-quality mature modules, greatly shortening the time needed to put an algorithm into production. Examples include a preset YOLO image classification/segmentation model, a text detection model and a text recognition model.
Model training system
Starting from the pre-trained model and combining it with the data generated for the scene, the model can be further trained and the accuracy of the pre-trained model improved.
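The patent gives no training code; as a purely illustrative sketch of the usual fine-tuning pattern (PyTorch shown, all names hypothetical), further training on the generated data could proceed as follows:

```python
import torch
from torch.utils.data import DataLoader

def fine_tune(model: torch.nn.Module, dataset, epochs: int = 5, lr: float = 1e-4):
    """Continue training a pre-trained model on the scene-generated data.
    A small learning rate preserves what the pre-trained model already knows."""
    loader = DataLoader(dataset, batch_size=32, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model
```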
In the description of the present application, it is to be understood that the terms "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like indicate orientations or positional relationships based on those shown in the drawings, and are only for convenience in describing the present application and simplifying the description, but do not indicate or imply that the referred device or element must have a specific orientation, be constructed in a specific orientation, and be operated, and thus, should not be construed as limiting the present application.
Those skilled in the art will appreciate that, in addition to implementing the systems, apparatus, and various modules thereof provided by the present invention in purely computer readable program code, the same procedures can be implemented entirely by logically programming method steps such that the systems, apparatus, and various modules thereof are provided in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system, the device and the modules thereof provided by the present invention can be considered as a hardware component, and the modules included in the system, the device and the modules thereof for implementing various programs can also be considered as structures in the hardware component; modules for performing various functions may also be considered to be both software programs for performing the methods and structures within hardware components.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims (10)

1. A scene adaptation learning method based on a pre-training model is characterized by comprising the following steps:
scene access step: accessing actual scene data according to the actual business requirements of the scene;
scene configuration step: rapidly carrying out template configuration of the scene data based on the accessed actual scene data;
data generation step: after the template configuration is completed, carrying out data generation, thereby completing data enhancement and preparing training data with label information;
model training step: further training the pre-training model on the generated training data, so that the accuracy of the pre-training model is improved.
2. The method of claim 1, wherein the actual scene data refers to samples of real recognition scene data, comprising: certificates, pictures and license plates.
3. The method according to claim 1, wherein the scene configuration step comprises:
and (3) area selection: according to the picture of the scene, the area needing to be identified is selected in a circle mode, a rectangular space is provided for generating new data in the area, and meanwhile the coordinates of the position space are recorded.
Data configuration: and providing a labeled object in cooperation with the next new training data, defining basic data attributes required to be generated in the selected object, including data content, data styles and data effects, and simultaneously defining the overall effect of the template and supporting the definition of the relevant format and content of the standard answer during output.
4. The pre-training model-based scene adaptation learning method according to claim 3, wherein the data configuration requires completing the data generation configuration in three dimensions:
item: the item is the minimum unit of information processing, and one item can contain one or more annotation layers;
annotation layer: an annotation layer is represented as a box drawn on the picture region;
layer information: the layer information comprises the item type and definitions of auxiliary marking information, and the layer attribute information is further defined according to the layer type.
5. The method according to claim 1, wherein the data generating step comprises:
material generation: generating text, numbers and pictures inside the boxes configured in the template, so as to construct the multi-scene text, numbers and pictures that need to be recognized and analyzed;
special effect generation: applying special effect processing to the picture, so that the generated data covers the various conditions of real scenes and the diversity of the model training data samples is ensured.
6. The method of claim 5, wherein the material generation is supported by a font library, a corpus and a picture library, which enrich the formats of the generated data;
font library: providing basic fonts so that users can adjust the fonts of the template contents according to the actual data style, together with a real-time preview function that helps users select a suitable font;
corpus: providing basic text corpora for the user to fill the template content according to actual needs, while also supporting user-defined upload of required corpora, so as to flexibly satisfy more usage scenes;
picture library: providing basic picture materials for the user to fill the template content according to actual needs, while also supporting user-defined upload of required pictures, so as to flexibly satisfy more usage scenes;
the special effects include: illumination, 3D transformation, emboss, sharpening, median blur and smoothing effects.
7. The scene adaptive learning method based on the pre-training model as claimed in claim 1, wherein the model training step comprises:
training continues through prefabricated OCR and image recognition algorithms;
the pre-training models comprise: a preset YOLO image classification or segmentation model, a text detection model and a text recognition model.
8. A scene adaptation learning system based on a pre-training model is characterized by comprising:
a scene access module: accessing actual scene data according to the actual business requirements of the scene;
a scene configuration module: rapidly carrying out template configuration of the scene data based on the accessed actual scene data;
a data generation module: after the template configuration is completed, carrying out data generation, thereby completing data enhancement and preparing training data with label information;
a model training module: further training the pre-training model on the generated training data, so that the accuracy of the pre-training model is improved.
9. The pre-trained model based scene adaptation learning system according to claim 8, wherein the actual scene data refers to samples of real recognition scene data, including: certificates, pictures and license plates;
the scene configuration module comprises:
area selection: according to a picture of the scene, box-selecting the area that needs to be recognized, providing a rectangular space in the area for generating new data, and recording the coordinates of this spatial position;
data configuration: providing a labeled object for the subsequent generation of new training data, defining the basic attributes of the data to be generated within the selected object, including data content, data style and data effects, and at the same time defining the overall effect of the template and supporting definition of the format and content of the standard answer produced at output time;
the data configuration requires completing the data generation configuration in three dimensions:
item: the item is the minimum unit of information processing, and one item can contain one or more annotation layers;
annotation layer: an annotation layer is represented as a box drawn on the picture region;
layer information: the layer information comprises the item type and definitions of auxiliary marking information, and the layer attribute information is further defined according to the layer type.
10. The pre-trained model based scene adaptation learning system according to claim 8, wherein the data generation module comprises:
material generation: generating text, numbers and pictures inside the boxes configured in the template, so as to construct the multi-scene text, numbers and pictures that need to be recognized and analyzed;
special effect generation: applying special effect processing to the picture, so that the generated data covers the various conditions of real scenes and the diversity of the model training data samples is ensured;
the material generation is supported by a font library, a corpus and a picture library, which enrich the formats of the generated data;
font library: providing basic fonts so that users can adjust the fonts of the template contents according to the actual data style, together with a real-time preview function that helps users select a suitable font;
corpus: providing basic text corpora for the user to fill the template content according to actual needs, while also supporting user-defined upload of required corpora, so as to flexibly satisfy more usage scenes;
picture library: providing basic picture materials for the user to fill the template content according to actual needs, while also supporting user-defined upload of required pictures, so as to flexibly satisfy more usage scenes;
the special effects include: illumination, 3D transformation, emboss, sharpening, median blur and smoothing effects;
the model training module:
training continues through prefabricated OCR and image recognition algorithms;
the pre-training models comprise: a preset YOLO image classification or segmentation model, a text detection model and a text recognition model.
CN202010621371.7A 2020-07-01 2020-07-01 Scene adaptation learning method and system based on pre-training model Pending CN111783881A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010621371.7A CN111783881A (en) 2020-07-01 2020-07-01 Scene adaptation learning method and system based on pre-training model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010621371.7A CN111783881A (en) 2020-07-01 2020-07-01 Scene adaptation learning method and system based on pre-training model

Publications (1)

Publication Number Publication Date
CN111783881A true CN111783881A (en) 2020-10-16

Family

ID=72760993

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010621371.7A Pending CN111783881A (en) 2020-07-01 2020-07-01 Scene adaptation learning method and system based on pre-training model

Country Status (1)

Country Link
CN (1) CN111783881A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112541363A (en) * 2020-11-24 2021-03-23 支付宝(杭州)信息技术有限公司 Method and device for recognizing text data of target language and server
CN113240088A (en) * 2021-05-17 2021-08-10 上海中通吉网络技术有限公司 Training method of text intention recognition model



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination