CN111985583B - Deep learning sample labeling method based on learning data

Info

Publication number: CN111985583B
Application number: CN202011035409.9A
Authority: CN
Inventors: 崔炜
Original assignee: Shanghai Squirrel Classroom Artificial Intelligence Technology Co Ltd
Current assignee: Shanghai Squirrel Classroom Artificial Intelligence Technology Co Ltd
Priority date: 2020-09-27
Filing date: 2020-09-27
Publication date: 2021-04-30
Anticipated expiration: 2040-09-27
Also published as: CN111985583A

Abstract

The invention discloses a deep learning sample labeling method based on learning data, comprising the following steps: collecting the labeling information of a historical learning object and obtaining the average labeling rate of each chapter corresponding to the historical learning object; calculating, from the obtained average labeling rate of each chapter, the mark difference rate corresponding to each chapter in the learning course corresponding to the historical learning object; and executing the labeling operation of the deep learning sample according to the calculated mark difference rate corresponding to each chapter in the learning course. This processing keeps the number of marks in each chapter of the whole text at an average level, reduces the learning load of the learning object, improves the learning efficiency of the learning object, and thereby achieves deep learning sample labeling of the learning data.

Description

Deep learning sample labeling method based on learning data

Technical Field

The invention relates to the technical field of data processing, in particular to a deep learning sample labeling method based on learning data.

Background

Existing learning sample labeling mainly uses one of two approaches. The first is manual labeling, which is inefficient and requires a great deal of time and effort. The second is automatic labeling. Automatic labeling in the prior art generally uses manually configured keywords or the like as the reference for labeling, so its accuracy is low; moreover, when the amounts of labeled data on adjacent pages differ greatly, it cannot correct the difference automatically, and manual intervention is still required. Existing learning sample labeling methods therefore cannot meet the practical requirements of labeling learning data.

Disclosure of Invention

The invention provides a deep learning sample labeling method based on learning data, aiming at keeping the number of labels of each chapter of a whole text at an average level and reducing the learning load of a learning object.

The invention provides a deep learning sample labeling method based on learning data, which comprises the following steps:

acquiring marking information of a historical learning object, and acquiring the average marking rate of each chapter corresponding to the historical learning object;

calculating the mark difference rate corresponding to each section in the learning course corresponding to the historical learning object according to the obtained average mark rate of each section;

and executing the labeling operation of the deep learning sample according to the calculated label difference rate corresponding to each section in the learning course.

Further, the acquiring labeling information of the historical learning object and obtaining the average labeling rate of each section corresponding to the historical learning object includes:

according to the deep learning sample labeling demand information, recognizing the learned course content, and collecting the labeling information of a historical learning object;

and calculating the average labeling rate of each section corresponding to the historical learning object based on the collected labeling information of the historical learning object.

Further, the calculating, based on the collected labeling information of the historical learning object, an average labeling rate of each section corresponding to the historical learning object includes:

calculating the average labeling rate $\mathrm{Mark}(k_m, k_{m-1})$ of each chapter corresponding to the historical learning object by using formula (1), according to the number of learning pages corresponding to each chapter of the learning course in the collected labeling information of the historical learning object, and the total number of bytes and the total number of marks of each of those learning pages:

wherein $m$ is the index of a learning page within each chapter of the extracted learning course, with value range $[2, M]$; $M$ is the total number of learning pages corresponding to each chapter; $a_m$ is the total number of bytes of the $m$-th page in each chapter of the learning course; $k_m$ is the total number of marks on the $m$-th page; $k_{m-1}$ is the total number of marks on the $(m-1)$-th page; $\mathrm{sum}(k_m, k_{m-1})$ is the sum of the total number of marks on the $m$-th page and the total number of marks on the $(m-1)$-th page; and $\mathrm{Mark}(k_m, k_{m-1})$ is the resulting average labeling rate of each chapter.
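As a small illustration of the quantities defined above, the following Python sketch computes sum(k_m, k_{m-1}) for every pair of adjacent pages m = 2..M of one chapter. The `Page` record and the function name `adjacent_mark_sums` are hypothetical names introduced only for this example, and the sketch covers only the pairwise sum, not formula (1) as a whole.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Page:
    """One learning page of a chapter (hypothetical record type)."""
    total_bytes: int  # a_m: total number of bytes on the page
    total_marks: int  # k_m: total number of marks on the page


def adjacent_mark_sums(pages: List[Page]) -> List[int]:
    """Return sum(k_m, k_{m-1}) for m = 2..M, where M = len(pages)."""
    return [
        pages[i].total_marks + pages[i - 1].total_marks
        for i in range(1, len(pages))
    ]


# A chapter with M = 3 pages yields M - 1 = 2 adjacent-page sums.
chapter = [Page(1200, 3), Page(900, 7), Page(1500, 2)]
print(adjacent_mark_sums(chapter))  # [10, 9]
```

Because m starts at 2, a chapter with a single page produces no adjacent pairs and the function returns an empty list.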

Further, the calculating a mark difference rate corresponding to each section in the learning course corresponding to the historical learning object according to the obtained average mark rate of each section includes:

calculating, from the obtained average labeling rate of each chapter, the mark difference rate $\mathrm{Dif}(k_m, k_{m-1})$ between adjacent learning pages of each chapter in the learning course by using formula (2):

wherein $\eta$ is a difference parameter with a value of 0.2; $\mathrm{Mark}(k_m, k_{m-1})$ is the average labeling rate of each chapter; and $\mathrm{Dif}(k_m, k_{m-1})$ is the mark difference rate between the total number of marks on the $m$-th page and the total number of marks on the $(m-1)$-th page of each chapter in the learning course.

Further, the performing, according to the calculated mark difference rate corresponding to each chapter in the learning course, a labeling operation of a deep learning sample includes:

comparing the marking difference rate with a preset difference rate according to the calculated marking difference rate corresponding to each chapter in the learning course;

and executing the labeling operation of the deep learning sample according to the comparison result of the mark difference rate and the preset difference rate.

Further, the performing, according to the comparison result between the mark difference rate and a preset difference rate, an annotation operation of a deep learning sample, including:

if the mark difference rate is larger than the preset difference rate, executing corresponding marking operation on the deep learning sample based on the mark difference rate and according to the obtained average marking rate of each section corresponding to the historical learning object;

and if the mark difference rate is smaller than or equal to the preset difference rate, not performing the marking operation of the deep learning sample.

Further, if the mark difference rate is greater than the preset difference rate, performing corresponding labeling operation on the deep learning sample according to the obtained average labeling rate of each chapter corresponding to the historical learning object based on the mark difference rate, including:

when the mark difference rate between adjacent learning pages of each chapter in the learning course is greater than the preset difference rate, calculating the number $Q$ of marks to be adjusted for the learning pages in each chapter by using formula (3), according to the mark difference rate and the obtained average labeling rate of each chapter:

wherein $\theta$ is a correction factor with a value of 1.5; $Q$ is the number of marks to be adjusted for each learning page in each chapter; $\mathrm{Mark}(k_m, k_{m-1})$ is the obtained average labeling rate of each chapter corresponding to the historical learning object; $\mathrm{Dif}(k_m, k_{m-1})$ is the mark difference rate between adjacent learning pages of each chapter in the learning course; and $m$ is the index of the learning page in each chapter, with value range $[2, M]$;

and executing the labeling operation of the deep learning sample according to the calculated number $Q$ of marks to be adjusted for each learning page in each chapter.

Further, the executing, according to the number Q of the to-be-adjusted marks corresponding to each learning page in each section obtained by calculation, the labeling operation of the deep learning sample includes:

if the number Q of the marks to be adjusted corresponding to the learning page in each section obtained by calculation is 0, the marking operation of the deep learning sample is not carried out;

and if the number Q of the marks to be adjusted corresponding to the learning page in each section obtained by calculation is not 0, executing the marking operation of the deep learning sample according to the number Q of the marks to be adjusted obtained by calculation.

Further, the performing, according to the calculated number Q of the to-be-adjusted marks, a labeling operation of a deep learning sample, including:

and supplementing or deleting the labels corresponding to the number Q of the marks to be adjusted according to the number Q of the marks to be adjusted obtained through calculation and by referring to the size relationship between the mark difference rate and the average labeling rate of each chapter, so as to execute the labeling operation corresponding to the deep learning sample.

Further, the supplementing or deleting the labels corresponding to the number Q of the to-be-adjusted labels according to the calculated number Q of the to-be-adjusted labels and by referring to the magnitude relationship between the label difference rate and the average labeling rate of each chapter, includes:

if the mark difference rate is larger than the average mark rate of each chapter, deleting marks with corresponding quantity from the corresponding learning pages in each chapter; the number of the deleted labels on the learning page is equal to the number Q of the labels to be adjusted;

if the mark difference rate is smaller than the average mark rate of each section, supplementing corresponding learning pages in each section with marks in corresponding quantity; and the number of the labels supplemented to the learning page is equal to the number Q of the labels to be adjusted.
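The branching described above (threshold comparison, the Q = 0 check, then deletion or supplementation) can be sketched in Python as follows. This is a minimal sketch and not the patented implementation: the values of Mark, Dif and Q are taken as inputs that are assumed to have been computed elsewhere with formulas (1) to (3), and `Action` and `decide_adjustment` are hypothetical names. The text does not say what happens when the mark difference rate equals the average labeling rate, so the sketch assumes no change in that case.

```python
from enum import Enum
from typing import Tuple


class Action(Enum):
    NONE = "none"
    DELETE = "delete"
    SUPPLEMENT = "supplement"


def decide_adjustment(dif: float, mark: float, preset_dif: float, q: int) -> Tuple[Action, int]:
    """Return (action, number of labels) for one learning page.

    dif        -- mark difference rate Dif(k_m, k_{m-1}), assumed precomputed
    mark       -- average labeling rate Mark(k_m, k_{m-1}), assumed precomputed
    preset_dif -- preset difference rate used as the threshold
    q          -- number of marks to be adjusted, Q, assumed precomputed
    """
    if dif <= preset_dif:
        return Action.NONE, 0        # at or below the threshold: no labeling operation
    if q == 0:
        return Action.NONE, 0        # nothing to adjust
    if dif > mark:
        return Action.DELETE, q      # delete Q labels from the page
    if dif < mark:
        return Action.SUPPLEMENT, q  # supplement the page with Q labels
    return Action.NONE, 0            # dif == mark: unspecified in the text, assumed no-op


print(decide_adjustment(dif=0.5, mark=0.3, preset_dif=0.2, q=2))
```

Returning the action together with the count keeps the decision separate from whatever code actually edits the labels on a page.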

The deep learning sample labeling method based on the learning data acquires the average labeling rate of each chapter corresponding to a historical learning object by acquiring the labeling information of the historical learning object; calculating the mark difference rate corresponding to each section in the learning course corresponding to the historical learning object according to the obtained average mark rate of each section; according to the mark difference rate corresponding to each section in the learning course, which is obtained through calculation, the marking operation of the deep learning sample is executed; the number of marks of each chapter in the whole text is kept at an average level, the learning load of a learning object is reduced, the learning efficiency of the learning object is improved, and the purpose of deep learning sample labeling on learning data is achieved.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

The technical solution of the present invention is further described below by means of the accompanying drawings and examples.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:

Fig. 1 is a schematic flowchart of an embodiment of the deep learning sample labeling method based on learning data according to the present invention.

Detailed Description

The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.

The invention provides a learning data-based deep learning sample labeling method, which is characterized in that the labeling information of a historical learning object is summarized and the average labeling rate of each chapter and the mark difference rate corresponding to a learning page in each chapter are obtained, so that the number of marks of each chapter in the whole text is kept at an average level, the learning load of a learning object is reduced, the learning efficiency of the learning object is improved, and the purpose of deep learning sample labeling on the learning data is realized.

Fig. 1 is a schematic flowchart of an embodiment of the deep learning sample labeling method based on learning data according to the present invention. As shown in Fig. 1, the method can be implemented as the following steps S10 to S30:

Step S10: collecting the labeling information of the historical learning object, and obtaining the average labeling rate of each chapter corresponding to the historical learning object.

Step S20: calculating the mark difference rate corresponding to each chapter in the learning course corresponding to the historical learning object according to the obtained average labeling rate of each chapter.

Step S30: executing the labeling operation of the deep learning sample according to the calculated mark difference rate corresponding to each chapter in the learning course.
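The data flow between steps S10, S20 and S30 can be sketched in Python as below. This is only a structural sketch under stated assumptions: `run_labeling` and its three callable parameters are hypothetical, and the lambdas in the usage example are placeholders chosen so the code runs, not the formulas of the invention.

```python
from typing import Callable, List, Sequence


def run_labeling(
    pages: Sequence[int],
    average_rate: Callable[[Sequence[int]], float],
    difference_rates: Callable[[Sequence[int], float], List[float]],
    adjust: Callable[[Sequence[int], float, List[float]], List[int]],
) -> List[int]:
    """Chain steps S10, S20 and S30 for one chapter.

    pages holds the mark count of each learning page. The three callables
    are hypothetical stand-ins for the rate calculations and the labeling
    operation described in the text.
    """
    mark = average_rate(pages)            # S10: average labeling rate
    difs = difference_rates(pages, mark)  # S20: mark difference rates
    return adjust(pages, mark, difs)      # S30: labeling operation


# Placeholder callables (assumptions, NOT the patent's formulas).
result = run_labeling(
    [3, 7, 2],
    average_rate=lambda p: sum(p) / len(p),
    difference_rates=lambda p, mark: [abs(p[i] - p[i - 1]) for i in range(1, len(p))],
    adjust=lambda p, mark, difs: list(p),  # identity: leaves the mark counts unchanged
)
print(result)  # [3, 7, 2]
```

Passing the calculations in as callables keeps the three-step ordering explicit while leaving each calculation open to replacement.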

In an embodiment, step S10 of the embodiment shown in Fig. 1, "collecting the labeling information of the historical learning object, and obtaining the average labeling rate of each chapter corresponding to the historical learning object", may be implemented according to the following technical means:

according to the deep learning sample labeling demand information, recognizing the learned course content, and collecting the labeling information of a historical learning object; and calculating the average labeling rate of each section corresponding to the historical learning object based on the collected labeling information of the historical learning object.

In the embodiment of the invention, the average labeling rate of each chapter corresponding to the historical learning object is obtained by collecting and summarizing the labeling information of the historical learning object.

In an embodiment, the calculating, based on the collected labeling information of the historical learning object, an average labeling rate of each section corresponding to the historical learning object may be implemented according to the following technical means:

In an embodiment, the step S20 in the embodiment shown in fig. 1, calculating the mark difference rate corresponding to each section in the learning course corresponding to the historical learning object according to the obtained average labeling rate of each section, may be implemented according to the following technical means:

In an embodiment, the performing of the labeling operation of the deep learning samples according to the calculated mark difference rate corresponding to each chapter in the learning course may be implemented according to the following technical means:

comparing the marking difference rate with a preset difference rate according to the calculated marking difference rate corresponding to each chapter in the learning course; and executing the labeling operation of the deep learning sample according to the comparison result of the mark difference rate and the preset difference rate.

Further, in an embodiment, the performing of the labeling operation of the deep learning samples according to the comparison result between the mark difference rate and the preset difference rate may be implemented according to the following technical means:

Further, in an embodiment, if the mark difference rate is greater than the preset difference rate, the deep learning sample is subjected to a corresponding labeling operation according to the obtained average labeling rate of each section corresponding to the historical learning object based on the mark difference rate, which may be implemented according to the following technical means:

wherein $\theta$ is a correction factor with a value of 1.5; $Q$ is the number of marks to be adjusted for each learning page in each chapter; $\mathrm{Mark}(k_m, k_{m-1})$ is the obtained average labeling rate of each chapter corresponding to the historical learning object; $\mathrm{Dif}(k_m, k_{m-1})$ is the mark difference rate between adjacent learning pages of each chapter in the learning course; and $m$ is the index of the learning page in each chapter, with value range $[2, M]$. The labeling operation of the deep learning sample is then executed according to the calculated number $Q$ of marks to be adjusted for each learning page in each chapter.

In addition, in the embodiment of the present invention, the correction factor $\theta$ is set to 1.5 so that adjustments of no more than one mark are ignored.

Further, in an embodiment, the marking operation of the deep learning sample is executed according to the number Q of the to-be-adjusted marks corresponding to each learning page in each section obtained by calculation, and may be implemented according to the following technical means:

Further, in an embodiment, the performing of the labeling operation of the deep learning sample according to the calculated number Q of the marks to be adjusted may be implemented according to the following technical means:

Further, in an embodiment, the labels corresponding to the number Q of marks to be adjusted are supplemented or deleted according to the calculated number Q and with reference to the magnitude relationship between the mark difference rate and the average labeling rate of each chapter, which may be implemented according to the following technical means:

if the mark difference rate is greater than the average labeling rate of each chapter, a corresponding number of labels is deleted from the corresponding learning pages in each chapter, the number of labels deleted from a learning page being equal to the number Q of marks to be adjusted.

The deep learning sample labeling method based on the learning data acquires the average labeling rate of each chapter corresponding to a historical learning object by acquiring the labeling information of the historical learning object; calculating the mark difference rate corresponding to each section in the learning course corresponding to the historical learning object according to the obtained average mark rate of each section; and executing the marking operation of the deep learning sample according to the mark difference rate corresponding to each section in the learning course, so that the mark number of each section in the whole text is kept at an average level, the learning load of a learning object is reduced, the learning efficiency of the learning object is improved, and the purpose of marking the deep learning sample on the learning data is realized.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

1. A deep learning sample labeling method based on learning data is characterized by comprising the following steps:

according to the mark difference rate corresponding to each section in the learning course, which is obtained through calculation, the marking operation of the deep learning sample is executed;

the acquiring of the labeling information of the historical learning object and the obtaining of the average labeling rate of each section corresponding to the historical learning object include:

calculating the average labeling rate of each section corresponding to the historical learning object based on the collected labeling information of the historical learning object, and specifically comprising the following steps:

wherein $m$ is the index of a learning page within each chapter of the extracted learning course, with value range $[2, M]$; $M$ is the total number of learning pages corresponding to each chapter of the extracted learning course; $a_m$ is the total number of bytes of the $m$-th page in each chapter of the learning course; $k_m$ is the total number of marks on the $m$-th page in each chapter of the learning course; $k_{m-1}$ is the total number of marks on the $(m-1)$-th page in each chapter of the learning course; $\mathrm{sum}(k_m, k_{m-1})$ is the sum of the total number of marks on the $m$-th page and the total number of marks on the $(m-1)$-th page of each chapter of the learning course; and $\mathrm{Mark}(k_m, k_{m-1})$ is the obtained average labeling rate of each chapter;

calculating the mark difference rate corresponding to each section in the learning course corresponding to the historical learning object according to the obtained average mark rate of each section, wherein the calculation comprises the following steps:

wherein $\eta$ is a difference parameter with a value of 0.2; $\mathrm{Mark}(k_m, k_{m-1})$ is the average labeling rate of each chapter; and $\mathrm{Dif}(k_m, k_{m-1})$ is the mark difference rate between the total number of marks on the $m$-th page and the total number of marks on the $(m-1)$-th page of each chapter in the learning course;

the method comprises the following steps of performing labeling operation of deep learning samples according to the mark difference rate corresponding to each chapter in the learning course, wherein the labeling operation comprises the following steps:

according to the comparison result of the mark difference rate and the preset difference rate, executing the labeling operation of the deep learning sample, and specifically comprising the following steps:

if the mark difference rate is smaller than or equal to the preset difference rate, the marking operation of the deep learning sample is not carried out;

if the mark difference rate is greater than the preset difference rate, performing corresponding labeling operation on the deep learning sample according to the obtained average labeling rate of each chapter corresponding to the historical learning object based on the mark difference rate, wherein the method comprises the following steps:

2. The learning-data-based deep learning sample labeling method according to claim 1, wherein the performing the labeling operation of the deep learning sample according to the number Q of the labels to be adjusted corresponding to each learning page in each chapter includes:

3. The learning-data-based deep learning sample labeling method according to claim 2, wherein the performing deep learning sample labeling operation according to the calculated number Q of the marks to be adjusted comprises:

4. The learning-data-based deep learning sample labeling method according to claim 3, wherein the supplementing or deleting labels corresponding to the number of to-be-adjusted labels Q according to the calculated number of to-be-adjusted labels Q by referring to a magnitude relation between the label difference rate and the average labeling rate of each chapter comprises: