CN113449142A - Information processing method and device, electronic equipment, storage medium and product

Info

Publication number: CN113449142A
Application number: CN202110736131.6A
Authority: CN
Inventors: 杨雪
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2021-06-30
Filing date: 2021-06-30
Publication date: 2021-09-28

Abstract

The disclosure provides an information processing method and apparatus, an electronic device, a storage medium, and a product, and relates to the field of computer technology, in particular to data annotation in the field of artificial intelligence. The specific implementation is as follows: acquiring a plurality of pieces of annotation requirement information; determining a first association relationship among the plurality of pieces of annotation requirement information; determining, according to the first association relationship, a distribution mode for a plurality of annotation tasks corresponding to the plurality of pieces of annotation requirement information; and distributing the plurality of annotation tasks according to the distribution mode. The method and apparatus improve the annotation efficiency of annotation tasks.

Description

Information processing method and device, electronic equipment, storage medium and product

Technical Field

The present disclosure relates to the field of computer technology, and more particularly, to the field of data labeling in the field of artificial intelligence.

Background

With the popularization of artificial intelligence, training models such as speech recognition models and text recognition models requires a large amount of training data. Most of this training data is obtained by manually annotating speech or text. For example, in speech annotation, the text content of the speech is obtained, and corresponding labels are manually applied to the correct or incorrect speech text content.

In the existing annotation industry, once the annotation requirement information is determined, a professional trainer must train the annotators, and the annotators then perform annotation according to their assigned annotation tasks. When annotators receive an annotation task, they must understand all annotation requirements and memorize complicated annotation rules before they can complete the annotation, which increases annotation difficulty and lowers annotation efficiency.

Disclosure of Invention

The present disclosure provides an information processing method and apparatus, an electronic device, a storage medium, and a product for improving labeling efficiency.

According to an aspect of the present disclosure, there is provided an information processing method including:

acquiring a plurality of pieces of annotation requirement information;

determining a first association relationship among the plurality of pieces of annotation requirement information;

determining, according to the first association relationship, a distribution mode for a plurality of annotation tasks corresponding to the plurality of pieces of annotation requirement information;

and distributing the plurality of annotation tasks according to the distribution mode.

According to another aspect of the present disclosure, there is provided an information processing apparatus including:

an information acquisition module, configured to acquire a plurality of pieces of annotation requirement information;

a first determining module, configured to determine a first association relationship among the plurality of pieces of annotation requirement information;

a second determining module, configured to determine, according to the first association relationship, a distribution mode for a plurality of annotation tasks corresponding to the plurality of pieces of annotation requirement information;

and a task distribution module, configured to distribute the plurality of annotation tasks according to the distribution mode.

According to another aspect of the present disclosure, there is provided an electronic device including:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method according to any one of the embodiments of the present disclosure.

According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform a method in any of the embodiments of the present disclosure.

According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method in any of the embodiments of the present disclosure.

According to the technical solution of the present disclosure, the distribution mode of the plurality of annotation tasks is determined according to the first association relationship among the plurality of pieces of annotation requirement information, and the plurality of annotation tasks are then distributed according to that distribution mode. Annotation tasks can thus be distributed automatically, and annotators need to understand only the annotation requirements related to the annotation tasks they receive rather than all annotation requirements, which reduces annotation difficulty and improves annotation efficiency.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:

Fig. 1 is a schematic diagram of an information processing method according to a first embodiment of the present disclosure;

Fig. 2 is a schematic diagram of an information processing method according to a second embodiment of the present disclosure;

Fig. 3 is a schematic diagram of an information processing method according to a third embodiment of the present disclosure;

Fig. 4 is a schematic diagram of an information processing method according to a fourth embodiment of the present disclosure;

Fig. 5 is a schematic diagram of an information processing apparatus according to a fifth embodiment of the present disclosure;

Fig. 6 is a block diagram of an electronic device for implementing the information processing method of an embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

Embodiments of the present disclosure can be applied to data annotation scenarios. Using the dependency relationships among annotation requirements, the annotation work is divided into a plurality of tasks that are distributed to different users. Annotating in batches reduces the effort annotating users spend learning the annotation requirements and improves annotation efficiency.

In existing annotation practice, a plurality of pieces of annotation requirement information is generally set for any one annotation task. When an object is to be annotated, all of the annotation requirement information is generally assigned to the user, who must browse and memorize all of it before annotating the object. This approach requires the user to understand all of the annotation requirement information in order to annotate. If the user does not fully understand the annotation requirement information, the user must repeatedly consult it during annotation, which reduces annotation efficiency.

To solve this technical problem, embodiments of the present disclosure obtain a first association relationship among a plurality of pieces of annotation requirement information, thereby establishing how the pieces relate to one another. The first association relationship is then used to determine a distribution mode for the plurality of annotation tasks corresponding to the annotation requirement information, and the annotation tasks are distributed according to that mode. Annotation tasks are thus distributed automatically, so an annotating user needs to attend only to the annotation requirement information corresponding to the tasks distributed to that user, not to all annotation requirement information, and can complete the annotation tasks quickly, which effectively improves annotation efficiency.

Technical solutions of embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.

As shown in Fig. 1, which is a schematic diagram of an information processing method according to a first embodiment of the present disclosure, the method may include the following steps:

S101: Determine a plurality of pieces of annotation requirement information.

Optionally, the plurality of pieces of annotation requirement information may be provided by a requirement user, that is, a user who needs objects annotated.

Determining the plurality of pieces of annotation requirement information may include receiving a plurality of pieces of annotation requirement information provided by the requirement user. A requirement input interface may be provided so that the requirement user can enter the plurality of pieces of annotation requirement information in it.

The annotation requirement information may be content information that describes the annotation requirements for a plurality of annotation objects. It may take the form of a text description, speech, or video; the present disclosure does not particularly limit its specific form. Once a piece of annotation requirement information is determined, the plurality of annotation objects corresponding to it can be determined. The annotation objects may be provided by the user, that is, the user may define the plurality of pieces of annotation requirement information together with the annotation objects corresponding to each piece. The annotation objects may also be generated in real time from the annotation result of a preceding annotation task.

S102: Determine a first association relationship among the plurality of pieces of annotation requirement information.

The first association relationship may be a dependency relationship among the plurality of pieces of annotation requirement information. If the annotation tasks of two pieces of annotation requirement information have a dependency relationship during execution, the two pieces can be determined to have an association relationship; if their annotation tasks have no dependency relationship during execution, the two pieces can be determined to have no association relationship. In one possible implementation, the association relationship between any two pieces of annotation requirement information may be recorded as a first dependency relationship, and the first dependency relationships for all pairs among the plurality of pieces of annotation requirement information may together be taken as the first association relationship. That is, the first association relationship may be formed by all first dependency relationships corresponding to the plurality of pieces of annotation requirement information. Where a subset of the first dependency relationships is sufficient to make clear the associations among the plurality of pieces of annotation requirement information, the first association relationship may be composed of that subset.
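As a minimal sketch of how such a first association relationship might be held in memory, the following Python records each first dependency relationship as an ordered pair. The class name `FirstAssociation`, its fields, and the requirement names A, B, and C are hypothetical illustrations and are not defined by the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class FirstAssociation:
    """Pairwise dependencies among annotation requirements (hypothetical structure)."""

    requirements: list
    # Each pair is (upstream, downstream): the downstream requirement's
    # annotation task depends on the upstream requirement's task.
    dependencies: set = field(default_factory=set)

    def has_association(self, a, b):
        """Return True if the tasks of a and b depend on each other in either direction."""
        return (a, b) in self.dependencies or (b, a) in self.dependencies


# Assumed example: the tasks for B and C each depend on the task for A.
relation = FirstAssociation(
    requirements=["A", "B", "C"],
    dependencies={("A", "B"), ("A", "C")},
)
```

Storing only the dependent pairs matches the remark that a subset of the dependency relationships can be enough: any pair absent from the set is treated as having no association relationship.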

S103: Determine, according to the first association relationship, a distribution mode for a plurality of annotation tasks corresponding to the plurality of pieces of annotation requirement information.

The at least one piece of first requirement information currently to be annotated is in a parallel relationship. Where the at least one piece comprises two or more pieces of first requirement information, the first association relationship between any two of them is a parallel relationship.

The at least one piece of first requirement information currently to be annotated is in a serial relationship with the at least one piece of first requirement information annotated immediately before it. That is, the first association relationship between any piece of annotation requirement information currently to be annotated and any piece annotated immediately before it is a serial relationship.

Likewise, the at least one piece of first requirement information currently to be annotated is in a serial relationship with the at least one piece of first requirement information to be annotated immediately after it. The first association relationship between any piece of annotation requirement information currently to be annotated and any piece annotated immediately after it is a serial relationship.

S104: Distribute the plurality of annotation tasks according to the distribution mode.

After step S104, the method further includes acquiring the annotation result corresponding to each of the plurality of annotation tasks.

Any one annotation task may correspond to a plurality of annotation objects and to content information describing in detail the annotation requirements for those objects. Distributing the plurality of annotation tasks may mean distributing the annotation objects of those tasks together with the content information that describes in detail the annotation requirements for the distributed annotation objects.

In the first embodiment of the present disclosure, a first association relationship among the plurality of pieces of annotation requirement information is acquired, establishing how the pieces relate to one another. The first association relationship is used to determine a distribution mode for the plurality of annotation tasks corresponding to the annotation requirement information, and the tasks are distributed according to that mode. An annotating user therefore needs to attend only to the annotation requirement information corresponding to the distributed annotation tasks rather than to all annotation requirement information, which reduces annotation difficulty, allows the annotation tasks to be completed quickly, and effectively improves annotation efficiency.

In some embodiments, the plurality of pieces of annotation requirement information comprises at least one of: a plurality of pieces of speech requirement information set for speech annotation requirements, and a plurality of pieces of text requirement information set for text annotation requirements.

Illustratively, when the technical solution of the embodiments of the present disclosure is applied to a speech annotation requirement, the plurality of pieces of annotation requirement information may be a plurality of pieces of speech requirement information. Speech annotation requirements may cover two applications, transcription and segmented transcription, and the speech requirement information may include timestamp requirement information, transcription content requirement information, and/or attribute requirement information.

The timestamp requirement information may be: in segmented transcription, a long speech recording is segmented into a plurality of short speech segments, and the start time point and end time point of each short segment within the original speech are recorded.

The transcription content requirement information may be: the speech content of each short speech segment, that is, its text content, is annotated, where each short segment corresponds to its start and end time points in the original speech.

The attribute requirement information may be: attributes are annotated for each short speech segment and for the text obtained by transcription, for example the speaker role of each short segment, whether the speech is clear, whether overlapping speech or noise is present, whether the speech is a system sound, and so on.
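One way to picture how the three kinds of speech requirement information attach to a single short speech segment is the record sketched below. The class `SpeechSegment`, its field names, and the sample values are hypothetical and chosen only for illustration; the disclosure does not prescribe a data format.

```python
from dataclasses import dataclass, field


@dataclass
class SpeechSegment:
    """One short speech segment cut from a longer recording (hypothetical record)."""

    # Filled by the timestamp requirement: position within the original speech.
    start_s: float
    end_s: float
    # Filled by the transcription content requirement.
    text: str = ""
    # Filled by the attribute requirement, e.g. speaker role or clarity.
    attributes: dict = field(default_factory=dict)


# Illustrative values only.
segment = SpeechSegment(start_s=0.0, end_s=2.5)
segment.text = "hello"
segment.attributes["speaker_role"] = "customer"
segment.attributes["clear"] = True
```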

Illustratively, when the technical solution of the embodiments of the present disclosure is applied to a text annotation requirement, the plurality of pieces of annotation requirement information may be a plurality of pieces of text requirement information. Text annotation requirements may cover two applications: text content review and text enrichment. For a text content review scenario, the text requirement information may include text quality requirement information, requirement information related to sensitive content, requirement information related to emotion types, and/or requirement information related to semantic judgment. For a text enrichment scenario, the text requirement information may include human-computer dialogue checking requirement information, character object checking requirement information, text translation requirement information, question enrichment requirement information, and/or answer enrichment requirement information.

The embodiments of the present disclosure describe the specific requirement information for speech annotation and/or text annotation, so that speech and/or text annotation work can be divided into tasks according to the information processing method provided by the present disclosure. Annotators need to understand only the annotation requirements related to the annotation tasks they receive rather than all annotation requirements, which reduces annotation difficulty and effectively improves the efficiency of speech and/or text annotation.

As shown in Fig. 2, which is a schematic diagram of an information processing method according to a second embodiment of the present disclosure, the method may include the following steps:

S201: Determine a plurality of pieces of annotation requirement information.

It should be noted that, some steps in this embodiment are the same as or similar to some steps in the foregoing embodiment, and are not repeated herein for the sake of brevity of description.

S202: Determine a first association relationship among the plurality of pieces of annotation requirement information.

S203: According to the first association relationship, determine that the annotation tasks corresponding to two pieces of annotation requirement information that have an association relationship use a serial distribution mode, and that the annotation tasks corresponding to two pieces of annotation requirement information that have no association relationship use a parallel distribution mode.

When two pieces of annotation requirement information have an association relationship, their annotation tasks must be distributed in a particular order; this is the serial distribution mode. When two pieces of annotation requirement information have no association relationship, their annotation tasks are executed independently and do not affect each other; this is the parallel distribution mode.

S204: Distribute the plurality of annotation tasks according to the distribution mode.

In the second embodiment of the present disclosure, after the plurality of pieces of annotation requirement information is determined, a first association relationship among them can be determined. According to the first association relationship, the annotation tasks corresponding to two pieces that have an association relationship are assigned a serial distribution mode, and the annotation tasks corresponding to two pieces that have no association relationship are assigned a parallel distribution mode. The corresponding annotation tasks are then distributed in parallel or in series accordingly, so tasks are distributed automatically. Annotators need to understand only the annotation requirements related to the annotation tasks they receive rather than all annotation requirements, which reduces annotation difficulty and improves distribution efficiency.
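The rule in S203 can be sketched as a lookup over every pair of requirements. This is an illustrative sketch only: the enum `Mode`, the function `distribution_modes`, and the representation of dependencies as a set of ordered pairs are assumptions, not part of the disclosure.

```python
from enum import Enum
from itertools import combinations


class Mode(Enum):
    SERIAL = "serial"
    PARALLEL = "parallel"


def distribution_modes(requirements, dependencies):
    """Map each unordered pair of requirements to a distribution mode.

    dependencies is an assumed set of (upstream, downstream) pairs. A pair
    that appears in either direction has an association relationship and is
    distributed serially; every other pair is distributed in parallel.
    """
    modes = {}
    for a, b in combinations(requirements, 2):
        associated = (a, b) in dependencies or (b, a) in dependencies
        modes[frozenset((a, b))] = Mode.SERIAL if associated else Mode.PARALLEL
    return modes


modes = distribution_modes(["A", "B", "C"], {("A", "B"), ("A", "C")})
```

Keying the result by `frozenset` makes the lookup independent of the order in which the two requirements are named.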

In practical applications, the division of annotation tasks depends on the association relationship between annotation tasks and annotation objects. For example, the annotation tasks of any two pieces of annotation requirement information may have to be performed in a particular order; the annotation order of the annotation requirement information can be determined, and the annotation tasks can then be distributed in that order.

As shown in Fig. 3, which is a schematic diagram of an information processing method according to a third embodiment of the present disclosure, the method may include the following steps:

S301: Determine a plurality of pieces of annotation requirement information.

S302: Determine a first association relationship among the plurality of pieces of annotation requirement information.

S303: According to the first association relationship, determine that the annotation tasks corresponding to two pieces of annotation requirement information that have an association relationship use a serial distribution mode, and that the annotation tasks corresponding to two pieces of annotation requirement information that have no association relationship use a parallel distribution mode.

Where any two pieces of annotation requirement information are in a serial distribution mode, the order of their annotation tasks can be determined according to the serial distribution mode.

Where any two pieces of annotation requirement information are in a parallel distribution mode, their annotation tasks can be determined to have the same annotation order.

S304: According to the serial distribution mode and the parallel distribution mode, divide the plurality of annotation tasks into a plurality of task groups with a sequential execution order, each task group including at least one parallel annotation task.

The annotation order may be represented by order labels such as numbers or characters. An order label indicates the position in the annotation order and may be chosen according to actual needs, for example numerals, letters, or Greek letters.

Each task group corresponds to at least one annotation task. Dividing the plurality of annotation tasks into a plurality of task groups with a sequential execution order may specifically include determining the order label corresponding to the annotation order of each piece of annotation requirement information. Two pieces of annotation requirement information in a serial distribution mode have different order labels, whose values differ; two pieces in a parallel distribution mode have the same order label. Each order label may correspond to at least one piece of annotation requirement information, and the annotation tasks of the pieces sharing the same order label are placed in the same task group.

For ease of understanding, assume there are three pieces of annotation requirement information, A, B, and C, where A and B are in a serial distribution mode, A and C are in a serial distribution mode, and B and C are in a parallel distribution mode. It can then be determined that B and C have the same annotation order, while A and B, and A and C, do not. Assuming A is annotated before B and C, 1 may be used as the order label for the annotation task of A, and 2 as the order label for the annotation tasks of B and C. When the annotation tasks are distributed, the annotation order of A is 1, and that of B and C is 2. During distribution, the annotation task of A is distributed first, and then the annotation tasks of B and C are distributed; B and C can be distributed at the same time.
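The A, B, C example can be reproduced with a short sketch that assigns each requirement an order label and then groups requirements by label. The labeling rule used here (one more than the largest label among the requirements it depends on) is an assumption chosen for illustration; the disclosure states only that serial pairs receive different labels and parallel pairs the same label. The function name `build_task_groups` and the pair-based input are likewise hypothetical.

```python
from collections import defaultdict


def build_task_groups(requirements, dependencies):
    """Group requirements into ordered task groups by order label.

    dependencies is an assumed set of (earlier, later) pairs and must not
    contain cycles. A requirement's label is 1 if it depends on nothing,
    otherwise 1 + the largest label among the requirements it depends on.
    """
    upstream = defaultdict(set)
    for earlier, later in dependencies:
        upstream[later].add(earlier)

    labels = {}

    def label_of(req):
        if req not in labels:
            labels[req] = 1 + max((label_of(u) for u in upstream[req]), default=0)
        return labels[req]

    groups = defaultdict(list)
    for req in requirements:
        groups[label_of(req)].append(req)
    # Return the groups in ascending label order, i.e. execution order.
    return [groups[label] for label in sorted(groups)]


groups = build_task_groups(["A", "B", "C"], {("A", "B"), ("A", "C")})
print(groups)  # [['A'], ['B', 'C']]
```

Under this rule, two requirements with no direct dependency can still receive different labels when one sits further along a dependency chain, so a real system would need to decide how to treat such cases.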

S305: Distribute the plurality of task groups in sequence according to the execution order, and distribute the tasks within each task group simultaneously.

In the third embodiment of the present disclosure, after the plurality of pieces of annotation requirement information is determined, a first association relationship among them can be determined. Using the first association relationship, the annotation tasks corresponding to two pieces that have an association relationship are assigned a serial distribution mode, and those corresponding to two pieces that have no association relationship are assigned a parallel distribution mode. Using the serial and parallel distribution modes, the plurality of annotation tasks is divided into a plurality of task groups with a sequential execution order, each containing at least one parallel task. The task groups are distributed serially, in their execution order, and the tasks within each task group are distributed simultaneously. Dividing the tasks into serial task groups and identifying the parallel tasks within each group achieves automatic, ordered distribution of the task groups and improves the accuracy of task distribution. Annotators need to understand only the annotation requirements related to the annotation tasks they receive rather than all annotation requirements, which reduces annotation difficulty, maintains processing efficiency, and improves processing precision.
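Distribution in the manner of S305 can be sketched with Python's standard `concurrent.futures` module. The function `distribute_groups` and the callable `send_task` are hypothetical; the sketch also assumes that a group is not started until every task in the previous group has returned its annotation result, which the disclosure does not state explicitly.

```python
from concurrent.futures import ThreadPoolExecutor


def distribute_groups(task_groups, send_task):
    """Distribute task groups in order; tasks inside a group go out concurrently.

    send_task is a hypothetical callable that delivers one task to an
    annotator and returns once that task's annotation result is available.
    Each group is assumed to contain at least one task.
    """
    results = {}
    for group in task_groups:
        with ThreadPoolExecutor(max_workers=len(group)) as pool:
            # pool.map yields results in input order; leaving the with-block
            # waits for all workers, so the next group starts only afterwards.
            for task, result in zip(group, pool.map(send_task, group)):
                results[task] = result
    return results
```

For example, calling `distribute_groups([["A"], ["B", "C"]], send_task)` sends A first and then sends B and C together.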

As shown in Fig. 4, which is a schematic diagram of an information processing method according to a fourth embodiment of the present disclosure, the method may include the following steps:

S401: Determine a plurality of pieces of annotation requirement information.

S402: Determine a first association relationship among the plurality of pieces of annotation requirement information.

S403: According to the first association relationship, determine that the annotation tasks corresponding to two pieces of annotation requirement information that have an association relationship use a serial distribution mode, and that the annotation tasks corresponding to two pieces of annotation requirement information that have no association relationship use a parallel distribution mode.

S404: According to the serial distribution mode and the parallel distribution mode, divide the plurality of annotation tasks into a plurality of task groups with a sequential execution order, each task group including at least one parallel task.

S405: While distributing the plurality of task groups in sequence according to the execution order, for any annotation task in each task group, determine a second association relationship among the plurality of annotation objects corresponding to that annotation task, divide the annotation task into at least one subtask according to the second association relationship, and determine a distribution mode for each subtask.

S406: Distribute the tasks in each task group simultaneously, and distribute all subtasks of each task according to the determined distribution mode.

In the fourth embodiment of the present disclosure, after the plurality of pieces of annotation requirement information is determined, a first association relationship among them can be determined. According to the first association relationship, the annotation tasks corresponding to two pieces that have an association relationship are assigned a serial distribution mode, and those corresponding to two pieces that have no association relationship are assigned a parallel distribution mode. According to the serial and parallel distribution modes, the plurality of annotation tasks is divided into a plurality of task groups with a sequential annotation order, each containing at least one parallel task. The execution order among the task groups is determined when the groups are formed; the task groups are distributed in that order, the tasks within each task group are distributed simultaneously, and all subtasks of each task are distributed according to the second association relationship. During distribution, each task to be distributed is divided into at least one subtask, and a distribution mode is determined for each subtask. Dividing each annotation task into smaller subtasks reduces the annotation workload and annotation difficulty for each annotator, so each annotation task is completed quickly and annotation efficiency is further improved.

Optionally, the first association relationship may be a dependency relationship among the pieces of annotation requirement information. The second association relationship may be a dependency relationship among the plurality of annotation objects of a piece of annotation requirement information.

In some embodiments, the first association relationship and the second association relationship may be set by the requirement user while providing the annotation requirement information. The first association relationship and the second association relationship may also be obtained by automatic identification.

Wherein, determining the first association relationship among the plurality of annotation requirement information may include: and identifying the respective labeling content of any two pieces of labeling demand information in the plurality of pieces of labeling demand information, so as to determine the association relationship between the two pieces of labeling demand information according to the respective labeling content of the two pieces of labeling demand information. The first association relationship may include the presence of an association relationship or the absence of an association relationship.

For two pieces of labeling requirement information with an association relationship, it can be determined that the labeling tasks of the two pieces of labeling requirement information are in a serial distribution relationship. For two pieces of annotation demand information without an association relationship, a parallel distribution relationship between respective annotation tasks of the two pieces of annotation demand information can be determined.

The annotation content is the specific annotation meaning of the annotation requirement information. Two pieces of annotation requirement information have an association relationship when, for example, the annotation task of one piece needs to use, during execution, the annotation result of the annotation task of the other piece. Two pieces of annotation requirement information have no association relationship when, for example, they are unrelated and independent of each other.

Optionally, when identifying the association relationship between any two pieces of annotation requirement information, the association relationship between each piece of annotation requirement information and every other piece may be identified in turn, and each pair that has already been identified is recorded so that it is not identified again. This continues until every piece of annotation requirement information has been compared with all other pieces, at which point the first association relationship corresponding to the plurality of pieces of annotation requirement information is obtained.
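The pairwise identification described above, together with the serial/parallel rule, can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function name `identify_first_association`, the predicate `uses_result_of`, and the `DEPENDS` table are hypothetical. `itertools.combinations` is used so that each unordered pair is visited exactly once, which plays the role of recording already-identified pairs.

```python
from itertools import combinations


def identify_first_association(requirements, uses_result_of):
    """Map every unordered pair of requirements to 'serial' or 'parallel'.

    `uses_result_of(a, b)` is an assumed predicate: True when the task for
    requirement `a` needs the labeling result of the task for `b`.
    """
    modes = {}
    # combinations() yields each unordered pair once, so no pair is re-identified.
    for a, b in combinations(requirements, 2):
        associated = uses_result_of(a, b) or uses_result_of(b, a)
        modes[frozenset((a, b))] = "serial" if associated else "parallel"
    return modes


# Hypothetical dependency table for voice annotation requirements.
DEPENDS = {("transcription", "timestamp"), ("attribute", "timestamp")}
modes = identify_first_association(
    ["timestamp", "transcription", "attribute"],
    lambda a, b: (a, b) in DEPENDS,
)
```

With three requirements, three pairs are examined; a pair is marked serial only when one task uses the result of the other.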

Illustratively, taking the aforementioned voice requirement information as an example, an association relationship exists between the timestamp requirement information and the transcribed content requirement information, so the timestamp annotation task and the transcribed content annotation task are distributed in a serial distribution manner. An association relationship also exists between the timestamp requirement information and the attribute requirement information, so the timestamp annotation task and the attribute annotation task are distributed in a serial distribution manner. No association relationship exists between the transcribed content requirement information and the attribute requirement information, so the transcribed content annotation task and the attribute annotation task are distributed in a parallel distribution manner.
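One way to turn such pairwise relationships into task groups with a sequential execution order is a simple layering pass, sketched below. The sketch assumes that each association is directed (the later task uses the result of the earlier one), which the text implies for the voice example but does not state in general. The name `build_task_groups` and the `depends_on` mapping are illustrative.

```python
def build_task_groups(tasks, depends_on):
    """Split tasks into ordered groups; tasks in one group can run in parallel.

    `depends_on` maps a task to the set of tasks whose labeling results it
    needs (an assumed, directed form of the first association relationship).
    """
    remaining = set(tasks)
    done = set()
    groups = []
    while remaining:
        # A task is ready once all tasks it depends on sit in earlier groups.
        ready = sorted(t for t in remaining if depends_on.get(t, set()) <= done)
        if not ready:
            raise ValueError("cyclic association relationship")
        groups.append(ready)
        done.update(ready)
        remaining.difference_update(ready)
    return groups


groups = build_task_groups(
    ["timestamp", "transcription", "attribute"],
    {"transcription": {"timestamp"}, "attribute": {"timestamp"}},
)
# groups == [["timestamp"], ["attribute", "transcription"]]
```

The timestamp task forms the first group, and the two tasks that depend only on it form the second group and can be distributed simultaneously.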

Optionally, the second association relationship among the plurality of annotation objects corresponding to any one annotation task may be determined as follows: the annotation content of the corresponding annotation requirement information is identified, and the association relationship among the annotation objects of the annotation task is judged according to that annotation content, thereby determining the second association relationship among the plurality of annotation objects of the annotation task.

The annotation content of the annotation requirement information indicates whether the plurality of annotation objects of the corresponding annotation task are associated with one another. In the case that an annotation association relationship exists among the plurality of annotation objects, the at least one subtask of the annotation task may be determined to be in a serial distribution manner. In the case that no annotation association relationship exists among the plurality of annotation objects, the at least one subtask of the annotation task may be determined to be in a parallel distribution manner.

Illustratively, taking the aforementioned voice requirement information as an example, the plurality of annotation objects of the annotation task corresponding to the timestamp requirement information have a chronological order, so the at least one subtask of the timestamp annotation task is in a serial distribution manner: the start and stop times of a following speech segment can be confirmed only after the start and stop times of the preceding speech segment are determined. The plurality of annotation objects of the annotation task corresponding to the transcribed content requirement information have no association relationship, so the corresponding subtasks are in a parallel distribution manner, and the transcribed content of different speech segments can be labeled simultaneously. The plurality of annotation objects of the annotation task corresponding to the attribute requirement information likewise have no association relationship, so the corresponding subtasks are in a parallel distribution manner: when attributes are labeled for different voices or texts, the labeling of each one is independent and does not affect the others, and all subtasks can be distributed in parallel.

In a possible design, after the plurality of labeling tasks are divided, according to the serial distribution manner and the parallel distribution manner, into a plurality of task groups having a sequential execution order, each task group including at least one parallel labeling task, the method may further include:

in the process of successively distributing a plurality of task groups according to the execution sequence, each task group is sequentially used as the current task group, and under the condition of obtaining the labeling result of the current task group, the tasks in the next task group of the current task group are updated according to the labeling result.

Optionally, updating the tasks in the next task group of the current task group according to the labeling result may include: for any task to be updated in the next task group, acquiring the target task in the current task group that has an association relationship with the task to be updated, and updating the task to be updated according to the labeling result of the target task. As a possible implementation, updating any task to be updated includes: updating the plurality of annotation objects of that task by using the labeling result of the target task.

In this embodiment, for task groups having an execution order, the tasks of the next task group are updated by using the annotation result of the current task group. For task groups with an annotation association, the tasks of each task group can thus be accurately determined and updated in real time, which guarantees the accurate execution of serially associated tasks and ensures the feasibility and accuracy of task execution while improving task execution efficiency.
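The update step can be illustrated with a small sketch. The data shapes here are assumptions made for illustration only: a task is a dict with `name` and `objects` keys, `current_results` maps a finished task to its labeling result, and `depends_on` maps a task to its associated target task in the current group.

```python
def update_next_group(next_group, current_results, depends_on):
    """Refresh the annotation objects of the tasks in the next task group.

    All three argument structures are hypothetical; the disclosure does not
    prescribe a data model.
    """
    for task in next_group:
        target = depends_on.get(task["name"])
        if target in current_results:
            # The target task's labeling result becomes the objects to label next.
            task["objects"] = list(current_results[target])
    return next_group


# Timestamp labeling produced two speech segments (start, end) in seconds.
results = {"timestamp": [(0.0, 1.5), (1.5, 4.0)]}
next_group = [
    {"name": "transcription", "objects": []},
    {"name": "attribute", "objects": []},
]
update_next_group(
    next_group,
    results,
    {"transcription": "timestamp", "attribute": "timestamp"},
)
```

After the call, both tasks in the next group carry the two segments produced by the timestamp task as their annotation objects.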

As an embodiment, dividing the annotation task into at least one subtask according to the second association relationship, and determining a distribution manner of each subtask may include:

and dividing the annotation task into at least one subtask according to the second incidence relation, determining that the at least one subtask is a serial distribution mode under the condition that the incidence relation exists among the plurality of annotation objects, and determining that the at least one subtask is a parallel distribution mode under the condition that the incidence relation does not exist among the plurality of annotation objects.

Optionally, dividing the annotation task into at least one subtask according to the second association relationship may include: dividing the plurality of annotation objects of the annotation task into at least one subtask according to the second association relationship and a preset division rule.

The specific first division step of the annotation task comprises: in the case that the second association relationship indicates that an association relationship exists, determining the labeling order of each of the plurality of annotation objects according to the association relationship among them; sorting the plurality of annotation objects according to their respective labeling orders to obtain a plurality of sorted annotation objects; and dividing the sorted annotation objects into at least one subtask according to a preset division rule, wherein any subtask comprises at least one annotation object.

For ease of understanding, assume that there are 10000 annotation objects and that each subtask contains 1000 objects. The annotation objects may be sorted according to the labeling order among them to obtain 10000 sorted annotation objects. Then, starting from the first annotation object, every 1000 annotation objects are divided into one subtask, yielding 10 subtasks. Each subtask consists of 1000 annotation objects, which remain ordered according to their respective labeling order.
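The arithmetic of this example (10000 / 1000 = 10 subtasks) can be checked with a short sketch. The function name `split_in_order` and the use of an integer start time as the ordering key are assumptions; the "preset division rule" is taken here to be a fixed chunk size.

```python
def split_in_order(objects, size, order_key):
    """Sort annotation objects by labeling order, then cut fixed-size subtasks."""
    ordered = sorted(objects, key=order_key)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


# 10000 hypothetical objects, identified by start time and supplied out of order.
objects = list(range(9999, -1, -1))
subtasks = split_in_order(objects, 1000, order_key=lambda start: start)
# len(subtasks) == 10, and every subtask holds 1000 objects in labeling order
```

Because the objects are sorted before chunking, the subtasks themselves are also in labeling order, which is what serial distribution relies on.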

The specific second division step of the annotation task comprises: in the case that the second association relationship indicates that no association relationship exists, randomly determining the labeling order of the plurality of annotation objects; and dividing the plurality of annotation objects, in that labeling order, into at least one subtask according to a preset division rule, wherein any subtask comprises at least one annotation object.

For ease of understanding, assume that there are 10000 annotation objects and that each subtask contains 1000 objects. The annotation objects may be randomly ordered to obtain 10000 ordered annotation objects. Then, starting from the first annotation object, every 1000 annotation objects are divided into one subtask, yielding 10 subtasks. Each subtask consists of 1000 annotation objects, which keep the randomly determined labeling order.
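A sketch of the random variant follows. The function name `split_randomly` and the optional `seed` parameter are illustrative additions (the seed only makes the shuffle reproducible), and the preset division rule is again assumed to be a fixed chunk size.

```python
import random


def split_randomly(objects, size, seed=None):
    """Shuffle independent annotation objects, then cut fixed-size subtasks."""
    shuffled = list(objects)
    # A private Random instance avoids touching the global random state.
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i:i + size] for i in range(0, len(shuffled), size)]


random_subtasks = split_randomly(range(10000), 1000, seed=0)
# 10 subtasks of 1000 objects each; every object appears exactly once
```

Since the objects are independent of one another, any order is acceptable and the resulting subtasks can be distributed in parallel.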

In this embodiment, in the process of distributing any one annotation task, the annotation task is divided into at least one subtask according to the second association relationship, and the distribution manner of the at least one subtask is confirmed according to whether the second association relationship indicates an association among the plurality of annotation objects. When the plurality of annotation objects have an association relationship, the at least one subtask is determined to be in a serial distribution manner; when they do not, the at least one subtask is determined to be in a parallel distribution manner. Once the distribution manner of each subtask is obtained, subtasks of different types can be distributed in the corresponding manner, so that the at least one subtask of any annotation task is accurately distributed. The annotator does not need to know all annotation requirements, only those related to the received subtask, which further reduces the annotation difficulty and improves the distribution effect, thereby further improving the annotation efficiency.

In some embodiments, distributing all subtasks of each task according to the determined distribution manner may include:

under the condition that all subtasks of each task are in a serial distribution mode, distributing all subtasks of the tasks in a serial mode;

and under the condition that all the subtasks of each task are distributed in a parallel mode, distributing all the subtasks of the task in the parallel mode.

Optionally, in the case that all the subtasks of each task are in a serial distribution manner, distributing all the subtasks of the task in a serial manner may include: determining the distribution order of all the subtasks of the task, and serially distributing all the subtasks according to that distribution order.

Optionally, the method further comprises: in the process of successively distributing the plurality of subtasks according to the distribution order, sequentially taking each subtask as the current subtask, and, when the sub-labeling result of the current subtask is obtained, updating the next subtask of the current subtask according to that sub-labeling result. In practical applications, updating any subtask specifically includes: updating the plurality of annotation objects of the next subtask by using the sub-labeling result to obtain the plurality of annotation objects corresponding to that subtask.

Optionally, in the case that all the subtasks of each task are in a parallel distribution manner, distributing all the subtasks of the task in a parallel manner may include: simultaneously distributing all the subtasks of the task.

In this embodiment, in the process of distributing the subtasks of any one task, all the subtasks are distributed serially when they are in a serial distribution manner and in parallel when they are in a parallel distribution manner. Distributing different subtasks according to different distribution manners achieves accurate distribution of each subtask. The annotator does not need to know all annotation requirements, only those related to the received subtask, which further reduces the annotation difficulty and improves the distribution efficiency and the distribution precision.
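The two distribution manners can be contrasted in a short sketch. Everything named here is hypothetical: `distribute_subtasks`, the `label` callable (a stand-in for sending a subtask to an annotator and waiting for the sub-labeling result), and the toy `end_time` labeler. A thread pool is one possible way to hand out subtasks simultaneously; the disclosure does not specify a mechanism.

```python
from concurrent.futures import ThreadPoolExecutor


def distribute_subtasks(subtasks, mode, label):
    """Distribute subtasks serially or in parallel.

    `label(subtask, previous_result)` is an assumed callable that returns the
    sub-labeling result of one subtask.
    """
    if mode == "serial":
        results, previous = [], None
        for subtask in subtasks:
            # Each subtask is sent only after the previous result is available.
            previous = label(subtask, previous)
            results.append(previous)
        return results
    if mode == "parallel":
        # All subtasks are handed out at once; map() keeps the input order.
        with ThreadPoolExecutor() as pool:
            return list(pool.map(lambda s: label(s, None), subtasks))
    raise ValueError(f"unknown distribution mode: {mode}")


def end_time(durations, previous_end):
    """Toy labeler: end time of a run of segments, offset by the previous end."""
    return (previous_end or 0) + sum(durations)


serial = distribute_subtasks([[1, 2], [3]], "serial", end_time)      # [3, 6]
parallel = distribute_subtasks([[1, 2], [3]], "parallel", end_time)  # [3, 3]
```

In the serial case the second result depends on the first, as with timestamp labeling; in the parallel case each subtask is labeled independently.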

As an embodiment, after distributing the plurality of annotation tasks, the method further comprises: acquiring the labeling results respectively corresponding to the plurality of labeling tasks, and auditing those labeling results to obtain a first auditing result.

In this embodiment, the results of each labeling task are respectively audited, and the labeling effect of each labeling task can be obtained by auditing the whole of each labeling task, so that the whole auditing of the labeling process is realized, and the whole auditing result is obtained.

In another embodiment, when the plurality of subtasks are distributed, in order to audit the labeling results, after the plurality of annotation tasks are distributed to the corresponding annotation users, the method may further include:

and acquiring sub-labeling results corresponding to a plurality of sub-tasks of any one labeling task.

And auditing the sub-labeling results respectively corresponding to the plurality of sub-tasks of any labeling task to obtain a second auditing result.

In this embodiment, the respective sub-annotation results of the multiple sub-tasks of any one of the annotation tasks are audited, so that the auditing of a single sub-annotation task in the annotation process is realized, the annotation result of each sub-annotation task is confirmed, each sub-task in the annotation process can be independently audited, a more precise auditing is realized in a more detailed auditing manner, and the auditing precision is improved.

As shown in fig. 5, which is a schematic diagram of an information processing apparatus according to a fifth embodiment of the present disclosure, the apparatus may include the following modules:

The information acquisition module 501 is configured to acquire a plurality of pieces of annotation requirement information.

The first determination module 502 is configured to determine a first association relationship among the plurality of pieces of annotation requirement information.

The second determination module 503 is configured to determine, according to the first association relationship, a distribution manner of a plurality of annotation tasks corresponding to the plurality of pieces of annotation requirement information.

The task distribution module 504 is configured to distribute the plurality of annotation tasks according to the distribution manner.

In the fifth embodiment of the present disclosure, a plurality of pieces of annotation requirement information are acquired, and a first association relationship among them is determined to confirm how the pieces of annotation requirement information relate to one another. According to the first association relationship, the distribution manner of the plurality of annotation tasks corresponding to the plurality of pieces of annotation requirement information is confirmed, so that the plurality of annotation tasks can be distributed according to that distribution manner. An annotation user therefore only needs to pay attention to the annotation requirement information corresponding to the annotation task distributed to that user, rather than to all annotation requirement information, so the annotation tasks can be completed quickly and the annotation efficiency is effectively improved.

As an embodiment, the second determining module may include:

and the first determining unit is used for determining the annotation tasks corresponding to the two pieces of annotation demand information with the association relationship as a serial distribution mode and determining the annotation tasks corresponding to the two pieces of annotation demand information without the association relationship as a parallel distribution mode according to the first association relationship.

In some embodiments, the task distribution module may include:

the task dividing unit is used for dividing the plurality of annotation tasks into a plurality of task groups with a sequential execution order according to the serial distribution manner and the parallel distribution manner, each task group comprising at least one parallel task;

and the task distribution unit is used for successively distributing a plurality of task groups according to the execution sequence and simultaneously distributing the tasks in each task group.

As a possible implementation manner, the task distribution unit may include:

the task dividing subunit is used for determining a second incidence relation among a plurality of labeling objects corresponding to the labeling tasks for any one labeling task in each task group, dividing the labeling tasks into at least one subtask according to the second incidence relation, and determining a distribution mode of each subtask;

and the task distribution subunit is used for simultaneously distributing the tasks in each task group, and distributing all the subtasks of each task according to a determined distribution mode.

In yet another embodiment, the apparatus further comprises:

and the task updating module is used for sequentially taking each task group as the current task group in the process of successively distributing the plurality of task groups according to the execution sequence, and updating the task in the next task group of the current task group according to the marking result under the condition of obtaining the marking result of the current task group.

In one possible design, the task partitioning unit may be specifically configured to:

As a possible implementation manner, the task distribution subunit may specifically be configured to:

under the condition that all subtasks of each task are in a serial distribution mode, distributing all subtasks of the tasks in a serial mode; and under the condition that all the subtasks of each task are distributed in a parallel mode, distributing all the subtasks of the task in the parallel mode.

The functions of each module in each apparatus in the embodiments of the present disclosure may refer to the corresponding description in the above method, and are not described herein again.

In the technical solution of the present disclosure, the acquisition, storage, and application of the personal information of the users involved all comply with the provisions of relevant laws and regulations and do not violate public order and good customs.

The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.

FIG. 6 illustrates a schematic block diagram of an example electronic device 600 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing devices, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 6, the device 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 can also be stored. The computing unit 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.

A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, or the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.

The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 601 executes the respective methods and processes described above, such as the information processing method. For example, in some embodiments, the information processing method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the information processing method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the information processing method by any other suitable means (e.g., by means of firmware).

Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include: being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.

The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims

1. A method of information processing, comprising:

acquiring a plurality of marking demand information;

2. The method according to claim 1, wherein the determining, according to the first association, a distribution manner of a plurality of annotation tasks corresponding to the plurality of annotation demand information includes:

and according to the first incidence relation, determining the labeling tasks corresponding to the two pieces of labeling requirement information with incidence relation as a serial distribution mode, and determining the labeling tasks corresponding to the two pieces of labeling requirement information without incidence relation as a parallel distribution mode.

3. The method of claim 2, wherein said distributing said plurality of annotation tasks in said distribution manner comprises:

according to the serial distribution mode and the parallel distribution mode, dividing the plurality of labeling tasks into a plurality of task groups with a sequential execution sequence, wherein each task group comprises at least one parallel labeling task;

and distributing the plurality of task groups according to the execution sequence, and distributing the tasks in each task group simultaneously.

4. The method of claim 3, wherein said distributing tasks within each task group simultaneously comprises:

for any one labeling task in each task group, determining a second incidence relation among a plurality of labeling objects corresponding to the labeling task, dividing the labeling task into at least one subtask according to the second incidence relation, and determining a distribution mode of each subtask;

and simultaneously distributing the tasks in each task group, and distributing all the subtasks of each task according to the determined distribution mode.

5. The method of claim 3, further comprising:

and in the process of successively distributing the plurality of task groups according to the execution sequence, sequentially using each task group as a current task group, and updating the task in the next task group of the current task group according to the marking result under the condition of obtaining the marking result of the current task group.

6. The method according to claim 4, wherein the dividing the annotation task into at least one subtask according to the second association relationship and determining a distribution manner of each subtask comprises:

and dividing the labeling task into at least one subtask according to the second incidence relation, determining that the at least one subtask is a serial distribution mode under the condition that the incidence relations exist among the plurality of labeling objects, and determining that the at least one subtask is a parallel distribution mode under the condition that the incidence relations do not exist among the plurality of labeling objects.

7. The method of claim 6, wherein the distributing all subtasks for each task according to the determined distribution manner comprises:

under the condition that all subtasks of each task are in a serial distribution mode, distributing all subtasks of the task in a serial mode;

and under the condition that all the subtasks of each task are in a parallel distribution mode, distributing all the subtasks of the task in a parallel mode.
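The mode selection of claim 6 and the distribution of claim 7 can be sketched together. The sketch assumes, for illustration only, that each labeling object becomes one subtask, that a caller-supplied predicate `associated` decides whether two labeling objects are related, and that `send` dispatches one subtask. These names are hypothetical, and a thread pool is merely one way to distribute subtasks at the same time.

```python
from concurrent.futures import ThreadPoolExecutor


def choose_mode(objects, associated):
    """Return "serial" if any two labeling objects are associated, else "parallel"."""
    for i, a in enumerate(objects):
        for b in objects[i + 1:]:
            if associated(a, b):
                return "serial"
    return "parallel"


def distribute_subtasks(objects, associated, send):
    """Treat each labeling object as one subtask and dispatch by the chosen mode."""
    mode = choose_mode(objects, associated)
    if mode == "serial":
        # Associated objects are sent one after another, in order.
        results = [send(obj) for obj in objects]
    else:
        # Unassociated objects are sent concurrently; map() keeps input order.
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(send, objects))
    return mode, results
```

For instance, sentences taken from the same dialog might count as associated and be distributed serially, while sentences from unrelated recordings would be distributed in parallel.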

8. The method of claim 1, wherein the plurality of annotation requirement information is at least one of a plurality of voice requirement information set for voice annotation requirements and a plurality of text requirement information set for text annotation requirements.

9. An apparatus for information processing, comprising:

10. The apparatus of claim 9, wherein the second determining means comprises:

11. The apparatus of claim 10, wherein the task distribution module comprises:

the task dividing unit is used for dividing the plurality of labeling tasks into a plurality of task groups with a sequential execution sequence according to the serial distribution mode and the parallel distribution mode, wherein each task group comprises at least one parallel labeling task;

and the task distribution unit is used for distributing the plurality of task groups in sequence according to the execution sequence and distributing the tasks in each task group at the same time.

12. The apparatus of claim 11, wherein the task distribution unit comprises:

the task dividing subunit is configured to determine, for any one of the annotation tasks in each task group, a second association relationship between a plurality of annotation objects corresponding to the annotation task, divide the annotation task into at least one subtask according to the second association relationship, and determine a distribution manner of each subtask;

and the task distribution subunit is used for simultaneously distributing the tasks in each task group, and distributing all the subtasks of each task according to the determined distribution mode.

13. The apparatus of claim 11, further comprising:

14. The apparatus of claim 12, wherein the task dividing subunit is specifically configured to:

15. The apparatus according to claim 14, wherein the task distribution subunit is specifically configured to:

under the condition that all subtasks of each task are in a serial distribution mode, distributing all subtasks of the task in a serial mode; and under the condition that all the subtasks of each task are in a parallel distribution mode, distributing all the subtasks of the task in a parallel mode.

16. The apparatus of claim 9, wherein the plurality of annotation requirement information is at least one of a plurality of voice requirement information set for voice annotation requirements and a plurality of text requirement information set for text annotation requirements.

17. An electronic device, comprising:

at least one processor; and

a memory that stores instructions executable by the at least one processor, the instructions enabling the at least one processor to perform the method of any one of claims 1-8.

18. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-8.

19. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-8.