CN115758451A - Data labeling method, device, equipment and storage medium based on artificial intelligence - Google Patents

Publication number
CN115758451A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211445886.1A
Other languages
Chinese (zh)
Inventor
梁凯程
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Property and Casualty Insurance Company of China Ltd
Original Assignee
Ping An Property and Casualty Insurance Company of China Ltd

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the application belongs to the field of artificial intelligence, and relates to a data labeling method based on artificial intelligence, which comprises the following steps: acquiring a data file to be marked from a preset database; acquiring the file format type of the data file; acquiring a sensitive information set corresponding to the file format type; determining a target desensitization rule corresponding to the file format type; desensitizing the data file based on the target desensitization rule and the sensitive information set to obtain a desensitized data file; and carrying out labeling processing on the desensitized data file based on a preset labeling platform to obtain a labeled target data file. The application also provides a data labeling device, computer equipment and a storage medium based on artificial intelligence. In addition, the present application also relates to a blockchain technique, and the target data file can be stored in the blockchain. Through the data annotation method and the data annotation device, the workload of data annotation is effectively reduced, the cost of data annotation is reduced, and the efficiency of data annotation is improved.

Description

Data labeling method, device, equipment and storage medium based on artificial intelligence
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a data annotation method and apparatus, a computer device, and a storage medium based on artificial intelligence.
Background
With the rapid progress of artificial intelligence technology, financial technology has had a profound influence on traditional business models. For example, in an insurance claim settlement scenario, an insured person can photograph medical bills and invoices with a mobile phone and upload them to the back end of the insurance claim settlement system; the back end calls an artificial intelligence algorithm model to recognize the uploaded image documents and determines, in combination with the insurance terms, whether to pay the claim, which improves claim settlement efficiency and user experience.
In the technical field of artificial intelligence, a deep learning technology for pattern recognition based on unstructured data such as images and texts needs data labeling processing, and the labeled data quality is an important influence factor influencing the learning efficiency and quality of a model. Data labeling work usually needs professional labelers to complete, so a large amount of manpower and material resources need to be invested, and accordingly, the workload of data labeling is large, the efficiency of data labeling is low, and the cost of data labeling is high. In addition, in the process of data labeling, automatic hiding of data containing user sensitive information cannot be supported, so that the user sensitive information is easily leaked.
Disclosure of Invention
The embodiment of the application aims to provide a data labeling method, a data labeling device, computer equipment and a storage medium based on artificial intelligence, so as to solve the following technical problems: conventional data labeling work is generally completed by professional labeling personnel and requires a large input of manpower and material resources, so that the workload of data labeling is large, the efficiency is low, and the cost is high; and, in the process of data labeling, automatic hiding of data containing user sensitive information is not supported, so that user sensitive information is easily leaked.
In order to solve the above technical problem, an embodiment of the present application provides a data annotation method based on artificial intelligence, which adopts the following technical solutions:
acquiring a data file to be marked from a preset database;
acquiring the file format type of the data file;
acquiring a sensitive information set corresponding to the file format type;
determining a target desensitization rule corresponding to the file format type;
desensitizing the data file based on the target desensitization rule and the sensitive information set to obtain a desensitized data file;
and labeling the desensitized data file based on a preset labeling platform to obtain a labeled target data file.
Further, the step of performing desensitization processing on the data file based on the target desensitization rule and the sensitive information set to obtain a desensitized data file specifically includes:
if the file format type is an image format, performing OCR (optical character recognition) processing on the data file, acquiring a text information set in the data file, and acquiring first coordinate information corresponding to the text information set;
performing sensitive information matching processing on the text information set based on a first regular expression corresponding to the sensitive information set to obtain matched specified text information;
desensitizing the designated text information based on a first desensitization processing mode corresponding to the text type to obtain corresponding desensitization text information;
acquiring first specified coordinate information of the specified text information in the data file based on the first coordinate information;
replacing the designated text information in the data file with the desensitization text information based on the first designated coordinate information to obtain a first desensitization data file;
identifying and detecting the first desensitization data file based on preset characteristic information to obtain a picture set containing the preset characteristic information and obtain second coordinate information corresponding to the picture set;
and performing desensitization processing on the picture set in the first desensitization data file by adopting a second desensitization processing mode corresponding to the picture type based on the second coordinate information to obtain the desensitized data file.
Further, the step of performing desensitization processing on the specified text information based on the first desensitization processing mode corresponding to the text type to obtain corresponding desensitization text information specifically includes:
acquiring the number of characters of the specified text information;
determining a target coding rule corresponding to the number of characters;
desensitizing the specified text information based on the target coding rule to obtain the desensitized text information.
Further, the preset feature information at least includes first feature information and second feature information, and the step of performing desensitization processing on the picture set in the first desensitization data file by using a second desensitization processing mode corresponding to a picture type based on the second coordinate information to obtain the desensitized data file specifically includes:
calling a preset processing tool;
acquiring a first fuzzy rule corresponding to the first characteristic information and acquiring a second fuzzy rule corresponding to the second characteristic information;
acquiring a first picture corresponding to the first characteristic information from the picture set, and acquiring a second picture corresponding to the second characteristic information;
acquiring second specified coordinate information corresponding to the first picture based on the second coordinate information, and acquiring third specified coordinate information corresponding to the second picture;
desensitizing the first picture in the first desensitized data file through the processing tool based on the second specified coordinate information and the first fuzzy rule to obtain a second desensitized data file;
and performing desensitization processing on the second picture in the second desensitization data file through the processing tool based on the third specified coordinate information and the second fuzzy rule to obtain the desensitized data file.
Further, the step of performing desensitization processing on the data file based on the target desensitization rule and the sensitive information set to obtain a desensitized data file specifically includes:
if the file format type is a text format, performing sensitive information matching processing on text information in the data file based on a second regular expression corresponding to the sensitive information set to obtain matched target text information;
acquiring target position information of the target text information in the data file;
acquiring a text desensitization rule corresponding to the target text information;
desensitizing the target text information in the data file based on the text desensitization rule and the target position information to obtain the desensitized data file.
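The text-format branch above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the second regular expression matches 11-digit mobile phone numbers and that the text desensitization rule masks the 4th to 7th digits; the names `PHONE_PATTERN`, `find_target_positions`, `mask_phone` and `desensitize_text_file` are hypothetical and do not come from the application.

```python
import re
from typing import List, Tuple

# Assumed "second regular expression": an 11-digit number starting with 1,
# not embedded in a longer run of digits.
PHONE_PATTERN = re.compile(r"(?<!\d)1\d{10}(?!\d)")


def find_target_positions(text: str) -> List[Tuple[int, int]]:
    """Return (start, end) offsets of each matched piece of target text."""
    return [match.span() for match in PHONE_PATTERN.finditer(text)]


def mask_phone(value: str) -> str:
    """Assumed text desensitization rule: replace digits 4 to 7 with asterisks."""
    return value[:3] + "****" + value[7:]


def desensitize_text_file(text: str) -> str:
    """Apply the rule at each target position, working from right to left
    so that earlier offsets stay valid even if a rule changes the length."""
    result = text
    for start, end in reversed(find_target_positions(text)):
        result = result[:start] + mask_phone(result[start:end]) + result[end:]
    return result


if __name__ == "__main__":
    print(desensitize_text_file("Tel 13812345678, alt 13987654321."))
```

Names and addresses are not covered by this sketch, since a plain regular expression is rarely enough to match them reliably.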
Further, the step of labeling the desensitized data file based on a preset labeling platform to obtain a labeled target data file specifically includes:
determining a target marking model corresponding to the file format type from the marking platform;
calling the target labeling model;
labeling the desensitized data file based on the target labeling model to obtain a corresponding labeling result;
and taking the labeling result as the target data file.
Further, the step of obtaining the file format type of the data file specifically includes:
reading suffix format information of the data file;
calling a preset format mapping table;
inquiring the format mapping table based on the suffix format information, and inquiring target format information corresponding to the suffix format information from the format mapping table;
and acquiring a target format type corresponding to the target format information from the format mapping table to obtain the file format type.
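As a rough illustration of the suffix lookup described above, the following Python sketch models the format mapping table as two dictionaries. The table contents and the names `SUFFIX_TO_FORMAT`, `FORMAT_TO_TYPE` and `get_file_format_type` are assumptions made for this example; the application does not specify how the table is stored.

```python
from pathlib import Path

# Hypothetical format mapping table, split into two lookups:
# suffix format information -> target format information -> target format type.
SUFFIX_TO_FORMAT = {
    ".jpg": "JPEG",
    ".jpeg": "JPEG",
    ".png": "PNG",
    ".txt": "TXT",
    ".csv": "CSV",
}
FORMAT_TO_TYPE = {
    "JPEG": "image",
    "PNG": "image",
    "TXT": "text",
    "CSV": "text",
}


def get_file_format_type(file_name: str) -> str:
    """Read the suffix, query the mapping table, and return the format type."""
    suffix = Path(file_name).suffix.lower()
    target_format = SUFFIX_TO_FORMAT.get(suffix)
    if target_format is None:
        raise ValueError(f"no mapping for suffix {suffix!r}")
    return FORMAT_TO_TYPE[target_format]


if __name__ == "__main__":
    print(get_file_format_type("claim_invoice.JPG"))
```

Lowercasing the suffix before the lookup is a design choice of this sketch, so that `.JPG` and `.jpg` resolve to the same entry.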
In order to solve the above technical problem, an embodiment of the present application further provides a data annotation device based on artificial intelligence, which adopts the following technical scheme:
the first acquisition module is used for acquiring a data file to be labeled from a preset database;
the second acquisition module is used for acquiring the file format type of the data file;
the third acquisition module is used for acquiring a sensitive information set corresponding to the file format type;
the determining module is used for determining a target desensitization rule corresponding to the file format type;
the desensitization module is used for desensitizing the data file based on the target desensitization rule and the sensitive information set to obtain a desensitized data file;
and the marking module is used for marking the desensitized data file based on a preset marking platform to obtain a marked target data file.
In order to solve the above technical problem, an embodiment of the present application further provides a computer device, which adopts the following technical solutions:
acquiring a data file to be marked from a preset database;
acquiring the file format type of the data file;
acquiring a sensitive information set corresponding to the file format type;
determining a target desensitization rule corresponding to the file format type;
desensitizing the data file based on the target desensitization rule and the sensitive information set to obtain a desensitized data file;
and carrying out labeling processing on the desensitized data file based on a preset labeling platform to obtain a labeled target data file.
In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium, which adopts the following technical solutions:
acquiring a data file to be marked from a preset database;
acquiring the file format type of the data file;
acquiring a sensitive information set corresponding to the file format type;
determining a target desensitization rule corresponding to the file format type;
desensitizing the data file based on the target desensitization rule and the sensitive information set to obtain a desensitized data file;
and carrying out labeling processing on the desensitized data file based on a preset labeling platform to obtain a labeled target data file.
Compared with the prior art, the embodiment of the application mainly has the following beneficial effects:
the method comprises the steps of firstly, acquiring a data file to be marked from a preset database; then obtaining the file format type of the data file; acquiring a sensitive information set corresponding to the file format type; then determining a target desensitization rule corresponding to the file format type; desensitizing the data file based on the target desensitization rule and the sensitive information set to obtain a desensitized data file; and finally, labeling the desensitized data file based on a preset labeling platform to obtain a labeled target data file. According to the method and the device, the sensitive information set and the target desensitization rule corresponding to the data file are determined based on the file format type of the data file to be labeled, then desensitization processing is performed on the sensitive information matched with the sensitive information set in the data file according to the target desensitization rule to obtain the desensitized data file, intelligent hiding processing of the sensitive information in the data file is achieved, and therefore the situation that the sensitive information is leaked in the data labeling process of the data file can be effectively avoided. In addition, the automatic labeling processing of the desensitized data file can be realized by using the preset labeling platform, the workload of data labeling can be effectively reduced, the cost of data labeling can be reduced, and the efficiency of data labeling is further improved.
Drawings
In order to more clearly illustrate the solution of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the description below are some embodiments of the present application, and that other drawings may be obtained by those skilled in the art without inventive effort.
FIG. 1 is an exemplary system architecture diagram to which the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of an artificial intelligence based data annotation process according to the present application;
FIG. 3 is a schematic block diagram of one embodiment of an artificial intelligence based data tagging apparatus according to the present application;
FIG. 4 is a schematic block diagram of one embodiment of a computer device according to the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the foregoing drawings are used for distinguishing between different objects and not for describing a particular sequential order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
A user may use terminal devices 101, 102, 103 to interact with a server 105 over a network 104 to receive or send messages or the like. Various communication client applications, such as a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to a smart phone, a tablet computer, an e-book reader, an MP3 player (Moving Picture Experts Group Audio Layer III), an MP4 player (Moving Picture Experts Group Audio Layer IV), a laptop portable computer, a desktop computer, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that the artificial intelligence based data annotation method provided in the embodiment of the present application is generally executed by a server/terminal device, and accordingly, the artificial intelligence based data annotation apparatus is generally disposed in the server/terminal device.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow diagram of one embodiment of an artificial intelligence based data annotation process in accordance with the present application is shown. The data annotation method based on artificial intelligence comprises the following steps:
Step S201, obtaining a data file to be labeled from a preset database.
In this embodiment, the electronic device (for example, the server/terminal device shown in fig. 1) on which the artificial intelligence-based data annotation method operates may obtain the data file to be annotated in a wired connection manner or a wireless connection manner. It should be noted that the wireless connection means may include, but is not limited to, a 3G/4G/5G connection, a Wi-Fi connection, a Bluetooth connection, a WiMAX connection, a Zigbee connection, a UWB (ultra-wideband) connection, and other wireless connection means now known or developed in the future. The preset database is a database in which data files for constructing the relevant model are stored in advance, and the data files for constructing the relevant model can be in a state to be labeled. The data file is an unstructured data file, and may belong to unstructured data such as images, texts, voice, video and the like.
Step S202, the file format type of the data file is obtained.
In this embodiment, the file format types at least include a text format and a picture format. The above specific implementation process for obtaining the file format type of the data file will be described in further detail in the following specific embodiments, and will not be elaborated herein.
Step S203, acquiring a sensitive information set corresponding to the file format type.
In this embodiment, for data belonging to different file format types, a sensitive information set that needs desensitization processing may be set in advance according to actual service usage requirements. In particular, for data belonging to a text format, the sensitive information set may include a name, an address, a mobile phone number, an identity card number, and the like. For data belonging to the image format, the sensitive information set may include, in addition to a name, an address, a mobile phone number and an identity card number, picture-type items such as a face portrait and a license plate.
Step S204, determining a target desensitization rule corresponding to the file format type.
In this embodiment, for data belonging to different file format types, desensitization rules matching with various file format types can be preset according to actual service use requirements. Specifically, for data belonging to a text format, a text data desensitization rule corresponding to the data in the text format is preset; for data belonging to an image format, an image data desensitization rule corresponding to the data of the image format is set in advance.
Step S205, desensitizing the data file based on the target desensitization rule and the sensitive information set to obtain a desensitized data file.
In this embodiment, the specific implementation process of performing desensitization processing on the data file based on the target desensitization rule and the sensitive information set to obtain the desensitized data file is described in further detail in the subsequent specific embodiments and is not elaborated here.
Step S206, labeling the desensitized data file based on a preset labeling platform to obtain a labeled target data file.
In this embodiment, the specific implementation process of performing labeling processing on the desensitized data file based on the preset labeling platform to obtain a labeled target data file is described in further detail in the subsequent specific embodiments and is not elaborated here.
The method comprises the steps of firstly, acquiring a data file to be marked from a preset database; then obtaining the file format type of the data file; acquiring a sensitive information set corresponding to the file format type; then determining a target desensitization rule corresponding to the file format type; desensitizing the data file based on the target desensitization rule and the sensitive information set to obtain a desensitized data file; and finally, labeling the desensitized data file based on a preset labeling platform to obtain a labeled target data file. According to the method and the device, the sensitive information set and the target desensitization rule corresponding to the data file are determined based on the file format type of the data file to be labeled, then desensitization processing is carried out on the sensitive information matched with the sensitive information set in the data file according to the target desensitization rule so as to obtain the desensitized data file, intelligent hiding processing of the sensitive information in the data file is achieved, and therefore the situation that the sensitive information is leaked in the data labeling process of the data file can be effectively avoided. In addition, the automatic labeling processing of the desensitized data file can be realized by using the preset labeling platform, the workload of data labeling can be effectively reduced, the cost of data labeling can be reduced, and the efficiency of data labeling is further improved.
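The flow of steps S201 to S206 can be sketched in Python as a small dispatch pipeline. Everything named here (`PRESET_DATABASE`, `FORMAT_TYPES`, `SENSITIVE_SETS`, `DESENSITIZATION_RULES`, `label`, `annotate`) is a hypothetical stand-in chosen for this example: only a text rule that masks mobile phone numbers is implemented, and the labeling platform is reduced to a placeholder that returns an empty label list.

```python
import re
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical stand-in for the preset database: file name -> raw content.
PRESET_DATABASE: Dict[str, str] = {
    "claim_note.txt": "Contact 13812345678 about the claim.",
}

# Hypothetical per-format configuration.
FORMAT_TYPES = {".txt": "text", ".png": "image", ".jpg": "image"}
SENSITIVE_SETS = {
    "text": ["mobile_phone"],
    "image": ["mobile_phone", "face", "license_plate"],
}


def desensitize_text(content: str, sensitive: List[str]) -> str:
    """Assumed text rule: mask digits 4 to 7 of 11-digit phone numbers."""
    if "mobile_phone" in sensitive:
        content = re.sub(r"(?<!\d)(1\d{2})\d{4}(\d{4})(?!\d)", r"\1****\2", content)
    return content


# Only the text branch is implemented in this sketch.
DESENSITIZATION_RULES: Dict[str, Callable[[str, List[str]], str]] = {
    "text": desensitize_text,
}


def label(content: str) -> dict:
    """Placeholder for the labeling platform; returns no real labels."""
    return {"content": content, "labels": []}


def annotate(file_name: str) -> dict:
    content = PRESET_DATABASE[file_name]                      # S201
    file_type = FORMAT_TYPES[Path(file_name).suffix.lower()]  # S202
    sensitive = SENSITIVE_SETS[file_type]                     # S203
    rule = DESENSITIZATION_RULES[file_type]                   # S204
    desensitized = rule(content, sensitive)                   # S205
    return label(desensitized)                                # S206


if __name__ == "__main__":
    print(annotate("claim_note.txt"))
```

Keying both the sensitive information set and the desensitization rule on the file format type mirrors the ordering of the steps: the labeling stage only ever receives content that has already been desensitized.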
In some alternative implementations, step S205 includes the following steps:
If the file format type is an image format, performing OCR (optical character recognition) processing on the data file, acquiring a text information set in the data file, and acquiring first coordinate information corresponding to the text information set.
In this embodiment, the text information set may include a plurality of texts. By performing OCR recognition processing on the data file, a text information set including a plurality of texts can be obtained. The first coordinate information includes a coordinate position X and a coordinate position Y of the plurality of texts.
Performing sensitive information matching processing on the text information set based on a first regular expression corresponding to the sensitive information set to obtain matched specified text information.
In this embodiment, the sensitive information set may include target information that needs to be desensitized during the data file labeling process, and may include, for example, a name, an address, a mobile phone number, an identity card, and the like. The first regular expression can be a regular expression containing target information such as names, addresses, mobile phone numbers, identity cards and the like, and the first regular expression is used for carrying out sensitive information matching processing on the text information set, so that specified text information matched with the target information is obtained.
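A minimal Python sketch of this matching step follows. It assumes the OCR stage has already produced text fragments with X and Y positions, and it uses an illustrative pattern that covers only 11-digit mobile phone numbers and 18-character identity card numbers. `OcrText`, `SENSITIVE_PATTERN` and `match_sensitive` are hypothetical names, and no real OCR library is invoked.

```python
import re
from typing import List, NamedTuple


class OcrText(NamedTuple):
    """One recognized text fragment with its position in the image."""
    text: str
    x: int
    y: int


# Assumed "first regular expression": an 18-character identity card number
# (17 digits plus a digit or X) or an 11-digit mobile phone number.
SENSITIVE_PATTERN = re.compile(r"(?<!\d)(?:\d{17}[\dXx]|1\d{10})(?!\d)")


def match_sensitive(items: List[OcrText]) -> List[OcrText]:
    """Return the fragments that match the pattern, keeping their coordinates."""
    return [item for item in items if SENSITIVE_PATTERN.search(item.text)]


if __name__ == "__main__":
    ocr_result = [
        OcrText("Invoice total: 320.00", 10, 10),
        OcrText("13812345678", 40, 80),
        OcrText("11010519491231002X", 40, 120),
    ]
    for hit in match_sensitive(ocr_result):
        print(hit.text, hit.x, hit.y)
```

Because each match keeps its `x` and `y` fields, the coordinates of the specified text information are available for the later replacement step without a second lookup.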
Desensitizing the designated text information based on a first desensitization processing mode corresponding to the text type to obtain corresponding desensitization text information.
In this embodiment, the specific implementation process of performing desensitization processing on the specified text information based on the first desensitization processing manner corresponding to the text type to obtain corresponding desensitization text information is described in further detail in the subsequent specific embodiments and is not elaborated here.
Acquiring first specified coordinate information of the specified text information in the data file based on the first coordinate information.
Replacing the specified text information in the data file with the desensitization text information based on the first specified coordinate information to obtain a first desensitization data file.
In this embodiment, the position of the designated text information may be determined from the data file based on the first designated coordinate information, and the designated text information at the position may be replaced with the desensitization text information, so as to obtain a first desensitization data file.
Identifying and detecting the first desensitization data file based on preset characteristic information to obtain a picture set containing the preset characteristic information, and acquiring second coordinate information corresponding to the picture set.
In this embodiment, the preset feature information may specifically include information such as a face portrait, a license plate, and the like. By performing face portrait detection and license plate detection on the first desensitization data file, a face portrait picture set containing a plurality of face portraits and a license plate picture set containing a plurality of license plates can be obtained respectively, together with the position coordinates of each face portrait in the face portrait picture set and of each license plate in the license plate picture set.
Desensitizing the picture set in the first desensitized data file by adopting a second desensitizing treatment mode corresponding to the picture type based on the second coordinate information to obtain the desensitized data file.
In this embodiment, the specific implementation process of performing desensitization processing on the image set in the first desensitization data file by using a second desensitization processing mode corresponding to an image type based on the second coordinate information to obtain the desensitized data file is described in further detail in subsequent specific embodiments, and will not be described herein.
According to the method and the device, when the file format type of the data file is detected to be the image format, sensitive information of the data file is identified by using OCR identification and a regular expression to obtain the corresponding sensitive text, the data file is further identified and detected based on the preset characteristic information to identify the corresponding sensitive picture, so that desensitization processing is subsequently performed on the sensitive text and the sensitive picture by using a related desensitization processing mode, intelligent hiding processing of the sensitive information in the data file is achieved, and the condition that the sensitive information is leaked in the process of labeling the data file is effectively avoided.
In some optional implementation manners of this embodiment, the desensitizing processing is performed on the specified text information based on a first desensitizing processing manner corresponding to the text type to obtain corresponding desensitized text information, including the following steps:
Acquiring the number of characters of the specified text information.
In this embodiment, the specified text information may include a name, a mobile phone number, an identity card number, and the like. The corresponding number of characters can be obtained by counting the characters of the specified text information. For example, if the information type of the specified text information is a mobile phone number, the number of characters of the specified text information is 11.
Determining a target coding rule corresponding to the number of the characters.
In this embodiment, the information type of the specified text information may be determined from the number of characters by a preset determination rule. Under the determination rule, numbers of characters in different numerical ranges correspond to text information of different information types. Specifically, if the number of characters of the text information is 2 to 5, the corresponding information type is a name; if the number of characters is 11, the information type is a mobile phone number; if the number of characters is 18, the information type is an identity card number. In addition, for text information of each information type, a corresponding encoding rule for desensitization processing is constructed in advance. Specifically, for text information that is a name, all characters other than the first character are encoded; for text information that is a mobile phone number, the 4th to 7th digits, counted from front to back, are encoded, or all digits are encoded; and for text information that is an identity card number, the last 4 digits, counted from front to back, are encoded. The encoding process replaces each character to be encoded with an asterisk.
Desensitizing the specified text information based on the target coding rule to obtain the desensitized text information.
In this application, the number of characters of the specified text information is obtained; a target coding rule corresponding to the number of characters is then determined; and the specified text information is then desensitized based on the target coding rule to obtain the desensitized text information. Because the information type of the specified text information is determined from the number of characters, the specified text information can be desensitized with the target coding rule associated with that information type to generate the desensitized text information. This effectively realizes intelligent hiding of sensitive text information in the data file and effectively avoids leakage of sensitive information while the data file is being labeled.
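The length-based rule selection described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function names are hypothetical, the mobile phone branch uses the "4th to 7th digits" variant, and the identity card branch assumes that the last four characters are masked.

```python
def classify_by_length(text: str) -> str:
    """Map a character count to an information type using the ranges in the text."""
    n = len(text)
    if 2 <= n <= 5:
        return "name"
    if n == 11:
        return "mobile"
    if n == 18:
        return "id_card"
    return "unknown"


def desensitize(text: str) -> str:
    """Replace the characters selected by the type's coding rule with asterisks."""
    kind = classify_by_length(text)
    if kind == "name":
        # Keep the first character and mask all the others.
        return text[0] + "*" * (len(text) - 1)
    if kind == "mobile":
        # Mask the 4th to 7th digits (1-based), i.e. indices 3..6.
        return text[:3] + "****" + text[7:]
    if kind == "id_card":
        # Assumption: the last four characters are the ones to mask.
        return text[:-4] + "****"
    # No rule is defined for this length, so this sketch leaves the text unchanged.
    return text
```

Note that a purely length-based classifier cannot distinguish, say, an 11-character word from a mobile phone number; in the described flow the input has already been matched against the sensitive information set by a regular expression, which narrows the candidates.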
In some optional implementation manners, the preset feature information at least includes first feature information and second feature information, and the desensitization processing is performed on the picture set in the first desensitization data file by using a second desensitization processing method corresponding to a picture type based on the second coordinate information to obtain the desensitized data file, including the following steps:
and calling a preset processing tool.
In this embodiment, the processing tool is specifically a mosaic editing tool, and the mosaic editing tool may be used to perform coding processing, that is, mosaic processing on the picture file.
And acquiring a first fuzzy rule corresponding to the first characteristic information, and acquiring a second fuzzy rule corresponding to the second characteristic information.
In this embodiment, the first characteristic information may be a face portrait, and the second characteristic information may be a license plate. The first fuzzy rule may specifically be mosaic processing performed, by using the mosaic editing tool, on the pixel region corresponding to the face portrait; the second fuzzy rule may specifically be mosaic processing performed, by using the mosaic editing tool, on the pixel region corresponding to a preset number of license plate characters contained in the license plate. The preset number is greater than 1; its value is not specifically limited and can be set according to actual use requirements, preferably to 4. The order of the selected license plate characters is likewise not specifically limited, and the license plate characters selected from the second characteristic information are determined randomly.
And acquiring a first picture corresponding to the first characteristic information from the picture set, and acquiring a second picture corresponding to the second characteristic information.
And acquiring second specified coordinate information corresponding to the first picture based on the second coordinate information, and acquiring third specified coordinate information corresponding to the second picture.
And desensitizing the first picture in the first desensitized data file through the processing tool based on the first coordinate information and the first fuzzy rule to obtain a second desensitized data file.
In this embodiment, a first position of the first picture may be determined from the first desensitization data file based on the first coordinate information, and then all pixel regions corresponding to the first picture at the first position are subjected to mosaic processing to obtain a corresponding second desensitization data file.
And performing desensitization processing on the second picture in a second desensitization data file through the processing tool based on the second coordinate information and the second fuzzy rule to obtain the desensitized data file.
In this embodiment, a second position of the second picture may be determined from the second desensitization data file based on the second coordinate information, and then mosaic processing may be performed on the target pixel region corresponding to the second picture at the second position, so as to obtain a corresponding third desensitization data file, that is, the desensitized data file. The target pixel region refers to the pixel region corresponding to the preset number of license plate characters contained in the second picture.
In this application, a first fuzzy rule corresponding to the first characteristic information and a second fuzzy rule corresponding to the second characteristic information are obtained, and the picture set in the data file is then desensitized accordingly based on the processing tool and these fuzzy rules to obtain the desensitized data file. This effectively realizes intelligent hiding of sensitive picture information in the data file and effectively avoids leakage of sensitive information while the data file is being labeled.
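The mosaic step can be illustrated with a small, dependency-free sketch. It assumes a grayscale image held as a list of rows and a rectangular region given as (left, top, right, bottom); both the representation and the function name `mosaic` are hypothetical, and an actual mosaic editing tool would operate on real image files.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale pixels, 0-255 (hypothetical representation)


def mosaic(image: Image, box: Tuple[int, int, int, int], block: int = 2) -> Image:
    """Return a copy of the image in which the region box=(left, top, right, bottom)
    is pixelated: every block x block tile is replaced by its mean value."""
    left, top, right, bottom = box
    out = [row[:] for row in image]  # copy so the input image is not modified
    for y0 in range(top, bottom, block):
        for x0 in range(left, right, block):
            ys = range(y0, min(y0 + block, bottom))
            xs = range(x0, min(x0 + block, right))
            tile = [image[y][x] for y in ys for x in xs]
            mean = sum(tile) // len(tile)
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out
```

Under this sketch, the rule for a face portrait would pass the whole bounding box of the face, while the rule for a license plate would pass only the sub-box covering the selected characters; the two calls can be chained, mirroring the two successive desensitization steps.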
In another embodiment, the file format type of the data file may further include a video format, and for a data file belonging to the video format, the electronic device automatically performs processing of extracting a video frame image from the data file to obtain corresponding image data, and then performs corresponding desensitization processing according to the desensitization scheme corresponding to the file format type being the image format, and the specific desensitization processing manner is not described herein again.
In some alternative implementations, step S205 includes the following steps:
and if the file format type is a text format, performing sensitive information matching processing on the text information in the data file based on a second regular expression corresponding to the sensitive information set to obtain matched target text information.
In this embodiment, the sensitive information set may include target information that needs to be desensitized during the process of annotating the data file, and may include, for example, a name, an address, a mobile phone number, an identity card, and the like. The second regular expression can be a regular expression containing target information such as names, addresses, mobile phone numbers, identity cards and the like, and sensitive information matching processing is carried out on the text information in the data files by using the second regular expression, so that the target text information matched with the target information can be obtained.
And acquiring the target position information of the target text information in the data file.
In this embodiment, jque may be used to recursively traverse all elements in the data file in advance to obtain a position information set of all elements included in the data file, and then target position information of the target text information in the data file is screened from the position information set.
And acquiring a text desensitization rule corresponding to the target text information.
In this embodiment, the specific implementation process of obtaining the text desensitization rule corresponding to the target text information may refer to the desensitization processing performed on the specified text information based on the first desensitization processing mode corresponding to the text type to obtain the corresponding desensitization text information, which is not described herein.
Desensitizing the target text information in the data file based on the text desensitization rule and the target position information to obtain the desensitized data file.
In this application, when the file format type of the data file is detected to be a text format, sensitive information matching is performed on the text information in the data file based on the second regular expression corresponding to the sensitive information set to obtain the matched target text information, and the target position information of the target text information in the data file is obtained. The target text information in the data file is then desensitized based on the text desensitization rule and the target position information to obtain the desensitized data file. This realizes intelligent hiding of sensitive information in the data file and effectively avoids leakage of sensitive information while the data file is being labeled.
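The match, locate, and replace sequence for text files can be sketched with Python's standard `re` module. The pattern below is a hypothetical stand-in that only covers 11-digit mobile numbers starting with 1 and 18-character identity card numbers; the actual second regular expression is not given in the source, and `find_sensitive` and `mask_at` are illustrative names.

```python
import re

# Hypothetical pattern: 18-digit IDs, 17 digits plus X, or 11-digit numbers starting with 1.
SENSITIVE = re.compile(r"(?<!\d)(?:\d{18}|\d{17}[Xx]|1\d{10})(?!\d)")


def find_sensitive(text):
    """Return (start, end, matched_text) for every sensitive match in the text."""
    return [(m.start(), m.end(), m.group()) for m in SENSITIVE.finditer(text)]


def mask_at(text, spans, rule):
    """Apply rule to each located span, working from the end of the text
    backwards so that earlier offsets remain valid if lengths change."""
    for start, end, value in sorted(spans, reverse=True):
        text = text[:start] + rule(value) + text[end:]
    return text
```

A caller would first collect the spans with `find_sensitive`, which plays the role of the target position information, and then pass them to `mask_at` together with whichever text desensitization rule applies.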
In another embodiment, the file format type of the data file may further include a voice format, and for a data file belonging to the voice format, the electronic device automatically performs a voice-to-text processing on the data file to obtain corresponding text data, and then performs a corresponding desensitization processing according to the desensitization scheme corresponding to the file format type being the text format, and the specific desensitization processing mode is not described herein in detail.
In some optional implementations of this embodiment, step S206 includes the following steps:
and determining a target annotation model corresponding to the file format type from the annotation platform.
In this embodiment, the annotation platform is a pre-constructed platform that stores annotation models respectively associated with different file formats. Specifically, the annotation models can include a ResNet-based classification model and a YOLOv5-based detection model. The ResNet-based classification model is suitable for automatic labeling of data files belonging to a text format, and the YOLOv5-based detection model is suitable for automatic labeling of data files belonging to an image format.
And calling the target labeling model.
In this embodiment, the training generation process of the annotation model corresponding to the YOLOv5-based detection model may include: obtaining a training sample in advance, wherein the training sample comprises at least one sample picture and the picture label corresponding to each sample picture; taking the at least one sample picture as an input variable of the annotation model and the picture label corresponding to the sample picture as an output variable of the annotation model; and establishing the annotation model based on the training sample. When pictures are labeled, the pictures to be labeled are input into the annotation model, and the labeling results corresponding to the pictures to be labeled are output by the annotation model, so that automatic labeling is realized.
And carrying out labeling processing on the desensitized data file based on the target labeling model to obtain a corresponding labeling result.
In this embodiment, the labeling result includes, but is not limited to, a label, a coordinate point, a remark value, and the like. And the user can also perform manual labeling processing on the desensitized data file in the labeling platform to generate a corresponding labeling result. Or the user can correspondingly adjust the labeling result obtained by labeling the desensitized data file based on the target labeling model according to the actual service, so that the use experience of the user can be improved.
And taking the labeling result as the target data file.
In this application, a target annotation model corresponding to the file format type is determined from the annotation platform; the target annotation model is then called; the desensitized data file is then labeled based on the target annotation model to obtain a corresponding labeling result; and the labeling result is taken as the target data file. Automatic labeling of the data file to be labeled is thereby realized through the annotation model, which effectively improves the efficiency of data labeling.
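Selecting an annotation model by file format type amounts to a lookup followed by a call. The sketch below uses two stub functions in place of the ResNet-based and YOLOv5-based models; the registry, the function names, and the shape of the returned annotations are all assumptions made for illustration, and no real model is loaded.

```python
from typing import Callable, Dict, List

Annotation = Dict[str, object]
Model = Callable[[bytes], List[Annotation]]


def text_classifier(data: bytes) -> List[Annotation]:
    # Stand-in for the ResNet-based classification model (hypothetical output).
    return [{"label": "text", "remark": f"{len(data)} bytes"}]


def image_detector(data: bytes) -> List[Annotation]:
    # Stand-in for the YOLOv5-based detection model (hypothetical output).
    return [{"label": "object", "points": [(0, 0), (1, 1)]}]


# Hypothetical registry playing the role of the annotation platform.
MODEL_REGISTRY: Dict[str, Model] = {
    "text": text_classifier,
    "picture": image_detector,
}


def annotate(desensitized: bytes, format_type: str) -> List[Annotation]:
    """Pick the model registered for the format type and run it on the data."""
    try:
        model = MODEL_REGISTRY[format_type]
    except KeyError:
        raise ValueError(f"no annotation model for format type {format_type!r}")
    return model(desensitized)
```

Keeping the registry separate from the models means a manually produced or manually adjusted labeling result can replace the model output without changing the lookup.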
In some optional implementations of this embodiment, step S202 includes the following steps:
reading suffix format information of the data file.
In this embodiment, the file format type refers to the particular encoding scheme used to store information, and is used for identifying the data stored inside the file. The file format types at least comprise a text format and a picture format. The file format of the data file can be determined by reading the suffix format information of the data file.
And calling a preset format mapping table.
In the present embodiment, the format mapping table is a data table created in advance that stores a plurality of pieces of suffix format information and the format types respectively corresponding to them. For files with suffix format information of txt, pdf, doc, xls, ppt, htm and the like, the corresponding format type is a text format; for files with suffix format information of bmp, jpg, png, tif, gif, pcx and the like, the corresponding format type is a picture format.
And inquiring the format mapping table based on the suffix format information, and inquiring target format information corresponding to the suffix format information from the format mapping table.
In the present embodiment, the above target format information refers to the same format information as the suffix format information.
And acquiring a target format type corresponding to the target format information from the format mapping table to obtain the file format type.
In this application, the suffix format information of the data file is read; a preset format mapping table is then queried based on the suffix format information to find the target format information corresponding to the suffix format information; and the target format type corresponding to the target format information is then acquired from the format mapping table to obtain the file format type. By using the format mapping table, the file format type of the data file can be determined quickly and accurately.
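The suffix lookup can be sketched as a dictionary query. The table contents follow the suffixes listed in this embodiment; the names `FORMAT_MAP` and `file_format_type` are hypothetical, and returning `None` for an unknown suffix is a choice made for this sketch rather than something the source specifies.

```python
import os

# Format mapping table built from the suffixes named in the embodiment.
FORMAT_MAP = {
    **dict.fromkeys(["txt", "pdf", "doc", "xls", "ppt", "htm"], "text"),
    **dict.fromkeys(["bmp", "jpg", "png", "tif", "gif", "pcx"], "picture"),
}


def file_format_type(filename: str):
    """Read the suffix of the file name and look up its format type.
    Returns None when the suffix is missing or not in the table."""
    suffix = os.path.splitext(filename)[1].lstrip(".").lower()
    return FORMAT_MAP.get(suffix)
```

Lower-casing the suffix makes the lookup case-insensitive, so `claim.PDF` and `claim.pdf` resolve to the same format type.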
It is emphasized that, to further ensure the privacy and security of the target data file, the target data file may also be stored in a node of a blockchain.
The blockchain referred to in this application is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain (Blockchain) is essentially a decentralized database: a chain of data blocks linked to one another by cryptographic methods, where each data block contains information about a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
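The idea of data blocks linked by cryptography can be illustrated with a toy hash chain. This is only a sketch of the linking and verification principle using SHA-256 from Python's standard library; it has no networking, consensus, or node storage, and the block layout and function names are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # hypothetical placeholder for "no previous block"


def _digest(payload: str, prev_hash: str) -> str:
    body = {"payload": payload, "prev_hash": prev_hash}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def make_block(payload: str, prev_hash: str) -> dict:
    """Create a block whose hash covers both its payload and the previous hash."""
    return {"payload": payload, "prev_hash": prev_hash, "hash": _digest(payload, prev_hash)}


def verify(chain: list) -> bool:
    """Check that every block links to its predecessor and that no payload was altered."""
    prev = GENESIS_HASH
    for block in chain:
        if block["prev_hash"] != prev:
            return False
        if block["hash"] != _digest(block["payload"], block["prev_hash"]):
            return False
        prev = block["hash"]
    return True
```

Because each hash covers the previous block's hash, changing a stored payload invalidates that block and every block after it, which is the property that supports the validity check mentioned above.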
The embodiments of the application can acquire and process related data based on artificial intelligence technology. Artificial Intelligence (AI) is the theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, sense the environment, acquire knowledge, and use that knowledge to obtain the best result.
The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by computer readable instructions directing the associated hardware. The instructions can be stored in a computer readable storage medium and, when executed, can include the processes of the method embodiments described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a Read-Only Memory (ROM), or may be a Random Access Memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least a portion of the steps in the flowcharts may include multiple sub-steps or multiple stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
With further reference to fig. 3, as an implementation of the method shown in fig. 2, the present application provides an embodiment of an artificial intelligence based data annotation device, where the embodiment of the device corresponds to the embodiment of the method shown in fig. 2, and the device may be applied to various electronic devices.
As shown in fig. 3, the artificial intelligence-based data labeling apparatus 300 according to this embodiment includes: a first acquisition module 301, a second acquisition module 302, a third acquisition module 303, a determination module 304, a desensitization module 305, and an annotation module 306. Wherein:
a first obtaining module 301, configured to obtain a data file to be labeled from a preset database;
a second obtaining module 302, configured to obtain a file format type of the data file;
a third obtaining module 303, configured to obtain a sensitive information set corresponding to the file format type;
a determining module 304, configured to determine a target desensitization rule corresponding to the file format type;
a desensitization module 305, configured to perform desensitization processing on the data file based on the target desensitization rule and the sensitive information set, to obtain a desensitized data file;
and the labeling module 306 is configured to label the desensitized data file based on a preset labeling platform to obtain a labeled target data file.
In this embodiment, the operations respectively executed by the modules or units correspond to the steps of the artificial intelligence based data labeling method of the foregoing embodiment one by one, and are not described herein again.
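The way the six modules chain together can be sketched as a small class whose fields are pluggable callables, one per module. The class name, field names, and call signatures are hypothetical; the sketch shows only the order of data flow between modules 301 to 306, not any particular implementation of them.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class AnnotationDevice:
    fetch: Callable[[], Any]                     # first acquisition module 301
    get_format: Callable[[Any], str]             # second acquisition module 302
    get_sensitive_set: Callable[[str], Any]      # third acquisition module 303
    get_rule: Callable[[str], Any]               # determination module 304
    desensitize: Callable[[Any, Any, Any], Any]  # desensitization module 305
    annotate: Callable[[Any, str], Any]          # annotation module 306

    def run(self) -> Any:
        """Run the modules in the order of the method steps."""
        data = self.fetch()
        fmt = self.get_format(data)
        sensitive = self.get_sensitive_set(fmt)
        rule = self.get_rule(fmt)
        clean = self.desensitize(data, rule, sensitive)
        return self.annotate(clean, fmt)
```

Passing the modules in as callables keeps each one replaceable, so a text-format desensitizer and a picture-format desensitizer can share the same surrounding flow.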
In some optional implementations of this embodiment, the desensitization module 305 includes:
the first identification submodule is used for carrying out OCR (optical character recognition) processing on the data file if the file format type is an image format, acquiring a text information set in the data file and acquiring first coordinate information corresponding to the text information set;
the first matching submodule is used for carrying out sensitive information matching processing on the text information set on the basis of a first regular expression corresponding to the sensitive information set to obtain matched specified text information;
the first desensitization sub-module is used for desensitizing the specified text information based on a first desensitization processing mode corresponding to the text type to obtain corresponding desensitization text information;
the first obtaining sub-module is used for obtaining first specified coordinate information of the specified text information in the data file based on the first coordinate information;
the replacing submodule is used for replacing the specified text information in the data file with the desensitization text information based on the first specified coordinate information to obtain a first desensitization data file;
the second identification submodule is used for carrying out identification detection on the first desensitization data file based on preset characteristic information to obtain a picture set containing the preset characteristic information and acquiring second coordinate information corresponding to the picture set;
and the second desensitization sub-module is used for performing desensitization treatment on the picture set in the first desensitization data file by adopting a second desensitization treatment mode corresponding to the picture type based on the second coordinate information to obtain the desensitized data file.
In this embodiment, the operations respectively executed by the modules or units correspond to the steps of the artificial intelligence based data labeling method of the foregoing embodiment one by one, and are not described herein again.
In some optional implementations of this embodiment, the first desensitization sub-module includes:
a first acquisition unit configured to acquire the number of characters of the specified text information;
a determination unit configured to determine a target encoding rule corresponding to the number of characters;
and the first desensitization unit is used for performing desensitization treatment on the specified text information based on the target coding rule to obtain the desensitized text information.
In this embodiment, the operations respectively executed by the modules or units correspond to the steps of the artificial intelligence based data labeling method of the foregoing embodiment one to one, and are not described herein again.
In some optional implementation manners of this embodiment, the preset feature information at least includes first feature information and second feature information, and the second desensitization sub-module includes:
the calling unit is used for calling a preset processing tool;
a second obtaining unit configured to obtain a first fuzzy rule corresponding to the first feature information, and obtain a second fuzzy rule corresponding to the second feature information;
a third obtaining unit, configured to obtain a first picture corresponding to the first feature information from the picture set, and obtain a second picture corresponding to the second feature information;
a fourth acquiring unit configured to acquire second specified coordinate information corresponding to the first picture based on the second coordinate information, and acquire third specified coordinate information corresponding to the second picture;
the second desensitization unit is used for performing desensitization processing on the first picture in the first desensitization data file through the processing tool based on the first coordinate information and the first fuzzy rule to obtain a second desensitization data file;
and the third desensitization unit is used for performing desensitization treatment on the second picture in the second desensitization data file through the processing tool based on the second coordinate information and the second fuzzy rule to obtain the desensitized data file.
In this embodiment, the operations respectively executed by the modules or units correspond to the steps of the artificial intelligence based data labeling method of the foregoing embodiment one by one, and are not described herein again.
In some optional implementations of this embodiment, the desensitization module 305 includes:
the second matching sub-module is used for performing sensitive information matching processing on the text information in the data file based on a second regular expression corresponding to the sensitive information set if the file format type is a text format, so as to obtain matched target text information;
the second obtaining submodule is used for obtaining the target position information of the target text information in the data file;
the third obtaining submodule is used for obtaining a text desensitization rule corresponding to the target text information;
and the third desensitization sub-module is used for performing desensitization treatment on the target text information in the data file based on the text desensitization rule and the target position information to obtain the desensitized data file.
In this embodiment, the operations respectively executed by the modules or units correspond to the steps of the artificial intelligence-based data labeling method of the foregoing embodiment one by one, and are not described herein again.
In some optional implementations of this embodiment, the labeling module 306 includes:
the first determining sub-module is used for determining a target annotation model corresponding to the file format type from the annotation platform;
the first calling sub-module is used for calling the target labeling model;
the labeling submodule is used for labeling the desensitized data file based on the target labeling model to obtain a corresponding labeling result;
and the second determining submodule is used for taking the labeling result as the target data file.
In this embodiment, the operations respectively executed by the modules or units correspond to the steps of the artificial intelligence-based data labeling method of the foregoing embodiment one by one, and are not described herein again.
In some optional implementations of this embodiment, the second obtaining module 302 includes:
the reading submodule is used for reading suffix format information of the data file;
the second calling submodule is used for calling a preset format mapping table;
the query submodule is used for performing query processing on the format mapping table based on the suffix format information and querying target format information corresponding to the suffix format information from the format mapping table;
and the fourth obtaining submodule is used for obtaining the target format type corresponding to the target format information from the format mapping table to obtain the file format type.
In this embodiment, the operations respectively executed by the modules or units correspond to the steps of the artificial intelligence based data labeling method of the foregoing embodiment one by one, and are not described herein again.
In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 4 in particular, fig. 4 is a block diagram of a basic structure of a computer device according to the embodiment.
The computer device 4 comprises a memory 41, a processor 42, and a network interface 43 communicatively connected to each other via a system bus. It is noted that only the computer device 4 having components 41-43 is shown, but it is understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device herein is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and the hardware thereof includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
The memory 41 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or a memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Memory Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), and the like, which are provided on the computer device 4. Of course, the memory 41 may also include both internal and external storage devices of the computer device 4. In this embodiment, the memory 41 is generally used for storing an operating system installed on the computer device 4 and various types of application software, such as computer readable instructions of the artificial intelligence based data labeling method. Further, the memory 41 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 42 may, in some embodiments, be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to execute computer readable instructions stored in the memory 41 or to process data, for example, to execute the computer readable instructions of the artificial intelligence based data labeling method.
The network interface 43 may comprise a wireless network interface or a wired network interface, and the network interface 43 is generally used for establishing a communication connection between the computer device 4 and other electronic devices.
Compared with the prior art, the embodiment of the application mainly has the following beneficial effects:
in the embodiment of the application, a data file to be marked is obtained from a preset database; then obtaining the file format type of the data file; acquiring a sensitive information set corresponding to the file format type; then determining a target desensitization rule corresponding to the file format type; desensitizing the data file based on the target desensitization rule and the sensitive information set to obtain a desensitized data file; and finally, carrying out labeling processing on the desensitized data file based on a preset labeling platform to obtain a labeled target data file. According to the method and the device, the sensitive information set and the target desensitization rule corresponding to the data file are determined based on the file format type of the data file to be labeled, then desensitization processing is performed on the sensitive information matched with the sensitive information set in the data file according to the target desensitization rule to obtain the desensitized data file, intelligent hiding processing of the sensitive information in the data file is achieved, and therefore the situation that the sensitive information is leaked in the data labeling process of the data file can be effectively avoided. In addition, the automatic labeling processing of the desensitized data file can be realized by using the preset labeling platform, the workload of data labeling can be effectively reduced, the cost of data labeling can be reduced, and the efficiency of data labeling is further improved.
The present application further provides another embodiment, which is to provide a computer-readable storage medium storing computer-readable instructions executable by at least one processor to cause the at least one processor to perform the steps of the artificial intelligence based data annotation method as described above.
From the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and can of course also be implemented by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solutions of the present application, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disk) and includes several instructions for enabling a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the methods according to the embodiments of the present application.
It is to be understood that the above-described embodiments are merely some, rather than all, of the embodiments of the present application, and that the appended drawings illustrate preferred embodiments of the application without limiting its scope. This application can be embodied in many different forms; these embodiments are provided so that the disclosure of the application will be understood more thoroughly. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or replace some of their technical features with equivalents. All equivalent structures made using the contents of the specification and the drawings of the present application, whether applied directly or indirectly in other related technical fields, fall within the protection scope of the present application.

Claims (10)

1. A data annotation method based on artificial intelligence is characterized by comprising the following steps:
acquiring a data file to be labeled from a preset database;
acquiring the file format type of the data file;
acquiring a sensitive information set corresponding to the file format type;
determining a target desensitization rule corresponding to the file format type;
desensitizing the data file based on the target desensitization rule and the sensitive information set to obtain a desensitized data file;
and labeling the desensitized data file based on a preset labeling platform to obtain a labeled target data file.
2. The artificial intelligence-based data annotation method according to claim 1, wherein the step of desensitizing the data file based on the target desensitization rule and the sensitive information set to obtain a desensitized data file specifically comprises:
if the file format type is an image format, performing OCR (optical character recognition) processing on the data file, acquiring a text information set in the data file, and acquiring first coordinate information corresponding to the text information set;
performing sensitive information matching processing on the text information set based on a first regular expression corresponding to the sensitive information set to obtain matched specified text information;
desensitizing the specified text information based on a first desensitization processing mode corresponding to the text type to obtain corresponding desensitization text information;
acquiring first specified coordinate information of the specified text information in the data file based on the first coordinate information;
replacing the specified text information in the data file with the desensitization text information based on the first specified coordinate information to obtain a first desensitized data file;
performing recognition and detection on the first desensitized data file based on preset feature information to obtain a picture set containing the preset feature information, and obtaining second coordinate information corresponding to the picture set;
and desensitizing the picture set in the first desensitized data file by adopting a second desensitization processing mode corresponding to the picture type based on the second coordinate information to obtain the desensitized data file.
3. The artificial intelligence-based data annotation method according to claim 2, wherein the step of desensitizing the specified text information based on the first desensitization processing mode corresponding to the text type to obtain corresponding desensitization text information specifically comprises:
acquiring the number of characters of the specified text information;
determining a target coding rule corresponding to the number of the characters;
and carrying out desensitization processing on the specified text information based on the target coding rule to obtain the desensitization text information.
4. The artificial intelligence-based data annotation method according to claim 2, wherein the preset feature information at least includes first feature information and second feature information, and the step of desensitizing the picture set in the first desensitized data file by adopting a second desensitization processing mode corresponding to the picture type based on the second coordinate information to obtain the desensitized data file specifically comprises:
calling a preset processing tool;
acquiring a first blurring rule corresponding to the first feature information and acquiring a second blurring rule corresponding to the second feature information;
acquiring a first picture corresponding to the first feature information from the picture set, and acquiring a second picture corresponding to the second feature information;
acquiring second specified coordinate information corresponding to the first picture based on the second coordinate information, and acquiring third specified coordinate information corresponding to the second picture;
desensitizing the first picture in the first desensitized data file through the processing tool based on the second specified coordinate information and the first blurring rule to obtain a second desensitized data file;
and desensitizing the second picture in the second desensitized data file through the processing tool based on the third specified coordinate information and the second blurring rule to obtain the desensitized data file.
5. The artificial intelligence-based data annotation method of claim 1, wherein the step of performing desensitization processing on the data file based on the target desensitization rule and the sensitive information set to obtain a desensitized data file specifically comprises:
if the file format type is a text format, performing sensitive information matching processing on text information in the data file based on a second regular expression corresponding to the sensitive information set to obtain matched target text information;
acquiring target position information of the target text information in the data file;
acquiring a text desensitization rule corresponding to the target text information;
desensitizing the target text information in the data file based on the text desensitization rule and the target position information to obtain the desensitized data file.
6. The artificial intelligence-based data annotation method of claim 1, wherein the step of annotating the desensitized data file based on a preset annotation platform to obtain an annotated target data file specifically comprises:
determining a target labeling model corresponding to the file format type from the labeling platform;
calling the target labeling model;
performing labeling processing on the desensitized data file based on the target labeling model to obtain a corresponding labeling result;
and taking the labeling result as the target data file.
7. The artificial intelligence based data annotation method of claim 1, wherein the step of obtaining the file format type of the data file specifically comprises:
reading suffix format information of the data file;
calling a preset format mapping table;
querying the format mapping table based on the suffix format information to find target format information corresponding to the suffix format information;
and acquiring a target format type corresponding to the target format information from the format mapping table to obtain the file format type.
8. A data annotation device based on artificial intelligence, characterized by comprising:
a first acquisition module, used for acquiring a data file to be labeled from a preset database;
a second acquisition module, used for acquiring the file format type of the data file;
a third acquisition module, used for acquiring a sensitive information set corresponding to the file format type;
a determining module, used for determining a target desensitization rule corresponding to the file format type;
a desensitization module, used for desensitizing the data file based on the target desensitization rule and the sensitive information set to obtain a desensitized data file;
and a labeling module, used for labeling the desensitized data file based on a preset labeling platform to obtain a labeled target data file.
9. A computer device comprising a memory and a processor, wherein the memory stores computer-readable instructions, and the processor, when executing the computer-readable instructions, implements the steps of the artificial intelligence based data annotation method of any one of claims 1 to 7.
10. A computer-readable storage medium having computer-readable instructions stored thereon which, when executed by a processor, implement the steps of the artificial intelligence based data annotation method of any one of claims 1 to 7.
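
For illustration only, the text-format desensitization of claim 5, combined with a character-count-dependent coding rule in the spirit of claim 3, might be sketched as follows in Python. The regular expression, the length thresholds, and the function names are assumptions made for this sketch; the claims do not specify them.

```python
import re

# Hypothetical pattern standing in for the regular expression of the
# sensitive information set: an 18-character ID number or an 11-digit
# mobile number.
SENSITIVE_PATTERN = re.compile(r"\b(?:\d{17}[\dXx]|1\d{10})\b")


def coding_rule(char_count: int) -> tuple:
    """Choose (kept_head, kept_tail) from the character count.

    The thresholds are assumptions, not values taken from the claims.
    """
    if char_count >= 15:
        return 6, 4
    if char_count >= 8:
        return 3, 4
    return 1, 0


def mask(value: str) -> str:
    """Keep the head and tail given by the coding rule, hide the middle."""
    head, tail = coding_rule(len(value))
    hidden = len(value) - head - tail
    return value[:head] + "*" * hidden + value[len(value) - tail:]


def desensitize_text(text: str):
    """Return the desensitized text and the (start, end) positions masked."""
    positions = []
    pieces = []
    cursor = 0
    for match in SENSITIVE_PATTERN.finditer(text):
        positions.append(match.span())           # target position information
        pieces.append(text[cursor:match.start()])
        pieces.append(mask(match.group()))       # apply the coding rule
        cursor = match.end()
    pieces.append(text[cursor:])
    return "".join(pieces), positions
```

Under these assumptions, `desensitize_text("Tel 13812345678")` returns `("Tel 138****5678", [(4, 15)])`. Masking preserves the length of each match, so the recorded positions remain valid in the desensitized text.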

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211445886.1A CN115758451A (en) 2022-11-18 2022-11-18 Data labeling method, device, equipment and storage medium based on artificial intelligence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211445886.1A CN115758451A (en) 2022-11-18 2022-11-18 Data labeling method, device, equipment and storage medium based on artificial intelligence

Publications (1)

Publication Number Publication Date
CN115758451A true CN115758451A (en) 2023-03-07

Family

ID=85373252

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211445886.1A Pending CN115758451A (en) 2022-11-18 2022-11-18 Data labeling method, device, equipment and storage medium based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN115758451A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116523544A (en) * 2023-06-25 2023-08-01 江西省机电设备招标有限公司 Software price measuring and calculating method, system, storage medium and equipment
CN116523544B (en) * 2023-06-25 2023-11-14 江西省机电设备招标有限公司 Software price measuring and calculating method, system, storage medium and equipment
CN117010019A (en) * 2023-08-04 2023-11-07 北京泰策科技有限公司 Data desensitization method and system based on NLP language model
CN117010019B (en) * 2023-08-04 2024-04-16 北京泰策科技有限公司 Data desensitization method and system based on NLP language model
CN117521148A (en) * 2023-12-29 2024-02-06 苏州元脑智能科技有限公司 Information interaction method and device based on block chain, storage medium and electronic equipment
CN117521148B (en) * 2023-12-29 2024-04-02 苏州元脑智能科技有限公司 Information interaction method and device based on block chain, storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
CN115758451A (en) Data labeling method, device, equipment and storage medium based on artificial intelligence
CN112016273A (en) Document directory generation method and device, electronic equipment and readable storage medium
CN112613917A (en) Information pushing method, device and equipment based on user portrait and storage medium
CN112686243A (en) Method and device for intelligently identifying picture characters, computer equipment and storage medium
CN114240672A (en) Method for identifying green asset proportion and related product
CN116453125A (en) Data input method, device, equipment and storage medium based on artificial intelligence
CN112507141A (en) Investigation task generation method and device, computer equipment and storage medium
CN116704528A (en) Bill identification verification method, device, computer equipment and storage medium
CN116774973A (en) Data rendering method, device, computer equipment and storage medium
CN112395450B (en) Picture character detection method and device, computer equipment and storage medium
CN115757075A (en) Task abnormity detection method and device, computer equipment and storage medium
CN114330240A (en) PDF document analysis method and device, computer equipment and storage medium
CN114359928A (en) Electronic invoice identification method and device, computer equipment and storage medium
CN113822215A (en) Equipment operation guide file generation method and device, electronic equipment and storage medium
CN112396111A (en) Text intention classification method and device, computer equipment and storage medium
CN114820211B (en) Method, device, computer equipment and storage medium for checking and verifying quality of claim data
CN117409430A (en) Medical bill information extraction method, device, equipment and storage medium thereof
CN117037197A (en) Abnormal identification method, device, equipment and storage medium based on artificial intelligence
CN118227491A (en) Method and device for generating test cases, computer equipment and storage medium
CN115826973A (en) List page generation method and device, computer equipment and storage medium
CN116977611A (en) Picture identification method, device, computer equipment and storage medium
CN116738948A (en) Data processing method, device, computer equipment and storage medium
CN115904657A (en) Document generation method and device, computer equipment and storage medium
CN117034173A (en) Data processing method, device, computer equipment and storage medium
CN117076775A (en) Information data processing method, information data processing device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination