CN116361858A - User session resource data protection method and software product applying AI decision - Google Patents


Publication number
CN116361858A
Authority
CN
China
Prior art keywords
text
sensitive
processing network
vector
session resource
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310371962.7A
Other languages
Chinese (zh)
Other versions
CN116361858B (en
Inventor
杨权
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Infinite Free Culture Media Co ltd
Beijing Peihong Wangzhi Technology Co ltd
Original Assignee
Guangxi Nanning Xibei Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangxi Nanning Xibei Technology Co ltd
Priority to CN202310371962.7A
Publication of CN116361858A
Application granted
Publication of CN116361858B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6254 Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Bioethics (AREA)
  • Computer Security & Cryptography (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The embodiments of the invention relate to the technical field of artificial intelligence and big data, and provide a user session resource data protection method and a software product applying AI decision. In the application stage of the AI neural network, big data anonymization and desensitization techniques are applied to the Online session resource text, which can improve the accuracy and rationality of the anonymous desensitization protection. In addition, the Online session resource text may come from fields such as the metaverse and digital services, so the method and the software product have high reusability and extensibility.

Description

User session resource data protection method and software product applying AI decision
Technical Field
The invention relates to the technical field of artificial intelligence and big data, in particular to a user session resource data protection method and a software product applying AI decision.
Background
Since the birth of artificial intelligence, its theories and technologies have matured steadily and its fields of application have kept expanding. For example, artificial intelligence is increasingly used in question-and-answer robots, language recognition, image recognition, expert systems, and the like. As the big data era advances, the combination of artificial intelligence and big data is bringing qualitative change to most industries. A combination of frontier technologies such as artificial intelligence, big data and cloud computing can support the metaverse, the digital economy, smart cities, intelligent manufacturing and so on, greatly improving work and life and contributing substantially to economic development. Against this background of big data and digitization, the protection of data and information cannot be neglected.
Disclosure of Invention
The invention provides a user session resource data protection method and a software product applying AI decision, in which an AI neural network is jointly debugged using big data of resource texts so as to ensure the performance of the AI neural network. In the application stage of the AI neural network, big data anonymization and desensitization techniques are applied to the Online session resource text, which can improve the accuracy and rationality of the anonymous desensitization protection. In addition, the Online session resource text may come from fields such as the metaverse and digital services, so the method and the software product have high reusability and extensibility. To achieve this technical purpose, the invention adopts the following technical solutions.
The first aspect is a user session resource data protection method applying AI decision, applied to a big data AI decision server, the method comprising:
obtaining an original Online session resource text, and performing text detail reconstruction on the original Online session resource text to obtain an Online session resource reconstructed text;
performing sensitive data vector mining on the Online session resource reconstructed text to obtain a target sensitive data characterization vector;
obtaining a general sensitive text processing network, and using the general sensitive text processing network and the target sensitive data characterization vector to perform joint debugging on a set sensitive text processing network to obtain an intermediate sensitive text processing network;
performing, in combination with the network variables of the general sensitive text processing network, an improvement operation on the network variables of the intermediate sensitive text processing network to obtain a final sensitive text processing network;
obtaining a general sensitive data characterization vector and a to-be-anonymous Online session resource text, and performing a characterization vector splicing operation on the general sensitive data characterization vector and the target sensitive data characterization vector to obtain a sensitive data splicing vector;
and performing sensitive data anonymous protection on the to-be-anonymous Online session resource text using the final sensitive text processing network and the sensitive data splicing vector to obtain an Online session resource desensitization text meeting the big data protection condition.
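The data flow of the six operations above can be sketched as a small orchestration function. This is only an illustration under stated assumptions: the `Stages` container, the `protect` function, and the representation of a network as a dictionary of unit variables are all hypothetical names introduced here, and every stage is passed in as a callable because the source does not fix how each operation is implemented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Network = Dict[str, List[float]]  # hypothetical: unit name -> configuration variables


@dataclass
class Stages:
    """Hypothetical callables, one per operation named in the method."""
    reconstruct: Callable[[str], str]
    mine_vector: Callable[[str], Vector]
    joint_debug: Callable[[Network, Vector], Network]
    improve: Callable[[Network, Network], Network]
    splice: Callable[[Sequence[Vector], Vector], Vector]
    anonymize: Callable[[Network, Vector, str], str]


def protect(original_text: str,
            text_to_anonymize: str,
            general_network: Network,
            general_vectors: Sequence[Vector],
            stages: Stages) -> str:
    """Run the six operations in the order the method lists them."""
    # 1. Text detail reconstruction of the original text.
    reconstructed = stages.reconstruct(original_text)
    # 2. Sensitive data vector mining on the reconstructed text.
    target_vector = stages.mine_vector(reconstructed)
    # 3. Joint debugging guided by the general network and the target vector.
    intermediate = stages.joint_debug(general_network, target_vector)
    # 4. Improve the intermediate network's variables using the general network.
    final_network = stages.improve(general_network, intermediate)
    # 5. Splice the general vectors with the target vector.
    spliced = stages.splice(general_vectors, target_vector)
    # 6. Anonymous protection of the text to be anonymized.
    return stages.anonymize(final_network, spliced, text_to_anonymize)
```

Keeping the stages as injected callables makes the ordering and the inputs of each operation explicit without committing to any particular model.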
In some optional embodiments, the performing sensitive data anonymization protection on the Online session resource text to be anonymized by using the final sensitive text processing network and the sensitive data stitching vector to obtain an Online session resource desensitized text meeting a big data protection condition includes:
performing text feature extraction on the to-be-anonymous Online session resource text using the final sensitive text processing network to obtain a to-be-anonymous sensitive text vector of the to-be-anonymous Online session resource text;
performing a sensitive element generalization operation on the to-be-anonymous sensitive text vector using the sensitive data splicing vector to obtain a sensitive text generalization vector;
and performing text recovery operation on the sensitive text generalization vector by adopting the final sensitive text processing network to obtain the Online session resource desensitization text meeting the big data protection condition.
In some optional embodiments, the performing text feature extraction on the to-be-anonymous Online session resource text by using the final sensitive text processing network to obtain a to-be-anonymous sensitive text vector of the to-be-anonymous Online session resource text includes:
adopting the final sensitive text processing network to perform sensitive data vector mining processing on the to-be-anonymous Online session resource text to obtain text description data of the to-be-anonymous Online session resource text;
performing region projection operation on the text description data by adopting the final sensitive text processing network to obtain a text region positioning tag of the text description data;
and generating the to-be-anonymous sensitive text vector of the to-be-anonymous Online session resource text from the text region positioning tag using the final sensitive text processing network.
In some optional embodiments, the performing a token vector stitching operation on the generic sensitive data token vector and the target sensitive data token vector to obtain a sensitive data stitching vector includes:
summarizing the general sensitive data characterization vector to obtain a summarized sensitive data characterization vector;
and vector aggregation is carried out on the summarized sensitive data characterization vector and the target sensitive data characterization vector to obtain a sensitive data splicing vector.
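The two-step splicing above can be sketched as follows. The source does not say how the general vectors are "summarized" or how the result is "aggregated" with the target vector; this sketch assumes an element-wise mean for the summary and plain concatenation for the aggregation. The function names `summarize` and `splice` are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def summarize(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors (an assumed form of summarizing)."""
    if not vectors:
        raise ValueError("need at least one general vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all general vectors must share one dimension")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def splice(general_vectors: Sequence[Vector], target_vector: Vector) -> Vector:
    """Concatenate the summarized general vector with the target vector (assumed aggregation)."""
    return summarize(general_vectors) + list(target_vector)


general = [[1.0, 2.0], [3.0, 4.0]]
target = [0.5, 0.5]
spliced = splice(general, target)
print(spliced)  # [2.0, 3.0, 0.5, 0.5]
```

Other choices, such as a weighted sum or a learned projection, would fit the same description equally well.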
In some optional embodiments, the performing, in combination with the network variables of the general sensitive text processing network, an improvement operation on the network variables of the intermediate sensitive text processing network to obtain a final sensitive text processing network includes:
extracting at least one network unit to be improved from the intermediate sensitive text processing network;
extracting, according to the network unit to be improved, a corresponding improvement auxiliary unit from the general sensitive text processing network;
and performing an improvement operation on the unit configuration variables of the network unit to be improved in combination with the unit configuration variables of the improvement auxiliary unit to obtain the final sensitive text processing network.
In some optional embodiments, the performing an improvement operation on the unit configuration variables of the network unit to be improved in combination with the unit configuration variables of the improvement auxiliary unit to obtain the final sensitive text processing network includes:
determining a unit configuration variable weighting factor of the network unit to be improved and a unit configuration variable weighting factor of the improvement auxiliary unit;
and performing vector aggregation on the unit configuration variables of the network unit to be improved and the unit configuration variables of the improvement auxiliary unit using the two weighting factors to obtain the final sensitive text processing network.
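The weighted aggregation of unit configuration variables can be sketched as a convex combination of two parameter lists. This is a minimal sketch under assumptions: the source does not state the aggregation formula, so the normalized weighted sum below, the dictionary-based network representation, and the names `merge_unit` and `improve_network` are all hypothetical.

```python
from typing import Dict, Iterable, List

Unit = List[float]
Network = Dict[str, Unit]  # hypothetical: unit name -> configuration variables


def merge_unit(unit_to_improve: Unit, auxiliary_unit: Unit,
               w_improve: float, w_aux: float) -> Unit:
    """Weighted average of two units' variables; weights are normalized (assumption)."""
    if len(unit_to_improve) != len(auxiliary_unit):
        raise ValueError("units must have the same number of variables")
    total = w_improve + w_aux
    if total <= 0:
        raise ValueError("weighting factors must sum to a positive value")
    return [(w_improve * a + w_aux * b) / total
            for a, b in zip(unit_to_improve, auxiliary_unit)]


def improve_network(intermediate: Network, general: Network,
                    units_to_improve: Iterable[str],
                    w_improve: float = 0.5, w_aux: float = 0.5) -> Network:
    """Merge only the selected units; every other unit is copied unchanged."""
    final = {name: list(values) for name, values in intermediate.items()}
    for name in units_to_improve:
        final[name] = merge_unit(intermediate[name], general[name], w_improve, w_aux)
    return final


intermediate = {"conv1": [1.0, 2.0], "head": [10.0]}
general = {"conv1": [3.0, 6.0], "head": [0.0]}
final = improve_network(intermediate, general, ["conv1"], w_improve=0.75, w_aux=0.25)
print(final)  # {'conv1': [1.5, 3.0], 'head': [10.0]}
```

Here the auxiliary unit is looked up in the general network by the same unit name, which is one simple way to realize "corresponding" units.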
In some optional embodiments, the using the general sensitive text processing network and the target sensitive data characterization vector to perform joint debugging on the set sensitive text processing network to obtain an intermediate sensitive text processing network includes:
rolling back the network variables of the set sensitive text processing network using the network variables of the general sensitive text processing network to obtain a default sensitive text processing network;
and debugging the default sensitive text processing network using the target sensitive data characterization vector to obtain the intermediate sensitive text processing network.
In some optional embodiments, the debugging the default sensitive text processing network using the target sensitive data characterization vector to obtain the intermediate sensitive text processing network includes:
acquiring an Online session resource debugging text;
adopting the target sensitive data characterization vector and the default sensitive text processing network to carry out sensitive data anonymous protection on the Online session resource debugging text to obtain a text anonymous protection prediction result;
determining debugging cost data between the text anonymous protection prediction result and a template Online session resource text;
and improving network variables of the default sensitive text processing network through the debugging cost data to obtain the universal sensitive text processing network.
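The predict, measure cost, update cycle described above has the shape of an ordinary training loop. The toy sketch below assumes a one-variable linear model, a mean squared error as the "debugging cost data", and plain gradient descent as the way the network variables are improved; none of these choices comes from the source, and the names `cost` and `debug_network` are hypothetical.

```python
from typing import Dict, List, Tuple

Params = Dict[str, float]
Samples = List[Tuple[float, float]]  # (input feature, template output) pairs


def cost(params: Params, samples: Samples) -> float:
    """Mean squared error between predictions and template outputs (assumed cost)."""
    return sum((params["w"] * x + params["b"] - y) ** 2 for x, y in samples) / len(samples)


def debug_network(params: Params, samples: Samples,
                  lr: float = 0.1, epochs: int = 500) -> Params:
    """Repeatedly predict, compute the cost gradient, and update the variables."""
    w, b = params["w"], params["b"]
    n = len(samples)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in samples:
            err = (w * x + b) - y          # prediction minus template
            grad_w += 2 * err * x / n
            grad_b += 2 * err / n
        w -= lr * grad_w                   # improve the network variables
        b -= lr * grad_b
    return {"w": w, "b": b}


samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]   # generated by y = 2x + 1
default_params = {"w": 0.0, "b": 0.0}
debugged = debug_network(default_params, samples)
print(cost(default_params, samples), cost(debugged, samples))
```

In a real system the model would be the default sensitive text processing network and the cost would compare its anonymized output with the template text.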
In some optional embodiments, the performing sensitive data anonymization protection on the Online session resource debugging text by using the target sensitive data characterization vector and the default sensitive text processing network to obtain a text anonymization protection prediction result includes:
performing text feature extraction on the Online session resource debugging text using the default sensitive text processing network to obtain a sensitive data characterization vector sample;
performing sensitive element generalization operation on the sensitive data representation vector sample by adopting the target sensitive data representation vector to obtain a sensitive text generalization vector sample of the Online session resource debugging text;
and generating a text anonymous protection prediction result of the Online session resource debugging text through the sensitive text generalization vector sample by adopting the default sensitive text processing network.
In some optional embodiments, the generating, by using the default sensitive text processing network and the sensitive text generalization vector sample, a text anonymous protection prediction result of the Online session resource debug text includes:
performing text recovery operation on the sensitive text generalization vector sample by adopting the default sensitive text processing network to obtain an Online session resource recovery text;
performing content discrimination operation on the Online session resource debugging text to obtain a content discrimination result of the Online session resource debugging text;
and adopting the content discrimination result to carry out content significance adjustment on the Online session resource recovery text to obtain the text anonymous protection prediction result.
In some alternative embodiments, the method further comprises:
performing text detail analysis on the Online session resource desensitization text to obtain a text detail analysis result of the Online session resource desensitization text;
and carrying out text detail reconstruction on the Online session resource desensitization text according to the text detail analysis result to obtain the Online session resource desensitization reconstruction text.
In some optional embodiments, the reconstructing text details of the Online session resource desensitization text according to the text detail analysis result to obtain an Online session resource desensitization reconstruction text includes:
acquiring an AI text reconstruction network;
performing text reconstruction on the Online session resource desensitization text by adopting the AI text reconstruction network to obtain an Online session resource desensitization reconstruction text;
the text reconstruction of the Online session resource desensitization text by adopting an AI text reconstruction network comprises the following steps:
obtaining a reconstructed text sample and setting an AI text reconstruction network;
performing disturbance adding operation on the reconstructed text sample to obtain a disturbance text sample;
and adopting the disturbance text sample to debug the set AI text reconstruction network to obtain the AI text reconstruction network.
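The disturbance-adding step can be sketched as character-level noise applied to clean text samples, giving (perturbed, clean) pairs on which a reconstruction network could be debugged. The kind of disturbance is not specified in the source; random character drops and duplications are an assumption made here, and `perturb` and `build_training_pairs` are hypothetical names.

```python
import random
from typing import List, Optional, Tuple


def perturb(text: str, rate: float = 0.1, seed: Optional[int] = None) -> str:
    """Randomly drop or duplicate characters; `rate` is the per-character noise probability."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        r = rng.random()
        if r < rate / 2:
            continue              # drop this character
        if r < rate:
            out.append(ch)        # duplicate this character
        out.append(ch)
    return "".join(out)


def build_training_pairs(samples: List[str], rate: float = 0.1,
                         seed: int = 0) -> List[Tuple[str, str]]:
    """Pair each perturbed sample with its clean original as the reconstruction target."""
    rng = random.Random(seed)
    return [(perturb(s, rate, rng.randrange(2 ** 32)), s) for s in samples]


pairs = build_training_pairs(["please confirm my order", "see you tomorrow"], rate=0.2)
for noisy, clean in pairs:
    print(repr(noisy), "->", repr(clean))
```

Fixing the seed keeps the perturbed samples reproducible between debugging runs.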
The second aspect is a big data AI decision server comprising a memory and a processor; the memory is coupled to the processor; the memory is used for storing computer program codes, and the computer program codes comprise computer instructions; wherein the computer instructions, when executed by the processor, cause the big data AI decision server to perform the method of the first aspect.
A third aspect is a software product for implementing a user session resource data protection method applying AI decision, comprising a computer program/instructions, wherein the computer program/instructions, when executed, implement the method of the first aspect.
A fourth aspect is a computer readable storage medium having stored thereon a computer program which, when run, performs the method of the first aspect.
In the embodiment of the invention, an original Online session resource text is obtained, and text detail reconstruction is performed on it to obtain an Online session resource reconstructed text; sensitive data vector mining is performed on the Online session resource reconstructed text to obtain a target sensitive data characterization vector; a general sensitive text processing network is obtained, where the general sensitive text processing network is used for adjusting a sensitive data representation form into a general sensitive data representation form; the general sensitive text processing network and the target sensitive data characterization vector are used to perform joint debugging on a set sensitive text processing network to obtain an intermediate sensitive text processing network, where the intermediate sensitive text processing network is used for further adjusting the sensitive data representation form according to the big data protection condition; the network variables of the intermediate sensitive text processing network are improved in combination with the network variables of the general sensitive text processing network to obtain a final sensitive text processing network; a general sensitive data characterization vector and a to-be-anonymous Online session resource text are obtained, and a characterization vector splicing operation is performed on the general sensitive data characterization vector and the target sensitive data characterization vector to obtain a sensitive data splicing vector; and sensitive data anonymous protection is performed on the to-be-anonymous Online session resource text using the final sensitive text processing network and the sensitive data splicing vector to obtain an Online session resource desensitization text meeting the big data protection condition. In this way, the accuracy and rationality of the anonymous desensitization protection can be ensured when the original Online session resource text is subjected to data anonymous desensitization protection.
Drawings
Fig. 1 is a flowchart of a user session resource data protection method applying AI decision according to an embodiment of the present invention.
Detailed Description
Hereinafter, the terms "first," "second," and "third," etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first", "a second", or "a third", etc., may explicitly or implicitly include one or more such feature.
Fig. 1 shows a flow chart of a user session resource data protection method applying AI decision according to an embodiment of the present invention, where the user session resource data protection method applying AI decision may be implemented by a big data AI decision server, and the big data AI decision server may include a memory and a processor; the memory is coupled to the processor; the memory is used for storing computer program codes, and the computer program codes comprise computer instructions; wherein the processor, when executing the computer instructions, causes the big data AI decision server to perform steps 101-106.
Step 101: Obtain an original Online session resource text, and perform text detail reconstruction on the original Online session resource text to obtain an Online session resource reconstructed text.
In the embodiment of the invention, the original Online session resource text comprises the Online session resource text which needs to be subjected to text detail optimization reconstruction. For example, the original Online session resource text may be an Online session resource text in which a phrase exists. For another example, the original Online session resource text may be an Online session resource text with wrongly written words.
Under some possible design ideas, various schemes can be adopted to reconstruct the text details of the original Online session resource text. For example, text detail reconstruction is performed on the original Online session resource text based on a set AI text reconstruction network.
The set AI text reconstruction network can be built on the basis of a convolutional neural network, a Bayesian network and an activation function. The set AI text reconstruction network can optimize the text details of the original Online session resource text, so that a better Online session resource reconstructed text is obtained.
Step 102: Perform sensitive data vector mining on the Online session resource reconstructed text to obtain a target sensitive data characterization vector.
Under some possible design ideas, after the Online session resource reconstructed text is obtained, sensitive data vector mining (feature extraction) can be performed on it to obtain the target sensitive data characterization vector.
Further, the target sensitive data characterization vector includes text features that record the representation form of the sensitive data.
Under some possible design ideas, various schemes can be adopted to perform sensitive data vector mining on the Online session resource reconstructed text to obtain the target sensitive data characterization vector.
For example, various networks combined with an AI algorithm can be used to perform sensitive data vector mining on the Online session resource reconstructed text to obtain the target sensitive data characterization vector.
Step 103: Obtain a general sensitive text processing network and the target sensitive data characterization vector, and use the general sensitive text processing network and the target sensitive data characterization vector to perform joint debugging on the set sensitive text processing network to obtain an intermediate sensitive text processing network.
In the embodiment of the invention, the general sensitive text processing network includes a network that can adjust the sensitive data representation form of an Online session resource text to the general sensitive data representation form. The general sensitive text processing network can be understood as a basic sensitive text processing network; accordingly, the general sensitive data representation form can be understood as a basic sensitive data representation form, that is, a general desensitization protection output form designed for data desensitization protection. On this basis, migration debugging of the set sensitive text processing network can be carried out using the general sensitive text processing network and the target sensitive data characterization vector to obtain the intermediate sensitive text processing network; the intermediate sensitive text processing network is the set sensitive text processing network after migration debugging is complete.
Those skilled in the art will appreciate that the generic sensitive text processing network may be a convolutional neural network, a deep learning network, or the like.
The Online session resource text in the embodiment of the invention can include different types of Online session resource text. For example, it can be an Online session resource text generated during metaverse service interaction, an Online session resource text generated during digital financial service interaction, or an Online session resource text generated during Online chat.
The sensitive data representation form can comprise an anonymous desensitization output form of sensitive data information in the Online session resource text.
Under some possible design ideas, the sensitive data representation form can be flexibly designed, such as processing based on K anonymity or processing based on a content shielding mode.
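The two representation forms mentioned above, content shielding and K-anonymity, can be illustrated with a short sketch. The regular expressions, the placeholder tokens, and the record layout are assumptions made for this example; the source does not specify what counts as sensitive content or which attributes act as quasi-identifiers.

```python
import re
from collections import Counter
from typing import Dict, List, Sequence

# Assumed patterns: runs of 7+ digits are treated as phone numbers,
# and a simple address shape is treated as an e-mail.
PHONE = re.compile(r"\b\d{7,}\b")
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def mask_content(text: str) -> str:
    """Content shielding: replace matched sensitive spans with placeholder tokens."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


def is_k_anonymous(records: List[Dict[str, str]],
                   quasi_identifiers: Sequence[str], k: int) -> bool:
    """K-anonymity check: every quasi-identifier combination occurs at least k times."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


print(mask_content("call 13800001111 or mail a.b@example.com"))
# call [PHONE] or mail [EMAIL]

records = [
    {"age": "20-29", "city": "X"},
    {"age": "20-29", "city": "X"},
    {"age": "30-39", "city": "Y"},
]
print(is_k_anonymous(records, ["age", "city"], 2))      # False: the last group has one record
print(is_k_anonymous(records[:2], ["age", "city"], 2))  # True
```

Shielding acts on the text itself, while the K-anonymity check acts on already generalized attributes, so a system could apply either or both depending on the requested representation form.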
Under some possible design considerations, the generic sensitive data representation may include sensitive data representations of a variety of Online session resource texts. For example, there are 5 sensitive data characterization forms, which are all different. For ease of description, these 5 sensitive data characterization forms may be considered as generic sensitive data characterization forms.
Under some possible design considerations, the big data protection conditions may include the sensitive data representation form in which the anonymous request system expects the Online session resource text to be output. For example, suppose the initial sensitive data representation form of the to-be-anonymous Online session resource text shields both personal information and business interaction content, while the anonymous request system wants the text converted into a form in which only personal information is desensitized. In that case, the sensitive data representation form that desensitizes only personal information can serve as the big data protection condition.
Under some possible design considerations, the generic sensitive data characterization vector includes text features that can be used to document a characterization form of sensitive data.
In view of the ability to treat a variety of different sensitive data characterization forms as a generic sensitive data characterization form, the generic sensitive data characterization vector may also include features of the variety of different sensitive data characterization forms. For example, the generic sensitive data representation comprises 5 different sensitive data representations, and the generic sensitive data representation vector may comprise 5 different sensitive data representation vectors.
Under some possible design ideas, before the universal sensitive data representation vector is obtained, a sensitive data output processing network can be adopted to conduct text feature extraction on Online session resource texts in various different sensitive data representation forms, so that text features in various different sensitive data representation forms are obtained.
For example, a sensitive data output processing network may be used to refine text features of multiple Online session resource texts, so as to obtain multiple sensitive data characterization vectors.
Those skilled in the art will recognize that the sensitive data output processing network may be a neural network constructed according to actual requirements.
Under some possible design considerations, the target sensitive data characterization vector includes text features for documenting a characterization form of the sensitive data.
Under some possible design ideas, the output modes of the target sensitive data characterization vector and the general sensitive data characterization vector can be various. For example, the target sensitive data characterization vector and the generic sensitive data characterization vector may be in the form of a linear array, or the like.
In some possible embodiments, an untrained sensitive text processing network may be debugged before the general sensitive text processing network is obtained, thereby producing the general sensitive text processing network. For example, the general sensitive data characterization vector and the Online session resource debugging text can be used to debug the untrained sensitive text processing network, so that the resulting network can adjust text to the general sensitive data representation form.
The intermediate sensitive text processing network is similar in performance to the general sensitive text processing network, but may be adapted to a specific sensitive data representation form. Therefore, the general sensitive text processing network and the target sensitive data characterization vector can be used to perform joint debugging (such as migration debugging) on the set sensitive text processing network, which improves the timeliness of debugging the set sensitive text processing network. Those skilled in the art will appreciate that joint debugging may combine supervised training and unsupervised training.
Here, a debug sample set includes the sample resources used for debugging a neural network. For example, when the set sensitive text processing network is debugged with the target sensitive data characterization vector, the debug sample set is the target sensitive data characterization vector.
In the actual implementation process, the number of the Online session resource texts is usually not large, and if the Online session resource texts are directly adopted to debug the intermediate sensitive text processing network to be debugged, the performance of the intermediate sensitive text processing network is low. Therefore, the universal sensitive text processing network with better performance can be adopted to jointly debug the intermediate sensitive text processing network to be debugged.
Exemplary, the step of "using a general sensitive text processing network and the target sensitive data characterization vector to perform joint debugging on a set sensitive text processing network to obtain an intermediate sensitive text processing network" may include: adopting network variables of a general sensitive text processing network to roll back the network variables of the set sensitive text processing network to obtain a default sensitive text processing network; and debugging the default sensitive text processing network by adopting a target sensitive data characterization vector to obtain the intermediate sensitive text processing network.
Here, the network variables may be understood as model parameters, the rollback processing may be understood as a model parameter initialization process, and the set sensitive text processing network may include a neural network that has not yet been debugged. For example, the set sensitive text processing network may be a convolutional neural network model that cannot yet perform desensitization and anonymization protection on the Online session resource text according to the big data protection condition.
Under some possible design considerations, in order for the generic sensitive text processing network to perform joint debugging on the set-up sensitive text processing network, the network structure of the generic sensitive text processing network and the network structure of the set-up sensitive text processing network are generally the same.
For example, if the network structure of the general sensitive text processing network includes three layers of functional units, the network structure of the set sensitive text processing network may also include three layers of functional units.
Under some possible design ideas, when the network variables of the universal sensitive text processing network are adopted to carry out rollback processing on the network variables of the intermediate sensitive text processing network to be debugged, the network variables of the set sensitive text processing network can be set through the network variables of the universal sensitive text processing network, so that the default sensitive text processing network has the performance of universal sensitive data representation form adjustment.
For example, if the network variables of the three layers of functional units in the general sensitive text processing network are p1, p2 and p3, respectively, the network variables of the three layers of functional units in the set sensitive text processing network may also be set to p1, p2 and p3.
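The rollback (parameter initialization) processing described above can be sketched as follows. This is only a minimal illustration under stated assumptions: each network is represented as a hypothetical dictionary mapping functional unit names to parameter lists, and the function name `rollback_parameters` and the unit names are not taken from the original text.

```python
import copy

def rollback_parameters(generic_params, set_params):
    """Initialize the set network's parameters from the general network.

    Both arguments are hypothetical dicts: unit name -> list of floats.
    The two networks must share the same structure (unit names and sizes).
    """
    if generic_params.keys() != set_params.keys():
        raise ValueError("network structures differ: unit names do not match")
    for name in generic_params:
        if len(generic_params[name]) != len(set_params[name]):
            raise ValueError(f"unit {name!r} has mismatched size")
    # Deep copy so that later debugging of the default network
    # leaves the general network's parameters unchanged.
    return copy.deepcopy(generic_params)

# p1, p2, p3 of the general network (illustrative values only)
generic = {"unit1": [0.1, 0.2], "unit2": [0.3], "unit3": [0.4, 0.5]}
untrained = {"unit1": [0.0, 0.0], "unit2": [0.0], "unit3": [0.0, 0.0]}
default_net = rollback_parameters(generic, untrained)
```

The structure check mirrors the requirement that the two networks generally have the same network structure; the deep copy is a design choice of this sketch, not a requirement stated in the text.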
Under some possible design ideas, in order for the intermediate sensitive text processing network to adjust the sensitive data representation form of the Online session resource text to be a big data protection condition, the Online session resource text may be adopted to debug the default sensitive text processing network, thereby obtaining the intermediate sensitive text processing network.
Illustratively, the step of "debugging the default sensitive text processing network using the target sensitive data characterization vector to obtain an intermediate sensitive text processing network" may include: acquiring an Online session resource debugging text; performing sensitive data anonymous protection on the Online session resource debugging text using the target sensitive data characterization vector and the default sensitive text processing network to obtain a text anonymous protection prediction result; determining debugging cost data between the text anonymous protection prediction result and a template Online session resource text; and improving the network variables of the default sensitive text processing network through the debugging cost data to obtain the intermediate sensitive text processing network.
The Online session resource debugging text can be understood as a training sample of the Online session resource text. Debug cost data may be understood as lost information or lost data.
Under some possible design ideas, the idea of debugging the default sensitive text processing network (the initialized sensitive text processing network obtained after the network variable rollback processing) can be a process of implementing sensitive data anonymity protection on Online session resource debugging texts by adopting continuous learning of the default sensitive text processing network. Exemplary, the step of "performing sensitive data anonymization protection on the Online session resource debugging text by using the target sensitive data characterization vector and the default sensitive text processing network to obtain a text anonymization protection prediction result" may include: adopting the default sensitive text processing network to conduct text feature extraction on the Online session resource debugging text to obtain a sensitive data characterization vector sample; performing sensitive element generalization operation on the sensitive data representation vector sample by adopting the target sensitive data representation vector to obtain a sensitive text generalization vector sample of the Online session resource debugging text; and generating a text anonymous protection prediction result of the Online session resource debugging text through the sensitive text generalization vector sample by adopting the default sensitive text processing network.
Under some possible design ideas, when text feature extraction is performed on the Online session resource debugging text by using a default sensitive text processing network, text feature extraction may be performed on the Online session resource debugging text by using a text feature extraction subnet (feature encoder), so as to obtain a sensitive data characterization vector sample.
Under some possible design ideas, when the default sensitive text processing network is adopted to refine text characteristics of the Online session resource debugging text, the default sensitive text processing network can be adopted to perform sensitive data vector mining on the Online session resource debugging text, so as to obtain text description data (which can be understood as characteristic information) of the Online session resource debugging text. And then, debugging text description data of the text through the Online session resource to obtain a sensitive data characterization vector sample. Exemplary, the step of "using a default sensitive text processing network to refine text features of an Online session resource debug text to obtain a sensitive data token vector sample" may include: adopting a default sensitive text processing network to perform sensitive data vector mining processing on the Online session resource debugging text to obtain text description data of the Online session resource debugging text; performing region projection operation on the text description data by adopting a default sensitive text processing network to obtain a text region positioning tag of the text description data; and generating a sensitive data characterization vector sample of the Online session resource debugging text through a text region positioning label by adopting a default sensitive text processing network.
For example, the region projection operation may be understood as a location mapping process, and text region location tags are used to reflect the location distribution characteristics of text description data. Further, the text description data of the Online session resource debug text includes information that can represent an Online session resource text feature of the Online session resource debug text.
Under some possible design ideas, the mining bias of the feature vectors also has differences when the sensitive data vector mining is carried out on the Online session resource debugging text through the difference of the Online session resource debugging text content.
Under some possible design ideas, various schemes can be adopted to perform sensitive data vector mining on Online session resource debugging text. For example, gradient units of the universal sensitive text processing network to be debugged can be adopted to carry out moving average processing on the Online session resource text, so that text description data of the Online session resource debugging text can be obtained. For another example, a moving average operator of the universal sensitive text processing network to be debugged can be adopted to carry out moving average processing on the Online session resource text, so that text description data of the Online session resource debugging text is obtained.
Under some possible design ideas, after text description data of the Online session resource debugging text is obtained, the text description data can be subjected to region projection operation to obtain a text region positioning tag of the text description data.
Under some possible design ideas, a default sensitive text processing network can be adopted, and a sensitive data characterization vector sample of the Online session resource debugging text can be generated through a text region positioning label.
For example, set intermediate features may be used to adjust the text region positioning tag into a sensitive data characterization vector sample. The set intermediate features comprise feature vectors configured in advance in the default sensitive text processing network, and these feature vectors can convert the text region positioning tag into the sensitive data characterization vector sample.
Under some possible design ideas, after the sensitive data representation vector of the Online session resource debugging text is obtained, a target sensitive data representation vector can be adopted, and sensitive element generalization operation is carried out on the sensitive data representation vector sample, so that a sensitive text generalization vector sample of the Online session resource debugging text is obtained. The target sensitive data representation vector is adopted, and various schemes can be adopted when sensitive element generalization operation is carried out on the Online session resource debugging text.
For example, the target sensitive data token vector and the sensitive data token vector sample may be summed to obtain a sensitive text generalization vector sample (text feature vector after further anonymization processing) of the Online session resource debug text. For another example, regularization processing can be performed on the target sensitive data characterization vector and the sensitive data characterization vector sample, so as to obtain a sensitive text generalization vector sample of the Online session resource debugging text.
For example, the evaluation index (mean+variance) of the sensitive data representation vector sample may be aligned to the evaluation index (mean+variance) of the general sensitive data representation vector, so as to obtain the sensitive text generalization vector sample.
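The two generalization options mentioned above (summation, and alignment of mean and variance) can be sketched as follows. This is a minimal illustration, assuming that characterization vectors are plain lists of floats; the function names `mean_std`, `generalize_by_sum` and `generalize_by_alignment`, and the small constant `eps`, are assumptions introduced here for illustration.

```python
import math

def mean_std(vec, eps=1e-8):
    """Return the mean and the (eps-stabilized) standard deviation of a vector."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return mean, math.sqrt(var + eps)

def generalize_by_sum(sample_vec, target_vec):
    """Option 1: element-wise sum of the sample vector and the target vector."""
    if len(sample_vec) != len(target_vec):
        raise ValueError("vectors must have the same length")
    return [s + t for s, t in zip(sample_vec, target_vec)]

def generalize_by_alignment(sample_vec, reference_vec):
    """Option 2: rescale sample_vec so its mean and variance match reference_vec."""
    s_mean, s_std = mean_std(sample_vec)
    r_mean, r_std = mean_std(reference_vec)
    return [(v - s_mean) / s_std * r_std + r_mean for v in sample_vec]

sample = [1.0, 2.0, 3.0, 4.0]         # sensitive data characterization vector sample
reference = [10.0, 20.0, 30.0, 40.0]  # vector whose statistics are the alignment target
summed = generalize_by_sum(sample, reference)
aligned = generalize_by_alignment(sample, reference)
```

After alignment, the sample keeps its relative ordering of values but takes on the mean and variance of the reference vector.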
Under some possible design ideas, after the sensitive text generalization vector examples are obtained, a default sensitive text processing network can be adopted, and a text anonymous protection prediction result of the Online session resource debugging text can be generated through the sensitive text generalization vector examples. Illustratively, the step of "generating a text anonymous protection prediction result of the Online session resource debug text by using the default sensitive text processing network and the sensitive text generalization vector sample" may include: performing text recovery operation (text feature decoding processing) on the sensitive text generalization vector sample by adopting the default sensitive text processing network to obtain an Online session resource recovery text (feature decoding text); performing content discrimination operation on the Online session resource debugging text to obtain a content discrimination result of the Online session resource debugging text; and adopting the content discrimination result to carry out content significance adjustment on the Online session resource recovery text to obtain the text anonymous protection prediction result. Wherein the content discriminating operation can be understood as a semantic splitting process.
Under some possible design ideas, when the sensitive text processing network is set to be the generation countermeasure network, a feature decoding unit in the generation countermeasure network can be adopted to perform text recovery operation on the sensitive text generalization vector sample, so as to obtain an Online session resource recovery text.
The Online session resource recovery text comprises Online session resource text with the target sensitive data characterization vector. However, because of the difference of the contents of the debugging texts of different Online session resources, vector reinforcement can be performed on the Online session resource recovery text through the contents of the Online session resource debugging texts, so that the obtained text anonymous protection prediction result is as complete and accurate as possible.
Under some possible design ideas, content discrimination operation can be performed on the Online session resource debugging text to obtain a content discrimination result of the Online session resource debugging text. And then, adopting a content discrimination result to carry out content significance adjustment on the Online session resource recovery text, and obtaining a text anonymous protection prediction result.
Under some possible design ideas, when the content discrimination result is adopted to perform content saliency adjustment (feature strengthening treatment) on the Online session resource recovery text, the content discrimination result and text description data of the Online session resource recovery text can be overlapped, so that the content saliency adjustment on the Online session resource recovery text is realized.
Under some possible design ideas, after the text anonymous protection prediction result is obtained, debugging cost data between the text anonymous protection prediction result and the template Online session resource text can be determined, so that the network variables of the set sensitive text processing network can be adjusted through the debugging cost data to obtain the intermediate sensitive text processing network.
The debugging cost data characterizes the degree of similarity, in terms of sensitive data characterization form, between the text anonymous protection prediction result and the template Online session resource text. For example, the debugging cost data can be a variable value. The smaller the variable value, the higher the similarity of the sensitive data characterization form between the text anonymous protection prediction result and the template Online session resource text, and the better the running quality of the network. Conversely, the larger the variable value, the lower the similarity, and the poorer the running quality of the network.
Under some possible design considerations, a cost function (such as a cross entropy cost function) may be used to determine the debugging cost data between the text anonymous protection prediction result and the template Online session resource text.
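A cross entropy cost of the kind mentioned above can be sketched as follows. This is only an illustration under assumptions not stated in the original text: the prediction result is represented as one probability distribution per text position over a small hypothetical vocabulary, and the template text is represented as a list of vocabulary indices. The function name `cross_entropy_cost` is hypothetical.

```python
import math

def cross_entropy_cost(predicted_probs, template_ids, eps=1e-12):
    """Mean negative log-probability that the prediction assigns to the template.

    predicted_probs: list of probability distributions, one per position.
    template_ids: list of target indices, one per position.
    """
    if len(predicted_probs) != len(template_ids):
        raise ValueError("prediction and template must have the same length")
    total = 0.0
    for probs, target in zip(predicted_probs, template_ids):
        # eps guards against log(0) when the prediction assigns zero probability.
        total += -math.log(max(probs[target], eps))
    return total / len(template_ids)

template = [0, 1]
close_prediction = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1]]
far_prediction = [[0.2, 0.4, 0.4], [0.5, 0.2, 0.3]]
low_cost = cross_entropy_cost(close_prediction, template)
high_cost = cross_entropy_cost(far_prediction, template)
```

Consistent with the description, a smaller cost value corresponds to a prediction that is closer to the template.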
Under some possible design considerations, when the sensitive text processing network is set to be a generating countermeasure network, a decision subnet in the generating countermeasure network can be used to determine a text anonymous protection prediction result and debug cost data of a template Online session resource text.
Under some possible design ideas, after the debugging cost data is obtained, the network variables of the set sensitive text processing network can be improved through the debugging cost data, so that the intermediate sensitive text processing network is obtained.
For example, when the debugging cost data is large, the network variables of the set sensitive text processing network can be adjusted. The adjusted sensitive text processing network is then debugged again to see whether the debugging cost data has improved. Debugging is repeated in this way until the debugging cost data meets the requirement, and the sensitive text processing network at that point is determined to be the intermediate sensitive text processing network.
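The cyclic debugging idea (adjust the network variables, recompute the cost, and stop once the cost meets the requirement) can be sketched with a deliberately simple stand-in. The one-parameter model y = w * x, the mean squared error cost, the learning rate and the threshold are all assumptions chosen only to make the loop executable; they are not the network or cost of the original text.

```python
def debug_until_converged(w, samples, lr=0.05, cost_threshold=1e-4, max_rounds=10_000):
    """Gradient descent on mean squared error for a toy model y = w * x.

    Returns the final parameter, the final cost, and the number of
    adjustment rounds that were performed.
    """
    def cost_of(weight):
        return sum((weight * x - y) ** 2 for x, y in samples) / len(samples)

    for round_idx in range(max_rounds):
        cost = cost_of(w)
        if cost <= cost_threshold:
            # The cost meets the requirement: stop debugging.
            return w, cost, round_idx
        grad = sum(2 * (w * x - y) * x for x, y in samples) / len(samples)
        w -= lr * grad  # adjust the network variable
    return w, cost_of(w), max_rounds

samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # generated by y = 2 * x
w_final, final_cost, rounds = debug_until_converged(0.0, samples)
```

The `max_rounds` cap is a safeguard added in this sketch so that the loop terminates even if the requirement is never met.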
Under some possible design ideas, when the untrained sensitive text processing network is debugged, the richness and the volume of the debug sample set of the untrained sensitive text processing network can be ensured because the general sensitive data representation form can comprise the sensitive data representation forms of various Online session resource texts, and therefore, the general sensitive data representation vector can comprise the sensitive data representation vectors of various Online session resource texts. Therefore, the universal sensitive data characterization vector can be adopted to debug the untrained sensitive text processing network, so that the universal sensitive text processing network is obtained. The process of debugging the untrained sensitive text processing network by adopting the general sensitive data representation form Online session resource text can refer to the process of debugging the default sensitive text processing network.
In the embodiment of the invention, the universal sensitive text processing network to be debugged is debugged, so that the universal sensitive text processing network can grasp the characteristics of various sensitive data representation forms and has universal performance of adjusting the sensitive data representation forms of the Online session resource texts. And then, the intermediate sensitive text processing network to be debugged is jointly debugged by adopting the universal sensitive data characterization vector, so that the timeliness of network debugging is improved.
In the process of debugging the network, text strengthening is carried out on the Online session resource text by adopting the content discrimination result of the Online session resource debugging text, so that the running quality of the network can be further improved, the Online session resource text subjected to sensitive data representation form adjustment through the network is matched with big data protection conditions, and personalized and targeted data anonymous protection is realized.
Step 104, combining the network variables of the general sensitive text processing network, and performing an improvement operation on the network variables of the intermediate sensitive text processing network to obtain a final sensitive text processing network.
Under some possible design ideas, in order to further improve the performance of the intermediate sensitive text processing network, the intermediate sensitive text processing network can be adjusted by adopting a universal sensitive text processing network with better usability, so as to obtain a final sensitive text processing network. The final sensitive text processing network is better in quality, and the Online session resource text subjected to the final sensitive text processing network for the adjustment of the sensitive data representation form can be more close to the big data protection condition.
Under some possible design ideas, the step of "combining network variables of the general-purpose sensitive text processing network to perform an improvement operation on network variables of the intermediate sensitive text processing network to obtain a final sensitive text processing network" may include: extracting at least one network element to be improved from the intermediate sensitive text processing network; extracting a corresponding improved auxiliary unit from the universal sensitive text processing network through the to-be-improved network unit; and carrying out improvement operation on the element configuration variables of the to-be-improved network element by combining the element configuration variables of the improved auxiliary element to obtain the final sensitive text processing network.
Those skilled in the art will appreciate that the functional units/network elements are part of a neural network, each functional unit having a different function.
Under some possible design considerations, when the network variables of the intermediate sensitive text processing network are improved using the network variables of the general sensitive text processing network, at least one network element to be improved may first be extracted from the intermediate sensitive text processing network. The network element to be improved comprises a functional unit whose performance is to be improved. For example, when the performance of a sensitive data vector mining layer (mining function unit) in the intermediate sensitive text processing network is poor, the sensitive data vector mining layer can be determined as a network element to be improved and extracted. For another example, when the performance of both the sensitive data vector mining layer and the downsampling layer in the intermediate sensitive text processing network is poor, both layers may be extracted and determined to be network elements to be improved.
Under some possible design ideas, the corresponding improved auxiliary units can be extracted from the general sensitive text processing network through the to-be-improved network units. Wherein the improvement assisting unit comprises a functional unit which serves as a reference when adjusting the network element to be improved.
For example, when the performance of the sensitive data vector mining layer and the downsampling layer in the middle sensitive text processing network is poor, the sensitive data vector mining layer and the downsampling layer in the general sensitive text processing network can be extracted accordingly, and the sensitive data vector mining layer and the downsampling layer in the general sensitive text processing network are determined to be improved auxiliary units.
Under some possible design ideas, after the network element to be improved and the improvement auxiliary unit are extracted, the unit configuration variables of the improvement auxiliary unit can be used to improve the unit configuration variables of the network element to be improved, so as to obtain the final sensitive text processing network. In doing so, the unit configuration variables of the improvement auxiliary unit and those of the network element to be improved can be fused. Illustratively, the step of "performing an improvement operation on the unit configuration variables of the network element to be improved using the unit configuration variables of the improvement auxiliary unit to obtain a final sensitive text processing network" may include: determining a unit configuration variable weighting factor of the network element to be improved and a unit configuration variable weighting factor of the improvement auxiliary unit; and performing parameter vector aggregation on the unit configuration variables of the network element to be improved and those of the improvement auxiliary unit according to the two weighting factors, so as to obtain the final sensitive text processing network.
For example, the unit configuration variable of the network element to be improved is in1, and the unit configuration variable of the improvement auxiliary unit is in2. The unit configuration variable weighting factor of the network element to be improved is x1, and the unit configuration variable weighting factor of the improvement auxiliary unit is y1. When the unit configuration variables of the network element to be improved and of the improvement auxiliary unit are aggregated, the updated unit configuration variable in = in1 × x1 + in2 × y1 is obtained. In this way, the unit configuration variables of the intermediate sensitive text processing network are changed, so that the final sensitive text processing network can not only adjust the sensitive data characterization form of the Online session resource text to satisfy the big data protection condition, but also benefit from the general sensitive text processing network's performance in adjusting the sensitive data characterization form of the Online session resource text.
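The weighted aggregation in = in1 × x1 + in2 × y1 can be sketched as follows. This is a minimal illustration, assuming each network is a hypothetical dictionary mapping unit names to parameter lists; the function names, the unit names "mining" and "downsample", and the weight values are assumptions introduced here, since the original text does not specify how the weighting factors are chosen.

```python
def aggregate_unit_parameters(in1, in2, x1, y1):
    """Element-wise weighted aggregation: in = in1 * x1 + in2 * y1."""
    if len(in1) != len(in2):
        raise ValueError("unit parameter vectors must have the same length")
    return [a * x1 + b * y1 for a, b in zip(in1, in2)]

def improve_network(intermediate, generic, units_to_improve, x1, y1):
    """Return a copy of the intermediate network in which each listed unit is
    fused with the corresponding auxiliary unit of the general network."""
    final = {name: list(params) for name, params in intermediate.items()}
    for name in units_to_improve:
        final[name] = aggregate_unit_parameters(
            intermediate[name], generic[name], x1, y1
        )
    return final

intermediate = {"mining": [1.0, 2.0], "downsample": [4.0]}
generic = {"mining": [3.0, 4.0], "downsample": [8.0]}
# Only the mining unit is treated as a network element to be improved here.
final_net = improve_network(intermediate, generic, ["mining"], x1=0.5, y1=0.5)
```

Units that are not listed as network elements to be improved keep the parameters of the intermediate network unchanged.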
Step 105, obtaining a general sensitive data characterization vector and an Online session resource text to be anonymized, and performing a characterization vector splicing operation on the general sensitive data characterization vector and the target sensitive data characterization vector to obtain a sensitive data splicing vector.
Under some possible design considerations, the Online session resource text to be anonymized may include an Online session resource text whose sensitive data characterization form is to be adjusted. The embodiment of the invention does not limit the sensitive data characterization form or the content of the Online session resource text to be anonymized.
Wherein, since various different sensitive data characterization forms can be regarded as the general sensitive data characterization form, the general sensitive data characterization vector can also include features of the various different sensitive data characterization forms. For example, the generic sensitive data representation comprises 5 different sensitive data representations, and the generic sensitive data representation vector may comprise 5 different sensitive data representation vectors.
Under some possible design ideas, the general sensitive data characterization vector and the target sensitive data characterization vector can be subjected to characterization vector splicing operation to obtain a sensitive data splicing vector.
The general sensitive data characterization vector may include a plurality of sensitive data characterization vectors, so that the general sensitive data characterization vector may be integrated with the target sensitive data characterization vector after the summarization operation. Exemplary, the step of performing the token vector splicing operation on the general sensitive data token vector and the target sensitive data token vector to obtain a sensitive data splicing vector may include: summarizing the general sensitive data characterization vector to obtain a summarized sensitive data characterization vector; and vector aggregation is carried out on the summarized sensitive data characterization vector and the target sensitive data characterization vector, so that a sensitive data splicing vector is obtained.
Various schemes can be adopted to summarize the general sensitive data characterization vectors. For example, a plurality of sensitive data characterization vectors may be averaged to obtain a summarized sensitive data characterization vector. For another example, the variances of a plurality of sensitive data characterization vectors may be determined, resulting in summarized sensitive data characterization vectors.
After the summarized sensitive data characterization vector is obtained, vector aggregation can be carried out on the summarized sensitive data characterization vector and the target sensitive data characterization vector to obtain a sensitive data splicing vector.
For example, the summarized sensitive data characterization vector and the target sensitive data characterization vector may be summed to obtain the sensitive data splice vector.
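The splicing operation (summarize the general vectors, then aggregate with the target vector) can be sketched as follows, using averaging as the summarization and summation as the aggregation, as in the examples above. Vectors are assumed to be plain lists of floats of equal length, and the function names are hypothetical.

```python
def summarize_by_mean(vectors):
    """Average several characterization vectors element-wise."""
    if not vectors:
        raise ValueError("at least one vector is required")
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def splice(generic_vectors, target_vector):
    """Summarize the general vectors, then sum with the target vector."""
    summary = summarize_by_mean(generic_vectors)
    if len(summary) != len(target_vector):
        raise ValueError("summary and target vectors must have the same length")
    return [s + t for s, t in zip(summary, target_vector)]

# Three general sensitive data characterization vectors (illustrative values)
generic_vectors = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
target_vector = [0.5, -1.0]
splice_vector = splice(generic_vectors, target_vector)
```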
Under some possible design ideas, the steps of performing characterization vector splicing operation on the general sensitive data characterization vector and the target sensitive data characterization vector to obtain a sensitive data splicing vector and performing improvement operation on network variables of the intermediate sensitive text processing network by adopting network variables of the general sensitive text processing network to obtain a final sensitive text processing network are not limited in implementation sequence. For example, the step of performing the token vector splicing operation on the general sensitive data token vector and the target sensitive data token vector to obtain the sensitive data splicing vector may be performed first, or the step of performing the improvement operation on the network variable of the intermediate sensitive text processing network by using the network variable of the general sensitive text processing network may be performed first to obtain the final sensitive text processing network. For another example, it may also be implemented synchronously.
Step 106, performing sensitive data anonymous protection on the Online session resource text to be anonymized by using the final sensitive text processing network and the sensitive data splicing vector, to obtain an Online session resource desensitized text that satisfies the big data protection condition.
Under some possible design ideas, a final sensitive text processing network and a sensitive data splicing vector can be adopted to carry out sensitive data anonymous protection on the Online conversation resource text to be anonymous, so that the Online conversation resource desensitization text meeting the big data protection condition is obtained. Exemplary, the step of anonymously protecting sensitive data of the Online session resource text to be anonymized by adopting a final sensitive text processing network and a sensitive data splicing vector to obtain an Online session resource desensitized text meeting a big data protection condition may include: adopting a final sensitive text processing network to refine text characteristics of the Online conversation resource text to be anonymous to obtain a sensitive text vector to be anonymous of the Online conversation resource text to be anonymous; performing sensitive element generalization operation on the sensitive text vector to be anonymized by adopting the sensitive data splicing vector to obtain a sensitive text generalization vector; and performing text recovery operation on the sensitive text generalization vector by adopting a final sensitive text processing network to obtain the Online session resource desensitization text meeting the big data protection condition.
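The three-stage flow above (text feature extraction, sensitive element generalization, text recovery) can be sketched as a simple composition of functions. The `anonymize` function and its parameters are hypothetical, and the three stand-in stages below are placeholders chosen only so that the flow is executable; they are not the encoder, generalization operation or decoder of the final sensitive text processing network.

```python
from typing import Callable, List

Vector = List[float]

def anonymize(
    text: str,
    encode: Callable[[str], Vector],
    generalize: Callable[[Vector, Vector], Vector],
    decode: Callable[[Vector], str],
    splice_vector: Vector,
) -> str:
    """Run the flow: extract features -> generalize -> recover text."""
    features = encode(text)                            # sensitive text vector to be anonymized
    generalized = generalize(features, splice_vector)  # sensitive text generalization vector
    return decode(generalized)                         # desensitized text

# Placeholder stages (assumptions for illustration only).
def toy_encode(text: str) -> Vector:
    return [float(len(word)) for word in text.split()]

def toy_generalize(features: Vector, splice_vec: Vector) -> Vector:
    return [f + s for f, s in zip(features, splice_vec)]

def toy_decode(vec: Vector) -> str:
    return " ".join("*" * int(round(v)) for v in vec)

result = anonymize("call 13800000000", toy_encode, toy_generalize, toy_decode, [0.0, 0.0])
```

Passing the stages in as callables keeps the flow independent of any particular network architecture, which is a design choice of this sketch.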
Under some possible design ideas, when the final sensitive text processing network is a generative adversarial network, a text feature extraction subnet (feature encoder) in the generative adversarial network can be used to extract text features of the Online session resource text to be anonymized, so as to obtain the sensitive data characterization form of the Online session resource text to be anonymized.
Under some possible design ideas, the final sensitive text processing network can be used to perform sensitive data vector mining on the to-be-anonymized Online session resource text, so as to obtain text description data of that text. The to-be-anonymized sensitive text vector is then obtained from the text description data. Illustratively, the step of performing text feature extraction on the to-be-anonymized Online session resource text using the final sensitive text processing network to obtain the to-be-anonymized sensitive text vector may include: performing sensitive data vector mining on the to-be-anonymized Online session resource text using the final sensitive text processing network to obtain text description data of that text; performing a region projection operation on the text description data using the final sensitive text processing network to obtain a text region positioning tag of the text description data; and generating, using the final sensitive text processing network, the to-be-anonymized sensitive text vector of the to-be-anonymized Online session resource text from the text region positioning tag.
Sensitive data vector mining can be performed on the to-be-anonymized Online session resource text using various schemes. For example, a gradient unit of the final sensitive text processing network can be used to perform moving average processing on the text, so as to obtain its text description data. As another example, a moving average operator of the final sensitive text processing network can be used to perform the moving average processing. When the region projection operation is performed on the text description data, a preset distribution condition can be used. When the to-be-anonymized sensitive text vector is generated from the text region positioning tag, the tag can be adjusted into the to-be-anonymized sensitive text vector by setting intermediate features.
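As a rough illustration of moving average processing, the sketch below smooths a sequence of per-token numeric features with a fixed-size sliding window. It assumes the text has already been mapped to one number per token; the function name `moving_average` and the sample values are hypothetical, and the description does not state how the network's moving average operator is actually parameterized.

```python
from typing import List


def moving_average(values: List[float], window: int) -> List[float]:
    """Return the mean of every contiguous run of `window` values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    averaged = [running / window]
    for i in range(window, len(values)):
        # Slide the window: add the new value, drop the oldest one.
        running += values[i] - values[i - window]
        averaged.append(running / window)
    return averaged


# Hypothetical per-token features of a to-be-anonymized text.
token_features = [1.0, 2.0, 3.0, 4.0, 5.0]
smoothed = moving_average(token_features, window=3)  # [2.0, 3.0, 4.0]
```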
Under some possible design ideas, the sensitive element generalization operation (feature generalization processing) on the to-be-anonymized sensitive text vector using the sensitive data splicing vector can be carried out through various schemes to obtain the sensitive text generalization vector.
For example, the sensitive data splicing vector and the to-be-anonymized sensitive text vector can be integrated to obtain the sensitive text generalization vector.
Under some possible design ideas, when the final sensitive text processing network is a generative adversarial network (GAN), a feature decoding unit of the GAN can be used to extract text features of the to-be-anonymized Online session resource text, so as to obtain the Online session resource desensitized text that meets the big data protection condition.
Under some possible design ideas, after the Online session resource desensitized text is obtained, its quality can be analyzed, and when the quality is poor, it can be improved. Illustratively, the method may further comprise: performing text detail analysis on the Online session resource desensitized text to obtain a text detail analysis result; and performing text detail reconstruction on the Online session resource desensitized text according to the text detail analysis result to obtain an Online session resource desensitized reconstructed text.
The text detail analysis result of the Online session resource desensitized text comprises information that characterizes the quality of the text. For example, the text detail analysis result may include the word accuracy of the Online session resource desensitized text, the size of the Online session resource text, and the like.
Under some possible design ideas, text detail reconstruction can be carried out on the Online conversation resource desensitization text through text detail analysis results, so that the reconstructed Online conversation resource desensitization text is obtained, and the detail quality of the Online conversation resource desensitization text is improved.
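A minimal sketch of such a text detail analysis is given below. It assumes that word accuracy is the share of words found in a reference vocabulary and that text size is the character count; both definitions, the names `DetailAnalysis`, `analyze_details` and `needs_reconstruction`, and the 0.9 threshold are illustrative assumptions rather than details given in the description.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class DetailAnalysis:
    word_accuracy: float  # share of words found in the reference vocabulary
    text_size: int        # number of characters in the text


def analyze_details(text: str, vocabulary: Set[str]) -> DetailAnalysis:
    """Compute simple quality indicators for a desensitized text."""
    words = text.lower().split()
    if not words:
        return DetailAnalysis(word_accuracy=1.0, text_size=len(text))
    known = sum(1 for word in words if word in vocabulary)
    return DetailAnalysis(word_accuracy=known / len(words), text_size=len(text))


def needs_reconstruction(result: DetailAnalysis, min_accuracy: float = 0.9) -> bool:
    """Flag the text for detail reconstruction when word accuracy is too low."""
    return result.word_accuracy < min_accuracy


vocab = {"the", "user", "asked", "a", "question"}
result = analyze_details("the usr asked a question", vocab)
flagged = needs_reconstruction(result)
```

Here one of the five words ("usr") is not in the vocabulary, so the text would be flagged for reconstruction under the assumed threshold.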
Under some possible design ideas, an AI text reconstruction network can be used for text reconstruction of the Online session resource desensitized text. The AI text reconstruction network may be a deep learning model, or may be another type of neural network model. The network layer structure of the AI text reconstruction network can be flexibly adjusted according to actual requirements by a person skilled in the art.
Under some possible design ideas, before the AI text reconstruction network is used to reconstruct the Online session resource desensitized text, a set AI text reconstruction network can be obtained and debugged, thereby obtaining the AI text reconstruction network. The step of debugging the set AI text reconstruction network may include: obtaining a reconstructed text sample and the AI text reconstruction network to be debugged; performing a disturbance adding operation on the reconstructed text sample to obtain a disturbance text sample; and debugging the set AI text reconstruction network using the disturbance text sample to obtain the AI text reconstruction network.
The reconstructed text sample can comprise optimized Online session resource text in any sensitive data representation form. The AI text reconstruction network can be built on the basis of a convolutional neural network, a Bayesian network and an activation function.
Under some possible design ideas, because reconstructed text samples are scarce, a disturbance adding operation (noise adding processing) may be performed on the reconstructed text sample to obtain a disturbance text sample. The AI text reconstruction network to be debugged is then debugged using the disturbance text sample, thereby obtaining the AI text reconstruction network. The disturbance adding operation includes a process of deliberately reducing the quality of the reconstructed text sample, and it can be carried out in various ways. For example, erroneous words or sentences may be added to the sample.
After the disturbance text sample is obtained, the AI text reconstruction network to be debugged can be debugged by adopting the disturbance text sample, so that the AI text reconstruction network is obtained.
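The disturbance adding operation can be sketched as follows, assuming the simplest variant mentioned above: wrong words are inserted at random positions, and each disturbed sample is paired with its clean original so that a reconstruction network could be debugged to undo the noise. The names `perturb` and `make_training_pairs`, the noise word list, the insertion rate and the seed are all hypothetical choices for illustration.

```python
import random
from typing import List, Tuple


def perturb(sentence: str, noise_words: List[str], rate: float,
            rng: random.Random) -> str:
    """Insert wrong words after randomly chosen positions to lower text quality."""
    disturbed = []
    for word in sentence.split():
        disturbed.append(word)
        if rng.random() < rate:
            disturbed.append(rng.choice(noise_words))
    return " ".join(disturbed)


def make_training_pairs(clean_samples: List[str], noise_words: List[str],
                        rate: float = 0.3, seed: int = 0) -> List[Tuple[str, str]]:
    """Build (disturbed, clean) pairs; a fixed seed keeps runs reproducible."""
    rng = random.Random(seed)
    return [(perturb(s, noise_words, rate, rng), s) for s in clean_samples]


pairs = make_training_pairs(["please reset my account password"], ["foo", "bar"])
```

Removing the inserted noise words from a disturbed sample gives back the clean sample, which is the mapping the reconstruction network would be expected to learn.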
Under some possible design ideas, after the AI text reconstruction network is obtained, the AI text reconstruction network can be adopted to reconstruct text details of the Online conversation resource desensitized text. For example, if the Online conversation resource desensitization text has the problem of lower word accuracy, an AI text reconstruction network can be adopted to reconstruct text details of the Online conversation resource desensitization text, so that the quality degree of the Online conversation resource desensitization text is improved.
The embodiment of the invention provides a user session resource data protection method applying an AI decision, which comprises the following steps: the method comprises the steps of obtaining a general sensitive text processing network, an intermediate sensitive text processing network, a general sensitive data characterization vector, a target sensitive data characterization vector and a to-be-anonymous Online session resource text, wherein the general sensitive text processing network is used for adjusting a sensitive data characterization form into a general sensitive data characterization form, and the intermediate sensitive text processing network is used for further adjusting the sensitive data characterization form according to big data protection conditions; performing improvement operation on network variables of the intermediate sensitive text processing network by adopting network variables of the universal sensitive text processing network to obtain a final sensitive text processing network; performing characterization vector splicing operation on the general sensitive data characterization vector and the target sensitive data characterization vector to obtain a sensitive data splicing vector; and carrying out sensitive data anonymization protection on the Online conversation resource text to be anonymized by adopting a final sensitive text processing network and a sensitive data splicing vector to obtain the Online conversation resource desensitization text meeting the big data protection condition. 
Because the network variables of the intermediate sensitive text processing network are improved using the network variables of the general sensitive text processing network, the Online session resource desensitized text generated by the final sensitive text processing network better matches the big data protection condition, so that the accuracy and rationality of data anonymization and desensitization protection can be ensured when it is applied to the original Online session resource text.
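One way to picture this improvement operation is a weighted combination of matching network variables, in line with the weighting factors described in claims 5 and 6. The sketch below is an assumption-laden illustration: network variables are modelled as named lists of floats, units are matched by name, and the function name `blend_parameters` and the 0.5/0.5 weights are hypothetical.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def blend_parameters(intermediate: Params, general: Params,
                     w_intermediate: float, w_general: float) -> Params:
    """Combine each unit of the intermediate network with its matching general unit."""
    blended: Params = {}
    for name, values in intermediate.items():
        helper = general.get(name)
        if helper is None or len(helper) != len(values):
            # No matching auxiliary unit: keep the intermediate values unchanged.
            blended[name] = list(values)
            continue
        blended[name] = [w_intermediate * a + w_general * b
                         for a, b in zip(values, helper)]
    return blended


intermediate = {"encoder": [1.0, 2.0], "decoder": [4.0]}
general = {"encoder": [3.0, 6.0]}
final = blend_parameters(intermediate, general, 0.5, 0.5)
```

With equal weights the "encoder" unit becomes the average of both networks, while the "decoder" unit, which has no counterpart in the general network, is left as is.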
In addition, the sensitive data splicing vector is also used when generating the Online session resource desensitized text. Because the sensitive data splicing vector is obtained by vector aggregation of the general sensitive data characterization vector and the target sensitive data characterization vector, it can accommodate different data anonymization protection requirements, which further improves the precision and reliability of adjusting the sensitive data representation form of the Online session resource text.
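The characterization vector splicing operation can be sketched as a summarization followed by an aggregation, following the two sub-steps of claim 4. The sketch assumes that summarization is mean pooling over several general vectors and that aggregation is concatenation; the description fixes neither choice, and the names `summarize` and `splice` are hypothetical.

```python
from typing import List


def summarize(vectors: List[List[float]]) -> List[float]:
    """Mean-pool several general characterization vectors into one (assumed)."""
    if not vectors:
        raise ValueError("at least one vector is required")
    width = len(vectors[0])
    if any(len(v) != width for v in vectors):
        raise ValueError("all vectors must have the same length")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def splice(general_vectors: List[List[float]], target: List[float]) -> List[float]:
    """Aggregate the summarized general vector with the target vector by concatenation."""
    return summarize(general_vectors) + list(target)


general_vectors = [[1.0, 3.0], [3.0, 5.0]]
target_vector = [0.5, 0.25]
splice_vector = splice(general_vectors, target_vector)  # [2.0, 4.0, 0.5, 0.25]
```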
In addition, the embodiment of the invention can reconstruct details of the Online session resource desensitization text, thereby improving the quality degree of the Online session resource desensitization text.
In the embodiment of the invention, the big data AI decision server can obtain a general sensitive text processing network, an intermediate sensitive text processing network, a general sensitive data characterization vector, a target sensitive data characterization vector and a to-be-anonymized Online session resource text; the big data AI decision server uses the network variables of the general sensitive text processing network to perform an improvement operation on the network variables of the intermediate sensitive text processing network to obtain a final sensitive text processing network; the big data AI decision server performs a characterization vector splicing operation on the general sensitive data characterization vector and the target sensitive data characterization vector to obtain a sensitive data splicing vector; and the big data AI decision server uses the final sensitive text processing network and the sensitive data splicing vector to perform sensitive data anonymization protection on the to-be-anonymized Online session resource text to obtain the Online session resource desensitized text that meets the big data protection condition, so that the accuracy and rationality of data anonymization and desensitization protection can be ensured when it is applied to the original Online session resource text.
Based on the foregoing, in some independent embodiments, after performing sensitive data anonymity protection on the Online session resource text to be anonymized by using the final sensitive text processing network and the sensitive data stitching vector to obtain an Online session resource desensitized text meeting a big data protection condition, the method further includes: responding to a session resource pushing request sent by a pushing platform system, and determining a question-answer preference label of an online session client pointed by the session resource pushing request; and pushing the Online session resource desensitization text to the Online session client when the question and answer preference tag is matched with the Online session resource desensitization text.
Therefore, before the Online session resource desensitized text is pushed, matching against the question-answer preference tag helps ensure that the push is accurate and reduces the resource waste caused by push deviation.
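The matching check before pushing can be sketched as below. This is a simplified illustration in which a match is assumed to mean that the client's preference tags and the topics of the desensitized text share at least one entry; the names `DesensitizedText` and `handle_push_request`, and the sample client and tags, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class DesensitizedText:
    body: str
    topics: Set[str]


def handle_push_request(client_id: str, client_tags: Dict[str, Set[str]],
                        text: DesensitizedText) -> Optional[str]:
    """Return the text to push when a preference tag matches, otherwise None."""
    tags = client_tags.get(client_id, set())
    if tags & text.topics:  # at least one shared tag counts as a match (assumed)
        return text.body
    return None


client_tags = {"client-1": {"billing"}, "client-2": {"shipping"}}
text = DesensitizedText(body="Your invoice for [DATE] is ready.", topics={"billing"})
pushed = handle_push_request("client-1", client_tags, text)
skipped = handle_push_request("client-2", client_tags, text)
```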
Based on the foregoing, in some independent embodiments, the determining, in response to a session resource push request sent by a push platform system, a question-answer preference tag of an online session client to which the session resource push request points includes steps 201-206.
Step 201, acquiring an online question-answer information set of the online session client in response to the question-answer record of the online session client, wherein the online question-answer information set comprises W groups of online question-answer information with time sequence, and W is an integer greater than or equal to 1.
Step 202, acquiring an additional question-answer information set according to the online question-answer information set, wherein the additional question-answer information set comprises W groups of additional question-answer information with time sequence.
Step 203, acquiring an online question-answer interaction description set through a first dialogue recognition component included in an online dialogue analysis algorithm based on the online question-answer information set, wherein the online question-answer interaction description set comprises W online question-answer interaction descriptions.
Step 204, acquiring an additional question-answer interaction description set through a second dialogue recognition component included in the online dialogue analysis algorithm based on the additional question-answer information set, wherein the additional question-answer interaction description set comprises W additional question-answer interaction descriptions.
Step 205, based on the online question-answer interaction description set and the additional question-answer interaction description set, obtaining a preference analysis weight corresponding to the online question-answer information set through a preference analysis component included in the online dialogue analysis algorithm.
Step 206, determining the question-answer preference tag of the online question-answer information set according to the preference analysis weight.
Therefore, by combining the online question-answer information and the additional question-answer information to determine the question-answer preference tag, the user question-answer requirements characterized by the online question-answer interaction descriptions and the additional question-answer interaction descriptions are fully considered when the preference analysis weight is output. This ensures the reliability of the preference analysis weight and in turn improves the accuracy with which the question-answer preference tag is determined.
Based on the foregoing, in some independent embodiments, the obtaining, by a preference analysis component included in the online dialogue analysis algorithm, a preference analysis weight corresponding to the online question-answer information set based on the online question-answer interaction description set and the additional question-answer interaction description set includes: based on the online question-answer interaction description set, acquiring W first description vectors through a first scene attention module included in the online dialogue analysis algorithm, wherein each first description vector corresponds to one online question-answer interaction description; based on the additional question-answer interaction description set, acquiring W second description vectors through a second scene attention module included in the online dialogue analysis algorithm, wherein each second description vector corresponds to one additional question-answer interaction description; performing stitching processing on the W first description vectors and the W second description vectors to obtain W target description vectors, wherein each target description vector comprises a first description vector and a second description vector; and based on the W target description vectors, acquiring the preference analysis weight corresponding to the online question-answer information set through the preference analysis component included in the online dialogue analysis algorithm.
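The stitching of the W first and W second description vectors can be sketched as a pairwise concatenation. The scoring that follows is purely a placeholder: the description does not say how the preference analysis component turns the target description vectors into a weight, so the sketch assumes a sum-based score normalized with a softmax. The names `stitch` and `preference_weights` and the sample vectors are hypothetical.

```python
import math
from typing import List


def stitch(first: List[List[float]], second: List[List[float]]) -> List[List[float]]:
    """Concatenate the i-th first vector with the i-th second vector."""
    if len(first) != len(second):
        raise ValueError("both sets must contain W vectors")
    return [f + s for f, s in zip(first, second)]


def preference_weights(targets: List[List[float]]) -> List[float]:
    """Placeholder scoring: softmax over the sum of each target description vector."""
    scores = [sum(t) for t in targets]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


first_vectors = [[1.0, 0.0], [0.0, 1.0]]   # W = 2 online question-answer descriptions
second_vectors = [[0.5], [0.5]]            # W = 2 additional question-answer descriptions
targets = stitch(first_vectors, second_vectors)
weights = preference_weights(targets)
```

Each target description vector keeps both halves intact, which matches the requirement that it comprises one first and one second description vector.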
The embodiment of the invention also provides a software product for implementing the user session resource data protection method applying AI decision, which comprises a computer program/instructions, wherein the method is carried out when the computer program/instructions are executed.
The embodiment of the invention also provides a computer readable storage medium, on which a computer program is stored, which when run performs the above method.
In summary, a user session resource data protection method and a software product applying AI decision are provided. The method and the software product use big data of resource text to jointly debug the AI neural network so as to ensure its performance. In the application stage of the AI neural network, big data anonymization and desensitization protection is performed on the Online session resource text using big data anonymization and desensitization technology, which can improve the accuracy and rationality of that protection. In addition, because the Online session resource text can relate to fields such as the metaverse and digital services, the method and the software product offer high reusability and high extensibility.
The foregoing is only a specific embodiment of the present invention. Variations and alternatives will occur to those skilled in the art based on the detailed description provided herein and are intended to be included within the scope of the invention.

Claims (10)

1. A user session resource data protection method applying AI decision, characterized in that it is applied to big data AI decision server, the method comprising:
obtaining an original Online conversation resource text, and carrying out text detail reconstruction on the original Online conversation resource text to obtain an Online conversation resource reconstructed text;
performing sensitive data vector mining on the Online session resource reconfiguration text to obtain a target sensitive data characterization vector;
obtaining a general sensitive text processing network, and adopting the general sensitive text processing network and the target sensitive data characterization vector to carry out joint debugging on a set sensitive text processing network to obtain an intermediate sensitive text processing network;
combining the network variables of the general sensitive text processing network, and performing improvement operation on the network variables of the intermediate sensitive text processing network to obtain a final sensitive text processing network;
obtaining a general sensitive data characterization vector and a to-be-anonymous Online session resource text, and performing characterization vector splicing operation on the general sensitive data characterization vector and the target sensitive data characterization vector to obtain a sensitive data splicing vector;
and carrying out sensitive data anonymous protection on the to-be-anonymous Online session resource text by adopting the final sensitive text processing network and the sensitive data splicing vector to obtain the Online session resource desensitization text meeting the big data protection condition.
2. The method for protecting user session resource data by applying AI decision according to claim 1, wherein said performing sensitive data anonymization protection on the to-be-anonymized Online session resource text by using the final sensitive text processing network and the sensitive data splicing vector to obtain an Online session resource desensitized text satisfying a big data protection condition comprises:
extracting text features of the to-be-anonymous Online conversation resource text by adopting the final sensitive text processing network to obtain a to-be-anonymous sensitive text vector of the to-be-anonymous Online conversation resource text;
performing sensitive element generalization operation on the sensitive text vector to be anonymized by adopting the sensitive data splicing vector to obtain a sensitive text generalization vector;
and performing text recovery operation on the sensitive text generalization vector by adopting the final sensitive text processing network to obtain the Online session resource desensitization text meeting the big data protection condition.
3. The method for protecting user session resource data by applying AI decision according to claim 2, wherein the performing text feature extraction on the to-be-anonymous Online session resource text by using the final sensitive text processing network to obtain the to-be-anonymous sensitive text vector of the to-be-anonymous Online session resource text includes:
adopting the final sensitive text processing network to perform sensitive data vector mining processing on the to-be-anonymous Online session resource text to obtain text description data of the to-be-anonymous Online session resource text;
performing region projection operation on the text description data by adopting the final sensitive text processing network to obtain a text region positioning tag of the text description data;
and generating a sensitive text vector to be anonymous of the to-be-anonymous Online conversation resource text through the text region positioning tag by adopting the final sensitive text processing network.
4. The method for protecting user session resource data by applying AI decision according to claim 1, wherein performing a token vector concatenation operation on the generic sensitive data token vector and the target sensitive data token vector to obtain a sensitive data concatenation vector comprises:
summarizing the general sensitive data characterization vector to obtain a summarized sensitive data characterization vector;
and vector aggregation is carried out on the summarized sensitive data characterization vector and the target sensitive data characterization vector to obtain a sensitive data splicing vector.
5. The method for protecting user session resource data by applying AI decision according to claim 1, wherein said combining network variables of said generic sensitive text processing network to improve network variables of said intermediate sensitive text processing network to obtain a final sensitive text processing network comprises:
extracting at least one network element to be improved from the intermediate sensitive text processing network;
extracting corresponding improved auxiliary units from the universal sensitive text processing network through the network unit to be improved;
and carrying out improvement operation on the element configuration variables of the element to be improved by combining the element configuration variables of the improvement auxiliary unit to obtain the final sensitive text processing network.
6. The AI decision-applying user session resource data protection method of claim 5, wherein said refining the element configuration variables of the network element to be refined in combination with the element configuration variables of the refined auxiliary element to obtain the final sensitive text processing network comprises:
determining a unit configuration variable weighting factor for the network element to be modified and a unit configuration variable weighting factor for the modified auxiliary unit;
and vector aggregation is carried out on the unit configuration variables of the to-be-improved network unit and the unit configuration variables of the improved auxiliary unit through the unit configuration variable weighting factors of the to-be-improved network unit and the unit configuration variable weighting factors of the improved auxiliary unit, so that the final sensitive text processing network is obtained.
7. The method for protecting user session resource data by applying AI decision according to claim 1, wherein the adopting the general sensitive text processing network and the target sensitive data characterization vector to perform joint debugging on a set sensitive text processing network to obtain an intermediate sensitive text processing network comprises:
adopting the network variables of the general sensitive text processing network to roll back the network variables of the set sensitive text processing network to obtain a default sensitive text processing network;
debugging the default sensitive text processing network by adopting the target sensitive data characterization vector to obtain the intermediate sensitive text processing network;
the debugging the default sensitive text processing network by adopting the target sensitive data characterization vector to obtain the intermediate sensitive text processing network comprises the following steps:
acquiring an Online session resource debugging text;
adopting the target sensitive data characterization vector and the default sensitive text processing network to carry out sensitive data anonymous protection on the Online session resource debugging text to obtain a text anonymous protection prediction result;
determining the text anonymous protection prediction result and debugging cost data of a template Online conversation resource text;
improving network variables of the default sensitive text processing network through the debugging cost data to obtain the universal sensitive text processing network;
the method for anonymously protecting the sensitive data of the Online session resource debugging text by adopting the target sensitive data characterization vector and the default sensitive text processing network to obtain a text anonymously protecting prediction result comprises the following steps:
adopting the default sensitive text processing network to conduct text feature extraction on the Online session resource debugging text to obtain a sensitive data characterization vector sample;
performing sensitive element generalization operation on the sensitive data representation vector sample by adopting the target sensitive data representation vector to obtain a sensitive text generalization vector sample of the Online session resource debugging text;
generating a text anonymous protection prediction result of the Online session resource debugging text through the sensitive text generalization vector sample by adopting the default sensitive text processing network;
the generating, by using the default sensitive text processing network and the sensitive text generalization vector sample, a text anonymous protection prediction result of the Online session resource debugging text includes:
performing text recovery operation on the sensitive text generalization vector sample by adopting the default sensitive text processing network to obtain an Online session resource recovery text;
performing content discrimination operation on the Online session resource debugging text to obtain a content discrimination result of the Online session resource debugging text;
and adopting the content discrimination result to carry out content significance adjustment on the Online session resource recovery text to obtain the text anonymous protection prediction result.
8. The method for protecting user session resource data applying AI decisions of claim 1, wherein the method further comprises:
performing text detail analysis on the Online session resource desensitization text to obtain a text detail analysis result of the Online session resource desensitization text;
carrying out text detail reconstruction on the Online session resource desensitization text according to the text detail analysis result to obtain an Online session resource desensitization reconstruction text;
and performing text detail reconstruction on the Online session resource desensitization text according to the text detail analysis result to obtain the Online session resource desensitization reconstruction text, wherein the text detail reconstruction comprises the following steps:
acquiring an AI text reconstruction network;
performing text reconstruction on the Online session resource desensitization text by adopting the AI text reconstruction network to obtain an Online session resource desensitization reconstruction text;
the text reconstruction of the Online session resource desensitization text by adopting an AI text reconstruction network comprises the following steps:
obtaining a reconstructed text sample and setting an AI text reconstruction network;
performing disturbance adding operation on the reconstructed text sample to obtain a disturbance text sample;
and adopting the disturbance text sample to debug the set AI text reconstruction network to obtain the AI text reconstruction network.
9. A software product for implementing a user session resource data protection method applying AI decisions, characterized in that it comprises a computer program/instructions, wherein said computer program/instructions, when executed, implement the method according to one or more of claims 1-8.
10. A computer readable storage medium, characterized in that it has stored thereon a computer program which, when run, performs the method according to one or more of claims 1-8.
CN202310371962.7A 2023-04-10 2023-04-10 User session resource data protection method and software product applying AI decision Active CN116361858B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310371962.7A CN116361858B (en) 2023-04-10 2023-04-10 User session resource data protection method and software product applying AI decision

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310371962.7A CN116361858B (en) 2023-04-10 2023-04-10 User session resource data protection method and software product applying AI decision

Publications (2)

Publication Number Publication Date
CN116361858A true CN116361858A (en) 2023-06-30
CN116361858B CN116361858B (en) 2024-01-26

Family

ID=86938333

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310371962.7A Active CN116361858B (en) 2023-04-10 2023-04-10 User session resource data protection method and software product applying AI decision

Country Status (1)

Country Link
CN (1) CN116361858B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109858280A (en) * 2019-01-21 2019-06-07 深圳昂楷科技有限公司 A kind of desensitization method based on machine learning, device and desensitization equipment
CN111639477A (en) * 2020-06-01 2020-09-08 北京中科汇联科技股份有限公司 Text reconstruction training method and system
CN111680497A (en) * 2019-02-25 2020-09-18 北京嘀嘀无限科技发展有限公司 Session feature extraction method, session recognition model training method and device
CN112434331A (en) * 2020-11-20 2021-03-02 百度在线网络技术(北京)有限公司 Data desensitization method, device, equipment and storage medium
CN113886885A (en) * 2021-10-21 2022-01-04 平安科技(深圳)有限公司 Data desensitization method, data desensitization device, equipment and storage medium
CN114398665A (en) * 2021-12-14 2022-04-26 杭萧钢构股份有限公司 Data desensitization method, device, storage medium and terminal
US20220148113A1 (en) * 2020-11-09 2022-05-12 Adobe Inc. Machine learning modeling for protection against online disclosure of sensitive data
CN114598671A (en) * 2022-03-21 2022-06-07 北京明略昭辉科技有限公司 Session message processing method, device, storage medium and electronic equipment
EP4016355A2 (en) * 2022-03-25 2022-06-22 i2x GmbH Anonymized sensitive data analysis
CN115664785A (en) * 2022-10-21 2023-01-31 重庆智能工程职业学院 Big data platform data desensitization system
CN115712703A (en) * 2022-12-26 2023-02-24 合肥随铥互联网科技有限公司 Decision analysis method and server applied to big data anonymous processing

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
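B. Ravi Prasad et al.: "Protection of privacy in big data using SDD framework with DNN", 2017 2nd International Conference for Convergence in Technology (I2CT) *
曾义夫; 牟其林; 周乐; 蓝天; 刘峤: "Session-aware recommendation model based on graph representation learning", Journal of Computer Research and Development (计算机研究与发展), no. 03 *
董子娴: "Research on dynamic data desensitization technology", China Master's Theses Full-text Database, Information Science and Technology (中国优秀硕士学位论文全文数据库 信息科技辑), no. 2022, pages 138 - 188 *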

Also Published As

Publication number Publication date
CN116361858B (en) 2024-01-26

Similar Documents

Publication Publication Date Title
CN111723209B (en) Semi-supervised text classification model training method, text classification method, system, equipment and medium
CN113591902B (en) Cross-modal understanding and generating method and device based on multi-modal pre-training model
CN111859960B (en) Semantic matching method, device, computer equipment and medium based on knowledge distillation
Yu et al. Visual relationship detection with internal and external linguistic knowledge distillation
US10929383B2 (en) Method and system for improving training data understanding in natural language processing
JP2023541649A (en) Semantic learning in federated learning systems
CN111444320A (en) Text retrieval method and device, computer equipment and storage medium
US20200401910A1 (en) Intelligent causal knowledge extraction from data sources
US10937417B2 (en) Systems and methods for automatically categorizing unstructured data and improving a machine learning-based dialogue system
CN112347361B (en) Method for recommending object, neural network, training method, training equipment and training medium thereof
US20230108863A1 (en) Deep learning document generation from conversation transcripts
CN111274822A (en) Semantic matching method, device, equipment and storage medium
WO2022178011A1 (en) Auditing citations in a textual document
CN117493529B (en) Anthropomorphic dialogue method and device based on natural language model and electronic equipment
CN112100377A (en) Text classification method and device, computer equipment and storage medium
CN113761868A (en) Text processing method and device, electronic equipment and readable storage medium
CN113761190A (en) Text recognition method and device, computer readable medium and electronic equipment
US11699435B2 (en) System and method to interpret natural language requests and handle natural language responses in conversation
CN115455151A (en) AI emotion visual identification method and system and cloud platform
CN111680132B (en) Noise filtering and automatic classifying method for Internet text information
CN116361858B (en) User session resource data protection method and software product applying AI decision
CN114048319B (en) Humor text classification method, device, equipment and medium based on attention mechanism
CN114398903B (en) Intention recognition method, device, electronic equipment and storage medium
CN116976341A (en) Entity identification method, entity identification device, electronic equipment, storage medium and program product
US11222177B2 (en) Intelligent augmentation of word representation via character shape embeddings in a neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230903

Address after: 530200 No.1 Wuxiang Avenue, Liangqing District, Nanning City, Guangxi Zhuang Autonomous Region

Applicant after: Yang Quan

Address before: 530200 No. 12-1, Zhiyuan 1st Street, Liangqing District, Nanning, Guangxi

Applicant before: Guangxi Nanning Xibei Technology Co.,Ltd.

TA01 Transfer of patent application right

Effective date of registration: 20240105

Address after: 100000 room 17001, 1701, 17th floor, No. 25, Middle East Third Ring Road, Chaoyang District, Beijing

Applicant after: Beijing infinite free culture media Co.,Ltd.

Address before: Room 528, 5th Floor, Building D, Building 33, No. 99 Kechuang 14th Street, Beijing Economic and Technological Development Zone, Daxing District, Beijing, 100000 (Yizhuang Cluster, High end Industrial Zone, Beijing Pilot Free Trade Zone)

Applicant before: Beijing Peihong Wangzhi Technology Co.,Ltd.

Effective date of registration: 20240105

Address after: Room 528, 5th Floor, Building D, Building 33, No. 99 Kechuang 14th Street, Beijing Economic and Technological Development Zone, Daxing District, Beijing, 100000 (Yizhuang Cluster, High end Industrial Zone, Beijing Pilot Free Trade Zone)

Applicant after: Beijing Peihong Wangzhi Technology Co.,Ltd.

Address before: 530200 No.1 Wuxiang Avenue, Liangqing District, Nanning City, Guangxi Zhuang Autonomous Region

Applicant before: Yang Quan

GR01 Patent grant