CN108171073A - Private data identification method driven by code-level semantic parsing - Google Patents

Private data identification method driven by code-level semantic parsing

Info

Publication number
CN108171073A
Authority
CN
China
Prior art keywords
privacy
code
private data
keyword
sentence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711277112.1A
Other languages
Chinese (zh)
Other versions
CN108171073B (en)
Inventor
杨珉
杨哲慜
南雨宏
张源
朱东来
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
Original Assignee
Fudan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fudan University
Priority to CN201711277112.1A
Publication of CN108171073A
Application granted
Publication of CN108171073B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Bioethics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention belongs to the technical field of program information security detection, and specifically relates to a private data identification method driven by code-level semantic parsing. The method comprises two parts. (1) Privacy-related semantic analysis and code snippet localization based on natural language processing: string constants and identifiers are extracted from the code and preprocessed; their semantic information is matched against a predefined dictionary of privacy-related keywords; part-of-speech tags and the dependency relations between the words of each phrase are then used to judge whether the text denotes specific private data. (2) Privacy-related code snippet identification based on machine learning: a support vector machine model uses code features that characterize how private data is used to judge whether a given piece of code contains the private data of interest. Private data identified in this way is marked as a sensitive data source, which reduces the risk of leaking users' private data.

Description

Private data identification method driven by code-level semantic parsing
Technical field
The invention belongs to the technical field of program information security detection, and in particular relates to a private data identification method.
Background
Traditional automated privacy leak detection focuses only on private data governed by the system. For geographic location information, for example, it can only designate a single API (such as getLastKnownLocation()) as the private data source, and then uses information flow analysis to judge whether that data reaches a specific sink (such as a network interface), and hence whether a privacy leak occurs. With the rapid development of mobile applications, these traditional private data sources no longer cover the many new kinds of private data that mobile applications contain. Beyond system-governed privacy, each application has private data tied to its own use, such as user account data, bank card data, and sensitive history records. Such data has no direct relationship with the system permission model; the present invention calls it non-system-governed private data.
Traditional information flow analysis tools have difficulty identifying non-system-governed private data directly. Unlike traditional privacy sources, this data usually originates somewhere other than the device itself, so it cannot be labeled uniformly and directly from the code. For example, much private data comes from user input: during registration or login, the user passes private data into the program through EditText.getText(). If the traditional approach were applied and the getText() API were marked as a private data source, it would inevitably produce many false positives, because much of the data read from the user interface contains no private information (for example, an entered item quantity). In addition, application-specific private data often comes from the application's own cloud server. For example, after a user logs in with an account, the application may fetch the user's private data from the server through an HTTP request, cache it in the application, and later use it in different scenarios. For this case there is as yet no method that automatically labels which data coming from the server is the user's private data.
Summary of the invention
The object of the invention is to provide a new private data identification method driven by code-level semantic parsing, suitable for automatically identifying, at scale, the non-system-governed private data contained in application code.
The proposed method consists of two parts: first, privacy-related semantic analysis and code snippet localization based on natural language processing; second, privacy-related code snippet identification based on machine learning. In the first part, string constants and identifiers are extracted from the code. After a series of preprocessing steps, their semantic information is matched against a predefined dictionary of privacy-related keywords, and part-of-speech tagging (POS tagging) together with the dependency relations between the words of each phrase is used to judge whether the text denotes specific private data. In the second part, a support vector machine model uses code features that characterize how private data is used to judge whether the given code contains the private data of interest. Semantic information and code structure features thus complement each other in identifying private data. Once identified, such data can be marked as a sensitive data source, which provides a basis for monitoring and protecting it and reduces the risk of leaking users' private data.
The overall design of the invention is shown in Fig. 1. The two parts are described in detail below.
Part 1. Privacy-related semantic analysis and code snippet localization based on natural language processing. The detailed process is as follows:
(1) Define privacy information: The invention first defines a set of privacy-related keywords, and makes a preliminary judgment of whether a text is privacy-related by checking whether these keywords occur in it. The keyword set is obtained by manual screening. For example, it can be composed of the privacy-related keywords given in Google's privacy policy documents, the near-synonyms of those keywords, and words extracted from 10,000 applications in the Google application market that are highly similar to those keywords.
In this way the invention obtains 121 keywords in 4 categories: User Attributes, User Identifiers, Location, and Account. Table 1 lists some representative privacy-related keywords. The vocabulary can also be configured dynamically: adding specific keywords allows new kinds of private data to be identified in the future.
(2) Extract semantic information: Developers often write rich semantic information into string constants, function and method names, variable names, and similar parts of application code. This information provides effective clues, so semantic analysis can be used to find private data that may be present in the code. Based on this observation, the invention extracts string constants, function names, and variable names (those that are not obfuscated, such as global static variable names) from the decompiled application code. The extracted text is then preprocessed: non-letter characters (such as digits and underscore separators) are removed, and the text is split into separate tokens at common separators and capital letters. For example, "user_addr" is split into "user" and "addr", and "GetUserPhoneNumber" is split into the individual words "get user phone number".
(3) Locate privacy-related semantic information: After preprocessing the extracted semantic information, the invention uses natural language processing to make a preliminary judgment of whether it is privacy-related. Keyword-based filtering, part-of-speech-based filtering, and dependency-based filtering are applied in turn, improving the result of the semantic analysis step by step.
(3.1) Keyword-based filtering: To judge preliminarily whether the extracted semantic information is privacy-related, the invention applies a keyword matching algorithm using the privacy-related keywords described above. For each keyword, the algorithm checks whether all of its characters occur in the text being processed; if so, the text is considered privacy-related and the keyword is returned. The pseudocode of the matching algorithm is given in the annex.
However, keyword matching alone cannot identify privacy-related data accurately, mainly because many string constants contain a privacy-related keyword without actually indicating private data. For example, developers often record program state in log messages: "Mobihelp.setUserEmail() requires a valid email" contains "email" but does not actually involve the user's email data. Many other kinds of strings also seriously interfere with the judgment, for example string constants that contain a URL, such as "com/ironsource/mobilcore/MobileCoreReport", which contains "mobile". To reduce such false positives, the semantic information identified as privacy-related in this step is analyzed further.
(3.2) Part-of-speech-based filtering: Part-of-speech tagging indicates which part of speech, such as noun or verb, a keyword has in the current sentence. In the analysis of the invention, a privacy-related word of interest must be a noun. For example, "Address" denoting a geographic address or a mail address is a noun (NN), whereas the verb "Address" in "Address this issue" does not satisfy the filter condition. When the keyword in a sentence is tagged as a noun, the sentence proceeds to dependency analysis.
(3.3) Dependency-based filtering: Dependency relations describe the structural relations between the words of a phrase or sentence. For a phrase or sentence, the dependency relations between the sensitive word and the other words are used to judge whether the privacy-related word of interest is the center of the sentence. The invention applies the following dependency conditions:
(3.3.1) Direct object relation (dobj): If the keyword is in a direct object relation in the analyzed phrase or sentence and is a noun (NN), it meets the expectation, as in "get email". The descriptions numbered 1, 2, and 3 in Table 2 are further examples of the direct object relation.
(3.3.2) Nominal subject relation (nsubj): If the keyword has no direct object relation with its context in the analyzed phrase, but is the word that other words depend on, it also meets the expectation. This relation tends to occur in phrase fragments that are not complete sentences, such as "business phone number selected".
(3.3.3) Negation modifier relation (neg): If the keyword is modified by a negation word in the analyzed phrase, the keyword is considered unrelated to privacy information, as in "Do not input your password here".
(3.3.4) Other dependency relations: If the keyword takes part only in other dependency relations, such as the compound relation, it plays only an auxiliary, descriptive role in the sentence and is not its subject; in such cases the privacy-related keyword is usually not the main idea of the sentence. The descriptions numbered 1, 2, 3, and 6 in Table 2 contain examples of the compound relation.
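The keyword filter and its false positives can be illustrated with a minimal sketch. This is not the annex pseudocode, which is not reproduced in this text; it assumes plain case-insensitive substring matching, and the keyword set below is a small illustrative subset rather than the 121 keywords of the invention.

```python
# Illustrative subset only; the real vocabulary has 121 keywords (Table 1).
PRIVACY_KEYWORDS = {"email", "password", "phone", "address", "mobile", "username"}


def match_keywords(text, keywords=PRIVACY_KEYWORDS):
    """Return the sorted keywords that occur as substrings of the text."""
    lowered = text.lower()
    return sorted(k for k in keywords if k in lowered)


# A genuinely privacy-related phrase.
print(match_keywords("get user email"))
# Two strings from the description that match a keyword
# without indicating private data.
print(match_keywords("com/ironsource/mobilcore/MobileCoreReport"))
print(match_keywords("Mobihelp.setUserEmail() requires a valid email"))
```

All three calls return a match, which is why the later part-of-speech and dependency filters are needed to discard the second and third strings.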
The three steps above complete the semantics-based localization of privacy-related code snippets. Next, the invention uses machine learning: code features that characterize how private data is used are extracted to judge whether the given code contains the private data of interest.
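The part-of-speech and dependency conditions of steps (3.2) and (3.3) can be sketched as a single predicate over an already parsed phrase. The sketch assumes the parser output is available as a list of POS tags plus `(relation, head index, dependent index)` triples; this representation, the function name, and the hand-written parses are illustrative assumptions, not real parser output. It also assumes that a negation attached to the word governing the keyword counts as negating the keyword.

```python
from typing import List, Tuple

Dep = Tuple[str, int, int]  # (relation, head index, dependent index)


def keyword_is_privacy_focus(pos_tags: List[str], deps: List[Dep], kw: int) -> bool:
    """Apply the POS and dependency filters to the keyword at index kw."""
    # (3.2) The keyword must be tagged as a noun.
    if not pos_tags[kw].startswith("NN"):
        return False
    # Words that directly govern the keyword.
    heads = {head for _, head, dep in deps if dep == kw}
    # (3.3.3) Reject if the keyword or its governor is negated (assumption).
    for rel, head, _ in deps:
        if rel == "neg" and (head == kw or head in heads):
            return False
    # (3.3.1)/(3.3.2) Accept direct objects and nominal subjects;
    # (3.3.4) anything else, such as compound, is rejected.
    return any(rel in ("dobj", "nsubj") and dep == kw for rel, _, dep in deps)


# Hand-written parses for phrases quoted in the description.
get_email = (["VB", "NN"], [("dobj", 0, 1)])
address_issue = (["VB", "DT", "NN"], [("dobj", 0, 2), ("det", 2, 1)])
no_password = (
    ["VBP", "RB", "VB", "PRP$", "NN", "RB"],
    [("aux", 2, 0), ("neg", 2, 1), ("dobj", 2, 4), ("nmod:poss", 4, 3), ("advmod", 2, 5)],
)
phone_number = (
    ["NN", "NN", "NN", "VBN"],
    [("compound", 2, 0), ("compound", 2, 1), ("nsubj", 3, 2)],
)

print(keyword_is_privacy_focus(*get_email, kw=1))      # "email" in "get email"
print(keyword_is_privacy_focus(*address_issue, kw=0))  # verb "Address"
print(keyword_is_privacy_focus(*no_password, kw=4))    # negated "password"
print(keyword_is_privacy_focus(*phone_number, kw=2))   # head noun "number"
```

In the last phrase the keyword is taken to be the head noun of "phone number"; how multi-word keywords map onto parser tokens is not specified in the text, so that choice is also an assumption.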
Part 2. Privacy-related data identification based on code features. The detailed process is as follows:
After the privacy-related semantic filtering based on natural language processing, the invention combines program analysis with machine learning to determine whether a privacy-related code snippet really contains private data, using a support vector machine (SVM) together with code features extracted by program analysis. Specifically, information flow analysis first finds the function call statements into which the constant strings or variable names confirmed as privacy-related by the semantic analysis flow. Machine learning then judges whether each such function call statement involves privacy information. If a function call statement (one line of code) is identified as privacy-related, the variable that stores data in that line (a parameter or the return value) contains private data.
Feature selection: The invention uses the following five categories of features as the model vector for identifying code related to private data:
Feature 1: Function name. For API functions whose names are not obfuscated, the full function name carries rich semantic information about what the function does. For example, functions that operate on data often contain verbs such as set and get to indicate storing or reading data. Function name features can therefore also help identify users' private data. The five verbs commonly found in code, set/get/put/add/insert, together with the corresponding privacy phrases, are chosen as feature dimensions.
Feature 2: Function parameter types. Parameter types tend to reflect how private data is used, since functions that use privacy-related data are often passed parameters of particular types. For example, many operations that save users' private data take a String parameter, as in the function SaveUserAccount(String userAccount). In contrast, some parameter types suggest that the function probably does not involve the user's private data: starting a new thread passes a parameter of type Thread, and opening an activity takes an Intent as its parameter. The parameter types and their combinations can therefore reflect whether a function is related to users' private data.
Feature 3: Function return type. The return value also reflects how private data is used. APIs that obtain users' private data often return it as a String, while APIs that store or send such data are likely to return a Boolean indicating whether the code executed successfully. If a function returns another type unrelated to data, it probably does not involve privacy-related data.
Feature 4: Type of the object on which the function is called. The base class of a function call also reflects how data is used. For an invoke statement, if the base class is one of certain data structures, the line of code is more likely to be using users' private data. For example, HashMap.get() indicates that an item of data is retrieved from a container such as HashMap. In contrast, Exception.getException() obtains exception information and is unrelated to user data.
Feature 5: Function parameter value types. In static code, a function parameter value is of one of two kinds: string constant or string variable. Because privacy-related data is usually accompanied by a semantically relevant text label, that label often appears as a string constant among the parameter values of the called function. For example, in HashMap.set("username", $r1), the parameter values form a key-value pair consisting of a string constant (StringConstant) and a string variable (StringVariable). The ordering of parameter value kinds also tends to reflect how private data is used. In most cases the string constant precedes the variable: saveInstance(useraccount, "username", $user) indicates that the user name is stored in useraccount. In contrast, HandleException($exception, "email") probably only reports an error in email-related logic, and the current code contains no real email data.
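As a rough illustration of how the five feature categories might be read off a single call statement, the sketch below maps a simplified call record to a feature dictionary. The `CallStatement` record, its field names, and the string encodings are hypothetical; the text does not specify how features are encoded, and the `void` return type in the example is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List

# The five verbs named under Feature 1.
VERBS = ("set", "get", "put", "add", "insert")


@dataclass
class CallStatement:
    """Hypothetical, simplified view of one function call statement."""
    base_type: str          # Feature 4: type of the object being called
    method_name: str        # Feature 1: function name
    param_types: List[str]  # Feature 2: declared parameter types
    return_type: str        # Feature 3: return type
    arg_kinds: List[str]    # Feature 5: "const" or "var" for each argument


def extract_features(call: CallStatement) -> Dict[str, object]:
    """Turn a call statement into a flat feature dictionary."""
    name = call.method_name.lower()
    # One binary dimension per verb, set when the name starts with that verb.
    features: Dict[str, object] = {f"verb_{v}": int(name.startswith(v)) for v in VERBS}
    features["param_types"] = "+".join(call.param_types)
    features["return_type"] = call.return_type
    features["base_type"] = call.base_type
    # The order of constants and variables is kept, since ordering matters.
    features["arg_kinds"] = "+".join(call.arg_kinds)
    return features


# The HashMap.set("username", $r1) example from Feature 5.
example = CallStatement("HashMap", "set", ["String", "String"], "void", ["const", "var"])
print(extract_features(example))
```

Categorical values such as `param_types` would still need to be turned into numeric dimensions (for example by one-hot encoding) before being passed to a classifier.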
Training set: Because the invention uses a supervised machine learning classifier, a certain amount of training data is needed to train it. The training set is built from the code obtained by the "privacy-related semantic analysis and code snippet localization" step: security experts randomly select a number of code statements and manually label whether each privacy-related function statement really contains private data. For the training set to have sufficient coverage, it should contain a thousand samples or more. The training set should also keep the numbers of positive samples (code containing private data) and negative samples (code not containing private data) roughly balanced, so that the classifier reaches its best accuracy.
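One simple way to obtain a balanced training set is to downsample the larger class to the size of the smaller one. The sketch below assumes labeled samples are `(features, label)` pairs with label 1 for code containing private data and 0 otherwise; the function name and the downsampling strategy are illustrative choices, not something the text prescribes.

```python
import random


def balance_samples(samples, seed=0):
    """Return a shuffled list with equal numbers of positive and negative samples."""
    rng = random.Random(seed)
    positives = [s for s in samples if s[1] == 1]
    negatives = [s for s in samples if s[1] == 0]
    # Keep as many samples per class as the smaller class provides.
    n = min(len(positives), len(negatives))
    balanced = rng.sample(positives, n) + rng.sample(negatives, n)
    rng.shuffle(balanced)
    return balanced


# Toy data: 5 positive and 12 negative labeled statements.
toy = [({"id": i}, 1) for i in range(5)] + [({"id": i}, 0) for i in range(12)]
balanced = balance_samples(toy)
print(len(balanced), sum(label for _, label in balanced))
```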
Classifier selection: Given well-designed feature vectors and properly labeled training samples, the performance of different classifiers does not differ greatly. The invention currently uses a support vector machine (SVM) as its classifier. The invention equally supports any other classifier, combined with the code features extracted above, to classify and identify private data, so that the best classification result can be achieved under different usage scenarios and classification algorithms.
After the classifier is trained, for any code snippet (a line of code) of a given program that has passed the semantic analysis, the invention extracts the code features described above and uses the classifier to judge whether the snippet really contains private data.
The beneficial effects of the invention are as follows. The invention proposes a new analysis perspective and analysis method for identifying users' private data in program code. It uses natural language processing to recognize semantic information in the code and locate privacy-related code snippets, and uses code structure features with machine learning to judge whether private data is really present in those snippets. Compared with the traditional approaches of directly marking fixed system APIs as private data sources, or analyzing user interface information to determine private data entered by the user, the invention is more general and can identify private data that earlier methods cannot cover, such as private data that comes from a remote server and never appears in the user interface.
Description of the drawings
Fig. 1: Overall system architecture.
Specific embodiment
The invention designs and implements the new private data identification method described above, which combines natural language processing with machine learning. This section describes the implementation of the method in detail.
Part 1. Privacy-related semantic analysis and code snippet localization based on natural language processing
The invention analyzes applications on top of the FlowDroid tool, a mature static analysis tool for Android applications built on the Soot framework. FlowDroid is used to decompile the application and obtain the intermediate representation of its code (Jimple format files). The invention then extracts string constants, method names, and variable names from the decompiled Jimple code as the sources of semantic information to analyze. For string constants, the invention also uses intra-procedural information flow analysis to propagate the constant labels to the variables they may reach.
For the extracted constant strings, the invention uses the Java-based Stanford Parser for natural language processing. Stanford Parser is a commonly used syntactic parsing tool: for a given sentence it parses the structure and assigns part-of-speech tags to the tokens, and it also provides several methods for exposing the dependency relations between the tokens of the sentence. It is therefore chosen for lexical analysis and dependency analysis.
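Before the extracted names and constants are parsed, they are split into words as described in the preprocessing step of the summary. A minimal sketch of that splitting, assuming regular-expression rules for separators and capital letters (the function name and the exact rules are illustrative, not taken from the patent):

```python
import re


def split_identifier(text):
    """Split an identifier or string constant into lowercase word tokens."""
    # Replace every run of non-letter characters (digits, underscores, ...) with a space.
    letters_only = re.sub(r"[^A-Za-z]+", " ", text)
    # Insert a space at each lowercase-to-uppercase boundary (camel case).
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", letters_only)
    # Separate an acronym from a following capitalized word, e.g. "URLPath".
    spaced = re.sub(r"(?<=[A-Z])(?=[A-Z][a-z])", " ", spaced)
    return [token.lower() for token in spaced.split()]


print(split_identifier("user_addr"))
print(split_identifier("GetUserPhoneNumber"))
```

The two calls reproduce the examples given for the preprocessing step: "user_addr" yields "user" and "addr", and "GetUserPhoneNumber" yields "get", "user", "phone", "number".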
Part 2. Privacy-related data identification based on code features
The invention performs static analysis on the intermediate representation decompiled by FlowDroid to extract the required five categories of features, and trains the classifier with the Python-based scikit-learn toolkit. To train the classifier, security experts randomly selected function call statements judged to be privacy-related from 100 popular applications in the Google app store and labeled them manually. To balance the numbers of positive and negative training samples so that the classifier reaches its best accuracy, 2,163 positive samples containing private data and an equal number of negative samples not containing private data were chosen, giving 4,326 training samples in total as the training set of the method.
Table 1
Table 2
Annex: Privacy-related keyword matching algorithm

Claims (3)

1. A private data identification method driven by code-level semantic parsing, characterized in that it is divided into two parts: first, privacy-related semantic analysis and code snippet localization based on natural language processing; second, privacy-related code snippet identification based on machine learning;
Part 1. Privacy-related semantic analysis and code snippet localization based on natural language processing, with the following detailed process:
(1) Define privacy information: a set of privacy-related keywords is first defined, and whether a text is privacy-related is judged preliminarily by whether these keywords occur in it; the keyword set is obtained by manual screening;
(2) Extract semantic information: string constants, function names, and variable names are extracted from the decompiled application code; the extracted semantic information is then preprocessed, including removing non-letter characters and splitting the text into multiple tokens at common separators and capital letters;
(3) Locate privacy-related semantic information: natural language processing is used to judge preliminarily whether the semantic information is privacy-related; keyword-based filtering, part-of-speech-based filtering, and dependency-based filtering are applied in turn, improving the result of the semantic analysis step by step:
(3.1) Keyword-based filtering: using the privacy-related keywords, a keyword matching algorithm judges preliminarily whether the semantic information is privacy-related; for each keyword, the algorithm checks whether all of its characters occur in the text being processed; if so, the text is considered privacy-related and the keyword is returned;
(3.2) Part-of-speech-based filtering: part-of-speech tagging indicates which part of speech a keyword has in the current sentence; in the analysis, a privacy-related word of interest must be a noun; when the keyword in a sentence is tagged as a noun, the sentence proceeds to dependency analysis;
(3.3) Dependency-based filtering: dependency relations describe the structural relations between the words of a phrase or sentence; for a phrase or sentence, the dependency relations between the sensitive word and the other words are used to judge whether the privacy-related word of interest is the center of the sentence; the following dependency conditions are applied:
(3.3.1) Direct object relation: if the keyword is in a direct object relation in the analyzed phrase or sentence and is a noun, it meets the expectation;
(3.3.2) Nominal subject relation: if the keyword has no direct object relation with its context in the analyzed phrase, but is the word that other words depend on, it also meets the expectation;
(3.3.3) Negation modifier relation: if the keyword is modified by a negation word in the analyzed phrase, the keyword is considered unrelated to privacy information;
(3.3.4) Other dependency relations: if the keyword takes part only in other dependency relations, it plays only an auxiliary, descriptive role in the sentence and is not its subject;
Part 2. Privacy-related data identification based on code features, with the following detailed process:
First, information flow analysis finds the function call statements into which the constant strings or variable names confirmed as privacy-related by the semantic analysis flow; machine learning then judges whether each function call statement involves privacy information; if a function call statement is identified as privacy-related, the variable that stores data in that code contains private data.
2. The private data identification method driven by code-level semantic parsing according to claim 1, characterized in that, in Part 2, the following five categories of features are chosen as the model vector for identifying code related to private data: function name, function parameter types, function return type, type of the object on which the function is called, and function parameter value types;
The training set for the machine learning is built from the code obtained by the "privacy-related semantic analysis and code snippet localization" step: security experts randomly select a number of code statements and manually label whether each privacy-related function statement really contains private data; for the training set to have sufficient coverage, the total number of samples is a thousand or more; the training set should also keep the numbers of positive samples, that is, code containing private data, and negative samples, that is, code not containing private data, roughly balanced, so that the classifier reaches its best accuracy.
3. The private data identification method driven by code-layer semantic parsing according to claim 2, characterized in that, from the 100 most popular applications on the Google app store, security experts randomly select function call statements judged to be privacy-related and label them manually; 4326 training samples are chosen as the training set, comprising 2163 positive samples that contain private data and an equal number of negative samples that do not;
A support vector machine (SVM) is selected as the classifier.
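As a rough illustration of the classifier choice, the following sketch trains a linear SVM by subgradient descent on the regularised hinge loss. The claim names only "support vector machine"; the linear kernel, the hyperparameters, the numeric encoding of the features, and the function names `train_linear_svm` and `predict` are all assumptions made here, and the toy data below is invented purely to show the call pattern.

```python
def train_linear_svm(samples, labels, epochs=200, learning_rate=0.1, regularization=0.01):
    """Train a linear SVM by subgradient descent on the L2-regularised hinge loss.

    samples: list of equal-length numeric feature vectors
    labels:  list of +1 (contains private data) or -1 (does not)
    """
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(w * xi for w, xi in zip(weights, x)) + bias)
            # Weight decay from the regulariser is applied on every step.
            weights = [w - learning_rate * regularization * w for w in weights]
            if margin < 1:
                # Sample is misclassified or inside the margin: move the boundary.
                weights = [w + learning_rate * y * xi for w, xi in zip(weights, x)]
                bias += learning_rate * y
    return weights, bias


def predict(weights, bias, x):
    """Return +1 for 'contains private data', -1 otherwise."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score >= 0 else -1


# Toy data: the first (hypothetical) feature alone determines the label.
samples = [[1.0, 0.0], [1.0, 1.0], [0.0, 0.0], [0.0, 1.0]]
labels = [1, 1, -1, -1]
weights, bias = train_linear_svm(samples, labels)
print([predict(weights, bias, x) for x in samples])
```

In practice the feature tokens would first be mapped to numeric vectors (for example by one-hot encoding), and an established SVM library would normally be used instead of a hand-written trainer.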
CN201711277112.1A 2017-12-06 2017-12-06 Private data identification method based on code layer semantic parsing drive Active CN108171073B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711277112.1A CN108171073B (en) 2017-12-06 2017-12-06 Private data identification method based on code layer semantic parsing drive

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711277112.1A CN108171073B (en) 2017-12-06 2017-12-06 Private data identification method based on code layer semantic parsing drive

Publications (2)

Publication Number Publication Date
CN108171073A (en) 2018-06-15
CN108171073B (en) 2021-08-20

Family

ID=62525309

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711277112.1A Active CN108171073B (en) 2017-12-06 2017-12-06 Private data identification method based on code layer semantic parsing drive

Country Status (1)

Country Link
CN (1) CN108171073B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120110680A1 (en) * 2010-10-29 2012-05-03 Nokia Corporation Method and apparatus for applying privacy policies to structured data
CN102902538A (en) * 2012-09-21 2013-01-30 哈尔滨工业大学深圳研究生院 Safe development method for application middleware of mobile internet intelligent terminal
CN104966031A (en) * 2015-07-01 2015-10-07 复旦大学 Method for identifying permission-irrelevant private data in Android application program
CN105022958A (en) * 2015-07-11 2015-11-04 复旦大学 Android application used application program vulnerability detection and analysis method based on code library security specifications

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YUHONG NAN et al.: "《IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY》", 31 March 2017 *

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109299610A (en) * 2018-10-02 2019-02-01 复旦大学 Method for verifying and identifying unsafe and sensitive input in android system
CN109299610B (en) * 2018-10-02 2021-03-30 复旦大学 Method for verifying and identifying unsafe and sensitive input in android system
CN109445306A (en) * 2018-10-26 2019-03-08 湖南磁浮技术研究中心有限公司 Automatic associated parameter interpretation method and system based on rule configuration analysis
CN109445306B (en) * 2018-10-26 2022-01-25 湖南磁浮技术研究中心有限公司 Automatic associated parameter interpretation method and system based on rule configuration analysis
CN109582861A (en) * 2018-10-29 2019-04-05 复旦大学 Data privacy information detection system
CN109582861B (en) * 2018-10-29 2023-04-07 复旦大学 Data privacy information detection system
CN109686369A (en) * 2018-12-21 2019-04-26 秒针信息技术有限公司 Audio-frequency processing method and device
CN109766715A (en) * 2018-12-24 2019-05-17 贵州航天计量测试技术研究所 Big data environment-oriented privacy information anti-leakage automatic identification method and system
CN109766715B (en) * 2018-12-24 2023-07-25 贵州航天计量测试技术研究所 Big data environment-oriented privacy information anti-leakage automatic identification method and system
CN113692724A (en) * 2019-04-19 2021-11-23 微软技术许可有限责任公司 Sensitive data detection in communication data
CN111143203A (en) * 2019-12-13 2020-05-12 支付宝(杭州)信息技术有限公司 Machine learning method, privacy code determination method, device and electronic equipment
CN111143203B (en) * 2019-12-13 2022-04-22 支付宝(杭州)信息技术有限公司 Machine learning method, privacy code determination method, device and electronic equipment
CN111159663B (en) * 2019-12-30 2022-04-29 厦门市美亚柏科信息股份有限公司 Instruction positioning method and device
CN111159663A (en) * 2019-12-30 2020-05-15 厦门市美亚柏科信息股份有限公司 Instruction positioning method and device
CN111967015A (en) * 2020-07-24 2020-11-20 复旦大学 Defense agent method for improving Byzantine robustness of distributed learning system
CN112270018A (en) * 2020-11-11 2021-01-26 中国科学院信息工程研究所 Scene-sensitive system and method for automatically placing hook function
CN112270018B (en) * 2020-11-11 2022-08-16 中国科学院信息工程研究所 Scene-sensitive system and method for automatically placing hook function
CN114925373A (en) * 2022-05-17 2022-08-19 南京航空航天大学 Mobile application privacy protection policy vulnerability automatic identification method based on user comment
CN114925373B (en) * 2022-05-17 2023-12-08 南京航空航天大学 Mobile application privacy protection policy vulnerability automatic identification method based on user comment
CN117421730A (en) * 2023-09-11 2024-01-19 暨南大学 Code segment sensitive information detection method based on ensemble learning
CN117421730B (en) * 2023-09-11 2024-06-04 暨南大学 Code segment sensitive information detection method based on ensemble learning

Also Published As

Publication number Publication date
CN108171073B (en) 2021-08-20

Similar Documents

Publication Publication Date Title
CN108171073A (en) Private data identification method based on code layer semantic parsing drive
CN109697162B (en) Software defect automatic detection method based on open source code library
CN104966031B (en) Method for identifying permission-irrelevant private data in Android application program
US11409642B2 (en) Automatic parameter value resolution for API evaluation
CN113158653B (en) Training method, application method, device and equipment for pre-training language model
CN107704453A (en) A kind of word semantic analysis, word semantic analysis terminal and storage medium
CN106570180A (en) Artificial intelligence based voice searching method and device
CN103678684A (en) Chinese word segmentation method based on navigation information retrieval
CN106933800A (en) Event sentence extraction method for the financial field
CN114036930A (en) Text error correction method, device, equipment and computer readable medium
CN107679075B (en) Network monitoring method and equipment
CN111027323A (en) Entity nominal item identification method based on topic model and semantic analysis
CN108038173A (en) Web page classification method and system, and Web page classification device
CN114661872B (en) Beginner-oriented API self-adaptive recommendation method and system
Park et al. Using syntactic features for phishing detection
CN110880142A (en) Risk entity acquisition method and device
WO2021112984A1 (en) Feature and context based search result generation
CN112000802A (en) Software defect positioning method based on similarity integration
CN110287314A (en) Long text credibility evaluation method and system based on Unsupervised clustering
CN106294786A (en) A kind of code search method and system
CN112132238A (en) Method, device, equipment and readable medium for identifying private data
CN111831803A (en) Sensitive information detection method and device and storage medium
KR102206781B1 (en) Method of fake news evaluation based on knowledge-based inference, recording medium and apparatus for performing the method
CN109672586A (en) DPI service traffic identification method, device and computer readable storage medium
CN112380848B (en) Text generation method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant