WO2017190617A1 - Advertisement detection method, advertisement detection apparatus, and storage medium - Google Patents

Advertisement detection method, advertisement detection apparatus, and storage medium

Info

Publication number
WO2017190617A1
Authority
WO
WIPO (PCT)
Prior art keywords
advertisement
feature
advertisements
features
sample
Prior art date
Application number
PCT/CN2017/082069
Other languages
English (en)
French (fr)
Inventor
易洪
王佳斌
罗元海
Original Assignee
腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Company Limited)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Company Limited)
Publication of WO2017190617A1
Priority to US16/030,749 (US11334908B2)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0242 Determining effectiveness of advertisements
    • G06Q30/0243 Comparative campaigns
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G06F21/563 Static detection by source code analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/60 Software deployment
    • G06F8/65 Updates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0273 Determination of fees for advertising

Definitions

  • The present invention relates to communication technologies, and in particular, to an advertisement detection method, an advertisement detection apparatus, and a storage medium.
  • A large number of applications are often installed on terminals such as smartphones and tablets.
  • Applications come from diverse sources, and a considerable proportion of them have built-in advertisements. Some programs that carry advertisements masquerade as regular applications (such as learning, entertainment, and social applications) to trick users into installing them. Once such an application is clicked or installed on the terminal, advertisements appear frequently while the terminal is in use, interfering with the user; such applications may even steal user information or invoke the terminal's communication functions (for example, making calls or sending short messages), causing the user to incur charges. It is therefore necessary to detect whether an application has built-in advertisements.
  • This approach requires the advertisement features to be updated constantly so that applications carrying new advertisements can be detected in time. On the one hand, this reduces the efficiency of advertisement detection; on the other hand, because the extracted advertisement features inevitably lag behind, the accuracy of detecting whether an application carries a new advertisement is low.
  • Embodiments of the present invention provide an advertisement detection method, an advertisement detection apparatus, and a storage medium, which can efficiently extract advertisement features from advertisement samples in order to detect advertisements, thereby improving the timeliness of advertisement detection based on advertisement features.
  • An embodiment of the present invention provides an advertisement detection method, including:
  • The feature value characterizes the probability that the feature matches the corresponding type of advertisement.
  • The advertisement features of the different types of advertisements are compared with the features extracted from a sample to be detected; if a comparison succeeds, it is determined that the sample to be detected includes the type of advertisement corresponding to the successfully matched advertisement feature.
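  • As an illustration only, the comparison step can be sketched in Python as follows. The sketch assumes features are hashable identifiers (for example, strings) and treats a comparison as successful when the sample to be detected shares at least one advertisement feature with an advertisement type; the function name `detect_ads` and this matching rule are assumptions, not the matching criterion prescribed by the patent.

```python
def detect_ads(ad_features: dict[str, set[str]], sample_features: set[str]) -> set[str]:
    """Return the advertisement types detected in a sample to be detected.

    ad_features maps each advertisement type to its selected advertisement
    features; sample_features holds the features extracted from the sample.
    """
    detected = set()
    for ad_type, features in ad_features.items():
        # Assumed rule: any shared feature counts as a successful comparison.
        if features & sample_features:
            detected.add(ad_type)
    return detected


if __name__ == "__main__":
    ad_features = {"ad 1": {"feature 1"}, "ad 3": {"feature 2"}}
    print(detect_ads(ad_features, {"feature 1", "feature 9"}))  # {'ad 1'}
```

  • A stricter variant could require several shared features per type; the text does not state which rule is used.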
  • An embodiment of the present invention provides an advertisement detection apparatus, including:
  • an extraction module configured to acquire a sample set containing advertisement samples and to acquire the features of each advertisement sample in the sample set;
  • a detection module configured to determine the feature values of each feature with respect to the different types of advertisements carried in the advertisement samples that the feature matches, the feature value characterizing the probability that the feature matches the corresponding type of advertisement;
  • the detection module is further configured to select, from the extracted features and according to the feature values, the advertisement features of the different types of advertisements;
  • the detection module is further configured to compare the advertisement features of the different types of advertisements with the features extracted from a sample to be detected and, if a comparison succeeds, determine that the sample to be detected includes the type of advertisement corresponding to the successfully matched advertisement feature.
  • An advertisement detection apparatus is also provided, including:
  • a memory configured to store an executable program;
  • a processor configured to perform the following operations by executing the executable program stored in the memory:
  • comparing the advertisement features of the different types of advertisements with the features extracted from a sample to be detected and, when a comparison succeeds, determining that the sample to be detected carries an advertisement, namely an advertisement of the type corresponding to the successfully matched advertisement feature.
  • An embodiment of the present invention provides a storage medium storing an executable program which, when executed by a processor, performs the following:
  • comparing the advertisement features of the different types of advertisements with the features extracted from a sample to be detected and, when a comparison succeeds, determining that the sample to be detected carries an advertisement, namely an advertisement of the type corresponding to the successfully matched advertisement feature.
  • The entire process does not involve manually extracting advertisement features. By adding advertisement samples carrying new advertisements to the sample set, the advertisement features corresponding to the new advertisements can be determined automatically, achieving the technical effect of updating advertisement features efficiently; samples to be detected that include new advertisements can then be detected accurately based on the automatically and quickly updated advertisement features.
  • FIG. 1 is an optional schematic flowchart of an advertisement detection method according to an embodiment of the present invention.
  • FIG. 3-1 is a schematic diagram of an optional process for determining feature values in an advertisement detection method according to an embodiment of the present invention.
  • FIG. 3-2 is another schematic processing diagram of determining feature values in an advertisement detection method according to an embodiment of the present invention.
  • FIG. 4-1 is a schematic diagram of an optional process for determining feature values in an advertisement detection method according to an embodiment of the present invention.
  • FIG. 4-2 is another schematic processing diagram of determining feature values in an advertisement detection method according to an embodiment of the present invention.
  • FIG. 5-1 is an optional processing diagram of determining feature values in an advertisement detection method according to an embodiment of the present invention.
  • FIG. 5-2 is another schematic processing diagram of determining feature values in an advertisement detection method according to an embodiment of the present invention.
  • FIG. 6 is a schematic diagram of an optional process for screening advertisement features in an advertisement detection method according to an embodiment of the present invention.
  • FIG. 7 is an optional structural diagram of an advertisement detecting apparatus according to an embodiment of the present invention.
  • FIG. 8 is a schematic diagram of an optional process for extracting advertisement features of an advertisement detecting apparatus according to an embodiment of the present invention.
  • FIG. 9 is a schematic diagram of an optional process for screening advertisement features of an advertisement detecting apparatus according to an embodiment of the present invention.
  • FIG. 10 is a schematic diagram of an optional process for querying advertisement features of an advertisement detecting apparatus according to an embodiment of the present invention.
  • FIG. 11 is still another optional schematic flowchart of an advertisement detection method according to an embodiment of the present invention.
  • FIG. 12 is a schematic topological structural diagram of an advertisement detecting apparatus according to an embodiment of the present invention.
  • FIG. 13 is a schematic diagram of an optional scenario in which an advertisement detection apparatus performs advertisement detection according to an embodiment of the present invention.
  • FIG. 14 is a schematic diagram of an optional hardware structure when the advertisement detecting apparatus provided by the embodiment of the present invention is implemented as a server.
  • 1) Advertisement: media information, in the form of pictures, videos, audio, or text, intended to promote products and services.
  • 2) Advertisement sample: covers two cases: 2.1) an advertisement, that is, the advertisement itself; 2.2) an application that carries an advertisement or advertisement playback logic (for example, logic that obtains an advertisement from an external website and plays it under certain conditions), such as a social application, a video application, a utility such as a power management program or a hard disk management program, or a malicious program that carries advertisements and masquerades as a regular application.
  • 3) Sample to be detected: a sample that needs to be checked for whether it is an application carrying advertisements, such as an application installation package or an executable program packaged with library files.
  • The advertisement detection technology provided by the related art always extracts advertisement features manually from advertisement samples known to carry advertisements, for example, using a character string or a code segment corresponding to an advertisement in an advertisement sample as the advertisement feature. The advertisement feature is matched against the features extracted from the sample to be detected; if the matching succeeds, the sample to be detected is determined to carry an advertisement.
  • This way of detecting advertisements has at least the following problem: a developer can evade advertisement detection simply by modifying the features of the advertisements carried in the application. In that case, detecting the advertisements with modified features inevitably requires manually extracting new advertisement features from new advertisement samples.
  • On the one hand, manually extracting advertisement features is inefficient, which lowers the efficiency of advertisement detection; on the other hand, advertisement detection lags significantly and fails to detect new advertisements, or advertisements with modified features, in time.
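  • To make the weakness concrete, here is a minimal sketch of related-art signature matching, assuming a sample is raw bytes and a signature is a manually extracted byte string. The signature `loadBannerAd` and both sample strings are hypothetical; the point is that a trivial rename of the advertisement call defeats the fixed signature.

```python
def matches_signature(sample: bytes, signatures: list[bytes]) -> bool:
    # Related-art style check: flag the sample if any manually extracted
    # signature (a string or code fragment) occurs in it verbatim.
    return any(sig in sample for sig in signatures)


signatures = [b"loadBannerAd"]               # hypothetical manual signature
original = b"... invoke loadBannerAd ..."    # hypothetical sample content
modified = b"... invoke loadBanner_Ad ..."   # same ad, call renamed by developer

if __name__ == "__main__":
    print(matches_signature(original, signatures))  # True
    print(matches_signature(modified, signatures))  # False
```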
  • In the embodiments of the present invention, the features of each advertisement sample are extracted automatically from a set of advertisement samples carrying advertisements (hereinafter referred to as the sample set).
  • The extracted features cover two cases: 1) features of the advertisements carried in the advertisement sample; 2) features of the program itself in the advertisement sample.
  • Features are extracted from the advertisement samples, and a feature value is determined for each feature, indicating the probability that the feature is a feature of an advertisement carried in the advertisement sample. The extracted features are filtered based on the feature values to obtain the features of the different types of advertisements carried in the advertisement samples of the sample set, that is, the advertisement features.
  • The advertisement features of the different types of advertisements are compared with the features of the sample to be detected; if they match, it is determined that the sample to be detected carries the corresponding type of advertisement.
  • Because the features are filtered automatically to obtain the advertisement features, the whole process does not involve manual extraction of advertisement features: as long as advertisement samples carrying a new advertisement are added to the sample set, the advertisement feature corresponding to the new advertisement can be determined automatically. This achieves the technical effect of updating advertisement features efficiently, and samples to be detected that carry the new advertisement can then be detected based on the automatically and quickly updated advertisement features.
  • FIG. 1 is an optional schematic flowchart of an advertisement detection method according to an embodiment of the present invention, including steps 101 to 105, each of which is described below.
  • Step 101: Acquire a sample set formed by advertisement samples carrying advertisements.
  • The advertisement samples in the sample set may be collected periodically from the developer side and the user side.
  • Whether an unknown sample carries advertisements, and of which types, may be judged by developers, or obtained from feedback information submitted by terminal users for the sample (such as whether the sample carries advertisements and the types of advertisements).
  • For example, the sample set includes advertisement samples 1, 2, and 3, which are applications that developers have determined, through manual analysis, to carry advertisements. Advertisement sample 1 carries two different types of advertisements, advertisement 1 and advertisement 2; advertisement sample 2 carries one type of advertisement, advertisement 1; and advertisement sample 3 carries one type of advertisement, advertisement 3. Advertisement sample 1 is recorded as <advertisement sample 1, advertisement 1, advertisement 2>, advertisement sample 2 as <advertisement sample 2, advertisement 1>, and advertisement sample 3 as <advertisement sample 3, advertisement 3>.
  • Table 1:
  • Ad sample 1: <ad sample 1, ad 1, ad 2>
  • Ad sample 2: <ad sample 2, ad 1>
  • Ad sample 3: <ad sample 3, ad 3>
  • Step 102: Extract the features of each advertisement sample in the sample set.
  • For example, the code of each advertisement sample in the sample set is parsed to obtain the features of each function in the code along the binary code sequence dimension.
  • The features of an advertisement sample may also be extracted in other ways, for example, by statically analyzing the code of the advertisement sample to extract function flows as its features; other embodiments of the present invention do not exclude extracting features from advertisement samples in other ways.
  • Take an application sample with advertisements for the Android platform as an example.
  • The installation package of the application, in Android package (APK, Android PacKage) format, is unpacked to obtain a file in Dex (Dalvik VM executable) format, that is, the executable program. The Dex file is parsed along the function dimension, for example, all functions in the Dex file are parsed, and the features of the bytecode (opcode) of each function are extracted.
  • The bytecode (opcode) is the part of a computer instruction (binary code) that specifies the operation to be performed. The operations indicated by advertisement bytecode differ significantly from those indicated by the bytecode of the application itself, so features in the bytecode dimension can effectively distinguish advertisements from the application program.
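  • A minimal sketch of turning per-function opcode sequences into features. It assumes the Dex file has already been parsed by some tool into a mapping from function name to a list of Dalvik opcode mnemonics (the parsing step is not shown, and this input format is an assumption); each function's opcode sequence is hashed with SHA-1, so identical code in differently named functions yields the same feature. Hashing is an illustrative choice, not necessarily the representation used in the patent.

```python
import hashlib


def function_features(functions: dict[str, list[str]]) -> set[str]:
    """Return one feature per distinct opcode sequence.

    functions: hypothetical pre-parsed input, function name -> opcode mnemonics.
    """
    features = set()
    for opcodes in functions.values():
        # Function names are ignored; only the opcode sequence defines the feature.
        digest = hashlib.sha1(" ".join(opcodes).encode("utf-8")).hexdigest()
        features.add(digest)
    return features


if __name__ == "__main__":
    sample = {
        "showBanner": ["const-string", "invoke-virtual", "return-void"],
        "showBanner2": ["const-string", "invoke-virtual", "return-void"],
        "addNumbers": ["add-int", "return"],
    }
    print(len(function_features(sample)))  # 2
```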
  • For advertisement sample 1, feature 1 falls into one of the following cases: 1) feature 1 is a feature of one of advertisement 1 and advertisement 2 (that is, an advertisement feature); 2) feature 1 is a feature common to advertisement 1 and advertisement 2; 3) feature 1 is a feature of the application in advertisement sample 1 (in which case feature 1 is not an advertisement feature).
  • For advertisement sample 2, feature 1 falls into one of the following cases: 1) feature 1 is a feature of advertisement 1 (that is, feature 1 is an advertisement feature); 2) feature 1 is a feature of the application in advertisement sample 2 (in which case feature 1 is not an advertisement feature).
  • For advertisement sample 3, feature 2 falls into one of the following cases: 1) feature 2 is a feature of advertisement 3 (that is, feature 2 is an advertisement feature); 2) feature 2 is a feature of the application in advertisement sample 3 (in which case feature 2 is not an advertisement feature).
  • The following describes, with reference to the subsequent steps, how the features of advertisements 1 to 3 (that is, the advertisement features) are selected from feature 1 and feature 2.
  • Step 103: Determine the feature values of each feature with respect to different types of advertisements, the different types of advertisements being those carried in the advertisement samples that the feature matches.
  • An advertisement sample matched by a feature is an advertisement sample that has the feature; the feature is matched against the features of the advertisement samples in the sample set to determine which advertisement samples it matches. The feature values of a feature correspond one-to-one to the different types of advertisements: if the advertisement samples matched by a feature carry multiple types of advertisements, the feature has a corresponding feature value for each type.
  • For example, the advertisement samples matched by feature 1 are advertisement sample 1 and advertisement sample 2; advertisement sample 1 carries two different types of advertisements, advertisement 1 and advertisement 2, and advertisement sample 2 carries advertisement 1. Feature 1 therefore has one feature value with respect to advertisement 1 and one with respect to advertisement 2.
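  • The matching relationships in this example can be sketched as below, assuming each sample is represented by a set of feature identifiers and a set of carried advertisement types. The feature placement (feature 1 in advertisement samples 1 and 2, feature 2 in advertisement sample 3) follows the running example; the helper names `matched_samples` and `matched_ad_types` are hypothetical.

```python
# Running example (Table 1): advertisement sample -> advertisement types carried.
SAMPLE_ADS = {
    "ad sample 1": {"ad 1", "ad 2"},
    "ad sample 2": {"ad 1"},
    "ad sample 3": {"ad 3"},
}
# Features found in each sample, following the example in the text.
SAMPLE_FEATURES = {
    "ad sample 1": {"feature 1"},
    "ad sample 2": {"feature 1"},
    "ad sample 3": {"feature 2"},
}


def matched_samples(feature, sample_features):
    """Samples that have the feature, i.e. the samples the feature matches."""
    return {s for s, feats in sample_features.items() if feature in feats}


def matched_ad_types(feature, sample_features, sample_ads):
    """Advertisement types carried by the samples the feature matches.

    The feature receives one feature value per type in this set.
    """
    types = set()
    for s in matched_samples(feature, sample_features):
        types |= sample_ads.get(s, set())
    return types


if __name__ == "__main__":
    print(sorted(matched_ad_types("feature 1", SAMPLE_FEATURES, SAMPLE_ADS)))  # ['ad 1', 'ad 2']
```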
  • A feature falls into one of two cases: 1) the feature is a feature of the application in the matched advertisement sample; 2) the feature is a feature of an advertisement in the matched advertisement sample, or a feature shared by multiple advertisements.
  • For example, feature 1 may be an advertisement feature of at least one of advertisement 1 and advertisement 2, or a feature of the application in advertisement sample 1.
  • When the advertisement sample matched by a feature carries only one type of advertisement, the feature satisfies exactly one of the following cases:
  • the feature is a feature of the one type of advertisement carried in the advertisement sample, that is, the feature matches that type of advertisement;
  • the feature is a feature of the application itself in the advertisement sample, that is, the feature does not match the advertisement carried in the advertisement sample.
  • When the advertisement sample matched by a feature includes two or more types of advertisements, the feature satisfies one of the following cases:
  • the feature is a feature of only some of the advertisements carried in the advertisement sample, that is, the feature matches some but not all of the advertisements in the advertisement sample (an advertisement sample matched by the feature being one that has the feature);
  • the feature is a feature common to all advertisements carried in the advertisement sample, that is, the feature matches every advertisement in the advertisement sample;
  • the feature is a feature of the application in the advertisement sample, that is, the feature does not match any advertisement in the advertisement sample.
  • For each feature, a feature value is established with respect to each type of advertisement; the feature value indicates the probability that the feature is a feature of the corresponding type of advertisement carried in the advertisement samples matched by the feature.
  • For example, feature 1 matches advertisement sample 1 and advertisement sample 2, which carry two different types of advertisements, advertisement 1 and advertisement 2. The feature value of feature 1 corresponding to advertisement 1 represents the probability that feature 1 is an advertisement feature of advertisement 1, and the feature value of feature 1 corresponding to advertisement 2 represents the probability that feature 1 is an advertisement feature of advertisement 2, as shown in Table 3 below.
  • Feature value of feature 1 corresponding to advertisement 1: the probability that feature 1 is an advertisement feature of advertisement 1
  • Feature value of feature 1 corresponding to advertisement 2: the probability that feature 1 is an advertisement feature of advertisement 2
  • The advertisement sample matched by feature 2 carries advertisement 3, so the feature value of feature 2 corresponding to advertisement 3 represents the probability that feature 2 is an advertisement feature of advertisement 3, as shown in Table 4 below.
  • Feature value of feature 2 corresponding to advertisement 3: the probability that feature 2 is an advertisement feature of advertisement 3
  • There are several ways to determine the feature values of a feature, which are described separately below.
  • In one optional way, a feature value corresponding to each type of advertisement is determined for each extracted feature. Referring to the optional flow for determining feature values shown in FIG. 2-1, this includes steps 1031a to 1033a, which are described below in turn.
  • Step 1031a: Determine the different types of advertisements carried by the advertisement samples that the feature matches.
  • A feature extracted from one advertisement sample may also match other advertisement samples in the sample set. In the previous example, feature 1 extracted from advertisement sample 1 also matches advertisement sample 2, so the advertisement samples matched by feature 1 are advertisement sample 1 and advertisement sample 2, and it is determined that these samples carry two different types of advertisements, advertisement 1 and advertisement 2.
  • Feature 2 extracted from advertisement sample 3 matches only advertisement sample 3 in the sample set, so the advertisement sample matched by feature 2 is advertisement sample 3, which carries only one type of advertisement, advertisement 3. The advertisement samples matched by feature 1 and feature 2 are summarized in Table 5.
  • Step 1032a: For each type of advertisement carried in the advertisement samples matched by the feature, determine the number of matched advertisement samples that carry that type of advertisement.
  • If the advertisement samples matched by a feature carry multiple types of advertisements, then for each type, the number of matched advertisement samples that include that type of advertisement is determined.
  • For example, the advertisement samples matched by feature 1 carry two types of advertisements, advertisement 1 and advertisement 2. The number of matched advertisement samples carrying advertisement 1 is 2 (advertisement sample 1 and advertisement sample 2), and the number carrying advertisement 2 is 1 (advertisement sample 1).
  • If the advertisement samples matched by a feature include only one type of advertisement, then for that type, the number of matched advertisement samples that include it is determined.
  • For example, the advertisement sample matched by feature 2 carries advertisement 3, so the number of matched advertisement samples carrying advertisement 3 is 1.
  • Step 1033a: Use the number of matched advertisement samples carrying each type of advertisement as the feature value of the feature corresponding to that type of advertisement.
  • The feature value of a feature for a type of advertisement is positively correlated with the probability that the feature is an advertisement feature of that type, and therefore represents the probability that the feature matches that type of advertisement. Accordingly, the number of matched advertisement samples carrying the corresponding type of advertisement is used as the feature value of the feature for that type.
  • For example, feature 1 matches 2 advertisement samples carrying advertisement 1 and 1 advertisement sample carrying advertisement 2, so the feature value of feature 1 for advertisement 1 is 2 and for advertisement 2 is 1; feature 1 matches no advertisement sample carrying advertisement 3.
  • Feature 2 matches 1 advertisement sample carrying advertisement 3 and no advertisement sample carrying advertisement 1 or advertisement 2. The feature values of feature 1 and feature 2 for each type of advertisement are shown in Table 6 below:
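  • Steps 1031a to 1033a amount to counting, for each advertisement type, the matched advertisement samples that carry it. A minimal Python sketch under the same assumed data representation as the running example (the function name `count_feature_values` is hypothetical); advertisement types that the feature does not match are simply absent from the result, which corresponds to a feature value of 0.

```python
from collections import Counter

# Running example: advertisement types carried, and features present, per sample.
SAMPLE_ADS = {
    "ad sample 1": {"ad 1", "ad 2"},
    "ad sample 2": {"ad 1"},
    "ad sample 3": {"ad 3"},
}
SAMPLE_FEATURES = {
    "ad sample 1": {"feature 1"},
    "ad sample 2": {"feature 1"},
    "ad sample 3": {"feature 2"},
}


def count_feature_values(feature, sample_features, sample_ads):
    """Feature value per ad type = number of matched samples carrying that type."""
    values = Counter()
    for sample, feats in sample_features.items():
        if feature in feats:  # step 1031a: the sample is matched by the feature
            for ad_type in sample_ads.get(sample, ()):
                values[ad_type] += 1  # step 1032a: count per advertisement type
    return dict(values)  # step 1033a: the counts are used as feature values


if __name__ == "__main__":
    print(sorted(count_feature_values("feature 1", SAMPLE_FEATURES, SAMPLE_ADS).items()))
    # [('ad 1', 2), ('ad 2', 1)]
```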
  • In another optional way, the sample set used includes not only advertisement samples but also non-advertisement samples. A non-advertisement sample is an application that does not carry advertisements, such as the various applications that can be installed on a terminal device,
  • including social applications, game applications, and communication applications.
  • When the sample set includes non-advertisement samples, a feature's value for a type of advertisement expresses more accurately the probability that the feature is a feature of that type of advertisement. An example sample set is shown in Table 7:
  • Table 7:
  • Ad sample 1: ad 1, ad 2
  • Ad sample 2: ad 1
  • Ad sample 3: ad 3
  • Non-advertising sample 4: no advertisement
  • Non-advertising sample 5: no advertisement
  • The sample set shown in Table 7 includes non-advertisement sample 4 and non-advertisement sample 5, neither of which carries advertisements.
  • Step 1031b: Determine the different types of advertisements carried in the advertisement samples matched by the feature.
  • A feature extracted from one advertisement sample may match other advertisement samples in the sample set, so the statistics must account both for which samples the feature matches and for which types of advertisements those advertisement samples include.
  • For example, feature 1 extracted from advertisement sample 1 also matches advertisement sample 2 in the sample set, so the advertisement samples matched by feature 1 are advertisement sample 1 and advertisement sample 2, and it is determined that these samples carry two different types of advertisements, advertisement 1 and advertisement 2.
  • Feature 2 extracted from advertisement sample 3 matches only advertisement sample 3 in the sample set, so the advertisement sample matched by feature 2 is advertisement sample 3, which carries only one type of advertisement, advertisement 3. The advertisement samples matched by feature 1 and feature 2, and the types of advertisements they include, are shown in Table 6.
  • Step 1032b: In the sample set, determine the number of matched advertisement samples carrying each type of advertisement and the number of non-advertisement samples matched by the feature.
  • The numbers of advertisement samples carrying different types of advertisements matched by feature 1 and feature 2 are shown in Table 6.
  • For example, feature 1 is also extracted from non-advertisement sample 4, and feature 2 is also extracted from non-advertisement sample 5.
  • The samples (advertisement samples and non-advertisement samples) matched by feature 1 and feature 2, and the advertisements they carry, are shown in Table 8 below.
  • Step 1033b: Determine the feature value of the feature for each type of advertisement according to the number of matched advertisement samples carrying that type of advertisement and the number of non-advertisement samples matched by the feature.
  • A feature's value for a given type of advertisement characterizes the probability that the feature is an advertisement feature of that type. For a given type, the more advertisement samples carrying that type a feature matches, and the fewer non-advertising samples it matches at the same time, the more likely the feature is an advertisement feature of that type.
  • In one implementation, the ratio of the number of advertisement samples carrying a given type of advertisement that the feature matches to the number of non-advertising samples that the feature matches is used as the feature's value for that type. Because this ratio can exceed 1, it can be normalized.
  • That is, the number of matched advertisement samples of a type is scaled by the number of matched non-advertising samples to obtain the feature's value for that type of advertisement.
  • Because this accounts for the possibility that a feature belongs to the application itself rather than to an advertisement, the feature value more accurately represents the probability that the feature is an advertisement feature of each type of advertisement.
  • For example, with the ratio method, feature 1 matches 2 advertisement samples carrying advertisement 1 (advertisement samples 1 and 2), does not match advertisement sample 3, which carries advertisement 3, and matches 1 non-advertising sample, so its feature value for advertisement 1 is the ratio 2/1 = 2.
  • Feature 1 matches 1 advertisement sample carrying advertisement 2 (advertisement sample 2), does not match advertisement sample 3, and matches 1 non-advertising sample, so its feature value for advertisement 2 is 1/1 = 1.
  • Feature 2 matches 1 advertisement sample carrying advertisement 3 (advertisement sample 3), matches no advertisement sample carrying advertisement 1 or advertisement 2, and matches 1 non-advertising sample, so its feature value for advertisement 3 is 1/1 = 1.
  • In other words, dividing the number of advertisement samples of a given type that the feature matches by the number of non-advertising samples that the feature matches yields the feature's value for that type of advertisement.
  • A feature that matches many non-advertising samples thus receives a smaller value, and a feature that matches few non-advertising samples a larger one. Computing the feature value from matches in two dimensions, advertisement samples and non-advertising samples, lets it more accurately characterize the probability that the feature is an advertisement feature.
  • In another implementation, the ratio of the number of advertisement samples carrying a given type of advertisement that the feature matches to a sum (the number of non-advertising samples that the feature matches plus the number of advertisement samples of that type that it matches) is calculated and used as the feature's value for that type of advertisement.
  • A feature has a separate value for each type of advertisement, i.e., its feature values correspond one-to-one with advertisement types, and each value is computed from the samples that the feature matches.
  • Here the feature value is the ratio of the number of advertisement samples of one type of advertisement that the feature matches to the number of all samples (advertisement samples of that type and non-advertising samples) that the feature matches.
  • The feature value can therefore also be regarded as the feature's hit rate on samples carrying that type of advertisement; because it never exceeds 1, it does not need to be normalized.
  • For example, with the hit-rate method, feature 1 matches 2 advertisement samples carrying advertisement 1, does not match advertisement sample 3, which carries advertisement 3, and matches 1 non-advertising sample, so its feature value for advertisement 1 is 2/(2+1) ≈ 67%.
  • Feature 1 matches 1 advertisement sample carrying advertisement 2 (advertisement sample 2), does not match advertisement sample 3, and matches 1 non-advertising sample, so its feature value for advertisement 2 is 1/(1+1) = 50%.
  • For example, suppose features a and b match the same number of advertisement samples, but feature a matches more non-advertising samples than feature b. Dividing the number of matched advertisement samples by the sum (matched non-advertising samples plus matched advertisement samples of the type) draws on both dimensions, matches against advertisement samples and against non-advertising samples, so the values of features a and b are correctly distinguished: feature a receives the smaller value.
  • Computing a feature's value from the number of advertisement samples it matches and the number of all samples (advertisement samples and non-advertising samples) it matches therefore makes the feature value more accurate.
  • Step 104: Based on the features' values for the different types of advertisements, select the advertisement features of each type of advertisement from the extracted features.
  • The preceding steps yield a set of features (the feature set) together with each feature's value for each type of advertisement. As shown in FIG. 6, different screening strategies can be chosen to select the advertisement features of the different types of advertisements, which together form an advertisement feature library; these strategies are described below.
  • Mode 1): For each type of advertisement, a feature that matches only advertisement samples carrying that type of advertisement is determined to be an advertisement feature of that type. When several features match only advertisement samples carrying that type, the one with the largest feature value is selected as the advertisement feature of that type.
  • The advertisement samples a feature matches may carry several types of advertisements, in which case the feature has a value for each of those types; for example, feature 1 above has values for both advertisement 1 and advertisement 2. Alternatively, the advertisement samples a feature matches may carry only one type of advertisement, in which case the feature has a value only for that type; for example, feature 2 above corresponds only to advertisement 3 and has a value only for advertisement 3.
  • If the advertisement samples carrying a given type of advertisement contain several features, then among them, a feature whose matched advertisement samples carry only that type of advertisement is selected as the advertisement feature of that type; that is, a feature in one-to-one correspondence with an advertisement is preferred as that advertisement's advertisement feature.
  • For example, the advertisement samples matched by feature 1 carry advertisement 1 and advertisement 2, whereas feature 2 matches only advertisement sample 3, which carries only advertisement 3. Feature 2 therefore forms a one-to-one correspondence with advertisement 3 and is preferentially selected as the advertisement feature of advertisement 3.
  • Mode 2): When a feature matches advertisement samples carrying several types of advertisements, it is determined, based on its values for each of those types, to be an advertisement feature of the type for which its value is largest.
  • Mode 1) covers a feature that matches advertisement samples carrying only one type of advertisement, so that the feature and the advertisement are in one-to-one correspondence. Mode 2) covers a feature in a one-to-many relationship with advertisements, i.e., a feature that matches advertisement samples carrying several types of advertisements; its screening process is as follows.
  • When a feature matches advertisement samples carrying several types of advertisements, its values for each of those types (the different types carried by the advertisement samples it matches) are compared. The larger a feature's value for a type, the more likely the feature is an advertisement feature of that type, so the comparison identifies the type for which the feature has the largest value, and the feature is used as an advertisement feature of that type.
  • For example, comparing feature 1's value for advertisement 1 (67%) with its value for advertisement 2 (50%) shows that its largest value (67%) is for advertisement 1, so feature 1 is taken as the advertisement feature of advertisement 1.
  • Mode 3): A feature whose value for a type of advertisement exceeds a feature-value threshold is selected as an advertisement feature of that type; that is, features with a high probability of matching a type of advertisement are selected as its advertisement features.
  • A feature's value for a type of advertisement is positively correlated with the probability that the feature matches that type (its hit rate). So, for a given type, when several features match advertisement samples carrying that type, each feature's value for the type is compared with the threshold, and every feature that exceeds it is used as an advertisement feature of that type. Accordingly, at least one advertisement feature may be determined for each type of advertisement.
  • For example, if feature 1's value for advertisement 1 (67%) exceeds the threshold, feature 1 is an advertisement feature of advertisement 1.
  • Mode 4): Features corresponding to third-party plug-in code, such as payment plug-ins and account-login software development kits, are filtered out of the extracted features. This effectively avoids misdetecting an application as having built-in advertisements because of such necessary built-in components.
  • These screening strategies can be chosen flexibly in practice: only one may be used, or several may be combined where they do not conflict. Taking the combination of mode 1) and mode 3) as an example: when several features match only advertisement samples carrying one type of advertisement, i.e., several features are each in one-to-one correspondence with the same type of advertisement, the feature with the largest value for that type is selected as its advertisement feature.
  • For example, feature 2 is in one-to-one correspondence with advertisement 3, and its hit rate on advertisement samples carrying advertisement 3 is 50%. If a feature 3 also corresponded one-to-one with advertisement 3, with a hit rate of 30% on advertisement samples carrying advertisement 3, feature 2 would be preferentially selected as the advertisement feature of advertisement 3.
  • Step 105: Compare the advertisement features of the different types of advertisements with the features extracted from the sample to be detected. If the comparison succeeds, the sample to be detected is determined to carry advertisements, specifically advertisements of the types corresponding to the matched advertisement features.
  • After detection, a sample determined to carry advertisements can be added to the sample set as a new advertisement sample, together with the types of advertisements it carries, and a sample determined to carry no advertisements can be added as a new non-advertising sample. Features are then re-screened based on the new advertisement samples (or on the new advertisement samples and new non-advertising samples) to update the advertisement features in the advertisement feature library.
  • In this way, advertisement features are updated in step with the samples being detected, new samples can be detected accurately without manually judging whether they carry advertisements, and the efficiency of advertisement detection improves.
  • An advertisement detecting apparatus that implements the above-described advertisement detecting method will be described.
  • The apparatus includes an extraction module 101, a detection module 102, and an advertisement feature library module 103.
  • The extraction module 101 is configured to acquire a sample set composed of advertisement samples carrying advertisements, extract the features of the advertisement samples in the set (step 301), and report them to the detection module 102.
  • The detection module 102 determines the different types of advertisements carried in the advertisement samples that each feature matches and determines the feature's values for those types, each value representing the probability that the feature is an advertisement feature of the corresponding type. It then selects the advertisement features of the different types of advertisements from the extracted features and stores them in the advertisement feature library module 103 (step 302).
  • In one implementation, the detection module 102 determines the different types of advertisements carried in the advertisement samples that a feature matches; for each type, it determines the number of advertisement samples carrying that type that the feature matches, and determines the feature's value for that type from this number in one of the following ways:
  • Method 1): The number of advertisement samples carrying the corresponding type of advertisement that the feature matches is used as the feature's value for that type;
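  • Method 2): The feature's value for that type is determined from the number of non-advertising samples in the sample set that the feature matches together with the number of advertisement samples carrying that type that it matches, for example as their ratio;
  • Method 3): The ratio of the number of advertisement samples carrying the corresponding type of advertisement that the feature matches to the sum described below is used as the feature's value for that type.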
  • Here the sum is the number of non-advertising samples in the sample set that the feature matches plus the number of advertisement samples carrying the corresponding type of advertisement that it matches. The feature value in method 3) can therefore be regarded as the feature's hit rate for the corresponding type of advertisement.
  • For example, the ratio of the number of advertisement samples carrying the corresponding type of advertisement that the feature matches to the number of non-advertising samples that the feature matches is calculated to obtain the feature's value for that type; alternatively, the ratio of that number of advertisement samples to the sum is calculated to obtain the feature's value for that type, the sum being the number of non-advertising samples that the feature matches plus the number of advertisement samples carrying the corresponding type of advertisement that it matches.
  • When a sample is to be detected, the advertisement features of the different types of advertisements are retrieved from the advertisement feature library module 103 (step 303) and compared with the features extracted from the sample. If the comparison succeeds, the sample is determined to include advertisements of the types corresponding to the matched advertisement features, and the detection result for the sample is output (step 304).
  • The detection module 102 can use various strategies to select the advertisement features of the different types of advertisements from the extracted features.
  • These strategies are described below; several of them can be combined where they do not conflict.
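  • Mode 1): for each type of advertisement, among the features that match only advertisement samples carrying that type, the feature with the largest value is selected as the advertisement feature of that type.
  • Mode 2): a feature that matches advertisement samples carrying several types of advertisements is taken as an advertisement feature of the type for which its value is largest.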
  • Mode 3): a feature whose value exceeds the feature-value threshold is selected as an advertisement feature of the corresponding type.
  • The above three modes can each be combined with mode 4): filtering out, from the extracted features, those corresponding to third-party plug-in code, such as payment plug-ins and account-login software development kits. This avoids misdetecting an application as having built-in advertisements because of necessary components built into it.
  • The extraction module 101 updates the sample set according to the detection result for the sample to be detected (whether it carries advertisements and, if so, of which types): a sample that carries advertisements is added as a new advertisement sample, and a sample that carries no advertisements is added as a new non-advertising sample.
  • The features of the advertisement samples are then extracted from the updated sample set, and the detection module 102 re-screens the advertisement features of the different types of advertisements from the extracted features and updates the advertisement feature library module 103.
  • Because the advertisement features in the advertisement feature library module 103 are updated iteratively with the samples to be detected, the advertisement features of a new advertisement are extracted automatically as soon as samples containing it appear among the samples to be detected, so that samples containing new advertisements are detected accurately.
  • Referring to FIG. 8, the extraction module 101 acquires a sample set that contains advertisement samples and may also contain non-advertising samples (step 401), and, via a back-end engine, extracts the features of the advertisement samples from the sample set (step 402), such as Opcode features or function-call-flow features extracted from the advertisement samples' code.
  • The detection module 102 selects the features corresponding to the different types of advertisements, i.e., the advertisement features, from the extracted features (step 403) and stores them in the advertisement feature library module 103.
  • For a sample to be detected, the detection module 102 extracts its features (step 404) and queries the advertisement feature library module 103 for matching advertisement features. If any exist, the sample is determined to include advertisements of the corresponding types; otherwise, it is determined to carry no advertisement. The detection result for the sample is then output (step 405).
  • The above modules may be implemented by a server or server cluster running an executable program for advertisement detection.
  • The extraction module 101 is implemented as a collection server 200;
  • the detection module 102 is implemented as a detection server 300;
  • the advertisement feature library module 103 is implemented as an advertisement feature library 400.
  • The collection server extracts features from the advertisement samples in the sample set and reports the extracted features to the detection server (step 301).
  • In the scenario shown in FIG. 13, the advertisement detection apparatus is implemented as the advertisement feature library 400, the detection server 300, and the collection server 200, and the detection result is published on the application platform 500 together with the application.
  • When the user of terminal 600 requests to install an application from the application platform 500 and the application has built-in advertisements, the user is reminded of this and asked whether to continue the installation.
  • If the user chooses not to install, a version of the application without built-in advertisements may be recommended for installation.
  • An optional structure of the servers shown in FIG. 13, such as the advertisement feature library 400, the detection server 300, and the collection server 200, includes a processor 410, one or more input/output interfaces 430 (e.g., a display, keyboard, touch screen, speaker, or microphone), a storage medium 440, and a network interface 420 (interfaces in various forms that support network communication protocols, such as an Ethernet interface); the components can communicate with one another via a system bus 450.
  • The storage medium 440 may be a read-only memory (ROM), a flash memory, a transfer device, a magnetic storage medium (e.g., magnetic tape or a magnetic disk drive), an optical storage medium (e.g., an optical disc, hard disk, paper card, or tape), or another well-known type of program memory. An executable program is stored in the storage medium; when executed, it causes the processor 410 in the server to perform operations including: extracting the features of each advertisement sample in the sample set; determining the different types of advertisements carried in the advertisement samples that a feature matches, and determining the feature's values for the different types of advertisements, each value characterizing the probability that the feature is an advertisement feature of the corresponding type; selecting, based on these values, the advertisement features of the different types of advertisements from the extracted features; and comparing the advertisement features of the different types of advertisements with the features extracted from a sample to be detected and, when the comparison succeeds, determining that the sample to be detected carries advertisements of the types corresponding to the matched advertisement features.
  • When executing the executable program, the processor 410 in the server is further caused to perform operations including: determining the different types of advertisements carried in the advertisement samples that a feature matches; and, for each type of advertisement, determining the number of advertisement samples carrying that type that the feature matches and, based on this number, determining the feature's value for that type of advertisement.
  • The sample set may further include non-advertising samples. When executing the executable program, the processor 410 in the server is caused to perform operations including: determining the feature's value for a type of advertisement based on the number of advertisement samples carrying that type that the feature matches; or determining it based on both the number of non-advertising samples in the sample set that the feature matches and the number of advertisement samples carrying that type that the feature matches.
  • When executing the executable program, the processor 410 in the server is further caused to perform operations including: calculating the ratio of the number of advertisement samples carrying the corresponding type of advertisement that the feature matches to the number of non-advertising samples that the feature matches, to obtain the feature's value for that type; or calculating the ratio of that number of advertisement samples to a sum, to obtain the feature's value for that type, the sum being the number of non-advertising samples that the feature matches plus the number of advertisement samples carrying the corresponding type of advertisement that the feature matches.
  • When executing the executable program, the processor 410 in the server is further caused to perform operations including: for each type of advertisement, determining a feature that matches only advertisement samples carrying that type to be an advertisement feature of that type; when several features match only advertisement samples carrying that type, the feature with the largest value is selected as the advertisement feature of that type.
  • When executing the executable program, the processor 410 in the server is further caused to perform operations including: for each feature that matches advertisement samples carrying several types of advertisements, determining the type for which the feature has the largest value, and determining the feature to be an advertisement feature of that type.
  • When executing the executable program, the processor 410 in the server is further caused to select, based on the features' values for the different types of advertisements, features whose values exceed the feature-value threshold as advertisement features of the corresponding types.
  • When executing the executable program, the processor 410 in the server is further caused to perform operations including: before determining the features' values for the different types of advertisements, filtering out of the extracted features those corresponding to third-party plug-in code, the third-party plug-in code including payment plug-ins and account-login software development kits; and updating the sample set with the samples to be detected and re-determining the advertisement features of the different types of advertisements based on the updated sample set.
  • The entire process involves no manual extraction of advertisement features: as long as advertisement samples containing new advertisements are added to the sample set, the advertisement features of the new advertisements are determined automatically. Advertisement features can therefore be updated efficiently, and samples to be detected that contain new advertisements can be detected accurately based on these automatically and quickly updated advertisement features.
  • the foregoing storage medium includes: a removable storage device, a random access memory (RAM), a ROM, a magnetic disk, or an optical disk, and the like.
  • If the above integrated unit of the present invention is implemented as a software functional module and sold or used as a standalone product, it may be stored in a computer-readable storage medium.
  • On this understanding, the technical solutions of the embodiments of the present invention, in essence or in the part contributing to the related art, may be embodied as a software product. The computer software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, server, network device, etc.) to perform all or part of the methods described in the embodiments of the present invention.
  • the foregoing storage medium includes various media that can store program codes, such as a mobile storage device, a RAM, a ROM, a magnetic disk, or an optical disk.
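The text mentions Opcode features and function-call-flow features without saying how they are built. One common way to turn a method's opcode sequence into matchable features is to hash its opcode n-grams; the sketch below assumes that scheme. The function name `opcode_ngram_features`, the n-gram length of 3, and the truncated SHA-256 digest are illustrative choices, not the patent's.

```python
import hashlib


def opcode_ngram_features(opcodes, n=3):
    """Return a set of hashed opcode n-gram features for one method.

    Hypothetical scheme: each run of n consecutive opcode mnemonics is joined
    and hashed, so identical code fragments in different samples yield
    identical features.
    """
    features = set()
    for i in range(len(opcodes) - n + 1):
        gram = " ".join(opcodes[i:i + n])
        digest = hashlib.sha256(gram.encode("utf-8")).hexdigest()
        features.add(digest[:16])  # truncated for compactness
    return features


# Dalvik-style mnemonics, used purely as illustrative input.
method = ["const-string", "invoke-static", "move-result-object",
          "invoke-virtual", "return-void"]
feats = opcode_ngram_features(method, n=3)
print(len(feats))  # 3
```

Because the hash depends only on the opcode run, an advertisement SDK embedded in many applications tends to yield the same features in each of them, which is what the later match counting relies on.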
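Steps 1032b and 1033b both start from two tallies: for each feature, how many advertisement samples carrying each type of advertisement it matches (Table 6), and how many non-advertising samples it matches (Table 8). A minimal sketch, assuming a feature "matches" a sample when it appears in that sample's extracted feature set; the toy data are our reading of the worked example (advertisement samples 1 to 3, non-advertising samples 4 and 5), and the names `AD_SAMPLES`, `NON_AD_SAMPLES`, and `tally` are hypothetical.

```python
from collections import defaultdict

# Each advertisement sample: (set of extracted features, set of ad types carried).
AD_SAMPLES = {
    "ad_sample_1": ({"feature_1"}, {"ad_1"}),
    "ad_sample_2": ({"feature_1"}, {"ad_1", "ad_2"}),
    "ad_sample_3": ({"feature_2"}, {"ad_3"}),
}
# Each non-advertising sample: set of extracted features.
NON_AD_SAMPLES = {
    "non_ad_sample_4": {"feature_1"},
    "non_ad_sample_5": {"feature_2"},
}


def tally(ad_samples, non_ad_samples):
    """Return ({feature: {ad_type: matched ad-sample count}},
               {feature: matched non-advertising-sample count})."""
    ad_counts = defaultdict(lambda: defaultdict(int))
    for features, ad_types in ad_samples.values():
        for feature in features:
            for ad_type in ad_types:
                ad_counts[feature][ad_type] += 1
    non_ad_counts = defaultdict(int)
    for features in non_ad_samples.values():
        for feature in features:
            non_ad_counts[feature] += 1
    # Plain dicts print and compare cleanly.
    return ({f: dict(c) for f, c in ad_counts.items()}, dict(non_ad_counts))


ad_counts, non_ad_counts = tally(AD_SAMPLES, NON_AD_SAMPLES)
print(ad_counts)      # per-feature, per-type counts (cf. Table 6)
print(non_ad_counts)  # per-feature non-advertising counts (cf. Table 8)
```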
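The count-based value (method 1) and the ratio-based value of Step 1033b reduce to one-line functions over the tallies. The text says ratios above 1 may be normalized but not how, and it does not cover a feature that matches no non-advertising sample; this sketch assumes division by the largest ratio and a denominator floored at 1, both of which are our assumptions.

```python
def count_value(n_ad: int) -> float:
    """Method 1): the number of matched advertisement samples of the type."""
    return float(n_ad)


def ratio_value(n_ad: int, n_non_ad: int) -> float:
    """Ratio method: matched ad samples of the type / matched non-ad samples.

    Assumption: the denominator is floored at 1 so that a feature matching
    no non-advertising sample does not divide by zero.
    """
    return n_ad / max(n_non_ad, 1)


def normalize(values: dict) -> dict:
    """Hypothetical normalization: scale by the largest value into [0, 1]."""
    top = max(values.values())
    return {k: v / top for k, v in values.items()} if top > 0 else dict(values)


# (matched ad samples of the type, matched non-ad samples) from the worked example
counts = {
    ("feature_1", "ad_1"): (2, 1),
    ("feature_1", "ad_2"): (1, 1),
    ("feature_2", "ad_3"): (1, 1),
}
ratios = {key: ratio_value(n_ad, n_non) for key, (n_ad, n_non) in counts.items()}
print(ratios[("feature_1", "ad_1")])             # 2.0
print(normalize(ratios)[("feature_1", "ad_2")])  # 0.5
```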
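The hit-rate value (method 3) divides by the sum of matched advertisement and non-advertising samples, so it always stays within [0, 1]. This sketch reproduces the 67% and 50% values for feature 1 and, using hypothetical counts, shows why features a and b are told apart.

```python
def hit_rate(n_ad: int, n_non_ad: int) -> float:
    """Method 3): n_ad / (n_non_ad + n_ad), always in [0, 1].

    Returns 0.0 for a feature that matches no sample at all.
    """
    total = n_ad + n_non_ad
    return n_ad / total if total else 0.0


# Worked example from the text.
print(round(hit_rate(2, 1), 2))  # feature 1, ad 1 -> 0.67
print(round(hit_rate(1, 1), 2))  # feature 1, ad 2 -> 0.5

# Features a and b match the same number of ad samples (hypothetical counts),
# but feature a matches more non-advertising samples, so it scores lower.
value_a = hit_rate(4, 6)
value_b = hit_rate(4, 1)
print(value_a < value_b)  # True
```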
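Screening modes 1) and 2) can be expressed over a mapping from each feature to its per-type values. This is a sketch under the assumption that a feature with a value for exactly one advertisement type is "one-to-one"; `feature_3` (30%) is the hypothetical extra feature from the mode-combination example, and the function names are ours.

```python
def select_one_to_one(values):
    """Mode 1): among features whose matched advertisement samples carry a
    single advertisement type, keep the highest-valued feature per type."""
    best = {}  # ad_type -> (feature, value)
    for feature, per_type in values.items():
        if len(per_type) != 1:
            continue  # one-to-many features are left to mode 2)
        ad_type, value = next(iter(per_type.items()))
        if ad_type not in best or value > best[ad_type][1]:
            best[ad_type] = (feature, value)
    return {ad_type: feature for ad_type, (feature, _) in best.items()}


def assign_one_to_many(values):
    """Mode 2): a feature matched to several advertisement types becomes an
    advertisement feature of the type with its largest value."""
    return {
        feature: max(per_type, key=per_type.get)
        for feature, per_type in values.items()
        if len(per_type) > 1
    }


values = {
    "feature_1": {"ad_1": 2 / 3, "ad_2": 0.5},
    "feature_2": {"ad_3": 0.5},
    "feature_3": {"ad_3": 0.3},  # hypothetical second feature for ad 3
}
print(select_one_to_one(values))   # {'ad_3': 'feature_2'}
print(assign_one_to_many(values))  # {'feature_1': 'ad_1'}
```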
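Mode 3) keeps every feature above a threshold, and mode 4) removes third-party plug-in features before values are computed. The threshold 0.6 and the plug-in feature names below are hypothetical; the text specifies neither.

```python
def select_by_threshold(values, threshold):
    """Mode 3): every feature whose value for an ad type exceeds the
    threshold becomes an ad feature of that type (possibly several per type)."""
    library = {}
    for feature, per_type in values.items():
        for ad_type, value in per_type.items():
            if value > threshold:
                library.setdefault(ad_type, set()).add(feature)
    return library


def drop_third_party(features, third_party_features):
    """Mode 4): remove features known to come from third-party plug-in code
    (e.g. payment plug-ins, account-login SDKs)."""
    return {f for f in features if f not in third_party_features}


values = {"feature_1": {"ad_1": 2 / 3, "ad_2": 0.5}, "feature_2": {"ad_3": 0.5}}
print(select_by_threshold(values, 0.6))  # {'ad_1': {'feature_1'}}

# Hypothetical feature names standing in for a payment SDK and a login SDK.
THIRD_PARTY = {"pay_sdk_call", "login_sdk_call"}
extracted = {"feature_1", "feature_2", "pay_sdk_call"}
print(sorted(drop_third_party(extracted, THIRD_PARTY)))  # ['feature_1', 'feature_2']
```

A lower threshold trades precision for recall: at 0.4 every feature in this toy set would qualify for every type it matches.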
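Step 105 (steps 304 and 405 in the apparatus description) can be read as a set intersection between each type's advertisement features and the features of the sample to be detected. This sketch assumes that a single exact feature match counts as a successful comparison; the text does not define a stricter rule, and the name `detect` is ours.

```python
def detect(library, sample_features):
    """Return the ad types whose ad features appear among the features
    extracted from the sample to be detected. An empty result means the
    sample is judged to carry no advertisement."""
    return {
        ad_type
        for ad_type, ad_features in library.items()
        if ad_features & sample_features
    }


library = {"ad_1": {"feature_1"}, "ad_3": {"feature_2"}}
print(detect(library, {"feature_2", "feature_9"}))  # {'ad_3'}
print(detect(library, {"feature_9"}))               # set()
```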

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Strategic Management (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Business, Economics & Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Marketing (AREA)
  • Computer Hardware Design (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Image Analysis (AREA)

Abstract

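The present invention provides an advertisement detection method, an advertisement detection apparatus, and a storage medium. The method includes: extracting the features of each advertisement sample in a sample set; determining the different types of advertisements carried in the advertisement samples that a feature matches, and determining the feature values that the feature has for the different types of advertisements, a feature value characterizing the probability that the feature is an advertisement feature of the corresponding type of advertisement; selecting, from the extracted features and based on the features' values for the different types of advertisements, the advertisement features of the different types of advertisements; and comparing the advertisement features of the different types of advertisements with features extracted from a sample to be detected and, when the comparison succeeds, determining that the sample to be detected carries advertisements, specifically advertisements of the types corresponding to the matched advertisement features.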

Description

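Advertisement Detection Method, Advertisement Detection Apparatus, and Storage Medium
Technical Field
The present invention relates to communication technology, and in particular to an advertisement detection method, an advertisement detection apparatus, and a storage medium.
Background
Terminals such as smartphones and tablet computers often have a large number of applications installed.
Applications currently come from diverse sources, and a considerable share of them have built-in advertisements. Some advertisement-carrying programs disguise themselves as regular applications (such as learning, entertainment, or social applications) to trick users into installing them. Once such an application is clicked or installed on a terminal, advertisements appear frequently while the user uses the terminal, disturbing the user; such programs may even steal user information and invoke the terminal's communication functions (such as making calls or sending SMS messages), causing the user to incur communication charges. It is therefore necessary to detect whether an application has built-in advertisements.
In current advertisement detection technology, advertisement features are extracted manually from advertisement samples (that is, applications known to carry advertisements, or malicious applications that disguise themselves as regular applications and carry advertisements), and whether an application carries advertisements is detected by matching the advertisement features against features extracted from the application.
This approach requires the advertisement features to be updated constantly in order to detect applications carrying new advertisements in time. On the one hand, this reduces the efficiency of advertisement detection; on the other hand, because the extracted advertisement features inevitably lag behind, accuracy is low when detecting whether an application carries new advertisements.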
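Summary
Embodiments of the present invention provide an advertisement detection method, an advertisement detection apparatus, and a storage medium that can efficiently extract advertisement features from advertisement samples for detecting advertisements, thereby improving the timeliness of advertisement detection based on advertisement features.
The technical solutions of the embodiments of the present invention are implemented as follows:
In a first aspect, an embodiment of the present invention provides an advertisement detection method, the method including:
acquiring a sample set containing advertisement samples, and acquiring the features of each advertisement sample in the sample set;
determining the feature values that a feature has for the different types of advertisements in the advertisement samples it matches, a feature value characterizing the probability that the feature matches the corresponding type of advertisement;
selecting, from the extracted features and based on the feature values for the advertisements, the advertisement features of the different types of advertisements;
comparing the advertisement features of the different advertisements with features extracted from a sample to be detected and, if the comparison succeeds, determining that the sample to be detected includes advertisements of the types corresponding to the matched advertisement features.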
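In a second aspect, an embodiment of the present invention provides an advertisement detection apparatus, including:
an extraction module, configured to acquire a sample set containing advertisement samples, and to acquire the features of each advertisement sample in the sample set;
a detection module, configured to determine the feature values that a feature has for the different types of advertisements in the advertisement samples it matches, a feature value characterizing the probability that the feature matches the corresponding type of advertisement;
the detection module being further configured to select, from the extracted features and based on the feature values for the advertisements, the advertisement features of the different types of advertisements;
the detection module being further configured to compare the advertisement features of the different advertisements with features extracted from a sample to be detected and, if the comparison succeeds, to determine that the sample to be detected includes advertisements of the types corresponding to the matched advertisement features.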
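In a third aspect, an embodiment of the present invention provides an advertisement detection apparatus, including:
a memory, configured to store an executable program;
a processor, configured to implement the following operations when executing the executable program stored in the memory:
extracting the features of each advertisement sample in a sample set;
determining the different types of advertisements carried in the advertisement samples that a feature matches, and determining the feature values that the feature has for the different types of advertisements, a feature value characterizing the probability that the feature is an advertisement feature of the corresponding type of advertisement;
selecting, from the extracted features and based on the features' values for the different types of advertisements, the advertisement features of the different types of advertisements;
comparing the advertisement features of the different types of advertisements with features extracted from a sample to be detected and, when the comparison succeeds, determining that the sample to be detected carries advertisements, specifically advertisements of the types corresponding to the matched advertisement features.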
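In a fourth aspect, an embodiment of the present invention provides a storage medium storing an executable program which, when executed by a processor, implements the following operations:
extracting the features of each advertisement sample in a sample set;
determining the different types of advertisements carried in the advertisement samples that a feature matches, and determining the feature values that the feature has for the different types of advertisements, a feature value characterizing the probability that the feature is an advertisement feature of the corresponding type of advertisement;
selecting, from the extracted features and based on the features' values for the different types of advertisements, the advertisement features of the different types of advertisements;
comparing the advertisement features of the different types of advertisements with features extracted from a sample to be detected and, when the comparison succeeds, determining that the sample to be detected carries advertisements, specifically advertisements of the types corresponding to the matched advertisement features.
In the embodiments of the present invention, features (including both advertisement features and non-advertisement features) are extracted automatically and then further screened based on their feature values to obtain advertisement features. The whole process involves no manual extraction of advertisement features, so as long as advertisement samples containing new advertisements are added to the sample set, the advertisement features corresponding to the new advertisements are determined automatically. Advertisement features can thus be updated efficiently, and, based on these automatically and quickly updated advertisement features, samples to be detected that contain new advertisements can be detected accurately.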
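Brief Description of the Drawings
FIG. 1 is an optional schematic flowchart of an advertisement detection method according to an embodiment of the present invention;
FIG. 2-1 is another optional schematic flowchart of the advertisement detection method according to an embodiment of the present invention;
FIG. 2-2 is yet another optional schematic flowchart of the advertisement detection method according to an embodiment of the present invention;
FIG. 3-1 is an optional schematic diagram of the processing for determining feature values in the advertisement detection method according to an embodiment of the present invention;
FIG. 3-2 is another optional schematic diagram of the processing for determining feature values in the advertisement detection method according to an embodiment of the present invention;
FIG. 4-1 is an optional schematic diagram of the processing for determining feature values in the advertisement detection method according to an embodiment of the present invention;
FIG. 4-2 is another optional schematic diagram of the processing for determining feature values in the advertisement detection method according to an embodiment of the present invention;
FIG. 5-1 is an optional schematic diagram of the processing for determining feature values in the advertisement detection method according to an embodiment of the present invention;
FIG. 5-2 is another optional schematic diagram of the processing for determining feature values in the advertisement detection method according to an embodiment of the present invention;
FIG. 6 is an optional schematic diagram of the processing for screening advertisement features in the advertisement detection method according to an embodiment of the present invention;
FIG. 7 is an optional schematic structural diagram of an advertisement detection apparatus according to an embodiment of the present invention;
FIG. 8 is an optional schematic diagram of the processing by which the advertisement detection apparatus extracts advertisement features according to an embodiment of the present invention;
FIG. 9 is an optional schematic diagram of the processing by which the advertisement detection apparatus screens advertisement features according to an embodiment of the present invention;
FIG. 10 is an optional schematic diagram of the processing by which the advertisement detection apparatus queries advertisement features according to an embodiment of the present invention;
FIG. 11 is yet another optional schematic flowchart of the advertisement detection method according to an embodiment of the present invention;
FIG. 12 is an optional schematic topology diagram of the advertisement detection apparatus according to an embodiment of the present invention;
FIG. 13 is an optional schematic diagram of a scenario in which the advertisement detection apparatus performs advertisement detection according to an embodiment of the present invention;
FIG. 14 is an optional schematic hardware structural diagram of the advertisement detection apparatus implemented as a server according to an embodiment of the present invention.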
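Detailed Description
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the embodiments provided here are merely used to explain the present invention and are not intended to limit it. In addition, the embodiments provided below are some, rather than all, of the embodiments for implementing the present invention, and the technical solutions described in the embodiments may be combined in any manner provided that they do not conflict.
Before the embodiments of the present invention are described in further detail, the nouns and terms involved in the embodiments are explained; they are subject to the following interpretations.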
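1) Advertisement: media information that promotes a product or service in the form of images, video, audio, text, and the like.
2) Advertisement sample, which covers two cases: 2.1) an advertisement, that is, the advertisement itself; 2.2) an application carrying an advertisement or advertisement-playing logic (for example, logic that fetches advertisements from an external website and plays them under specific conditions), such as a social application or a video application; a service program such as a power management program or a hard disk management program; or a malicious program that carries advertisements and disguises itself as an ordinary application.
3) Application: software developed for a particular purpose of its users, provided as an installation package, as an executable program, or as intermediate code in various forms (possibly together with the necessary library files). The embodiments of the present invention do not exclude the use of applications in any form.
4) Sample to be detected: a sample of an application that needs to be checked for advertisements, for example an application's installation package or an executable program packaged with library files.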
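Advertisement detection techniques in the related art always extract advertisement features manually from advertisement samples known to carry advertisements, for example by taking a string or code fragment corresponding to an advertisement in the advertisement sample as an advertisement feature, and then match the advertisement feature against features extracted from the sample to be detected; if the match succeeds, it can be determined that the sample to be detected carries an advertisement.
This way of detecting advertisements has at least the following problem: developers can evade detection of the advertisements in an application by making simple modifications to the features of the advertisements it carries. To detect advertisements whose features have been modified, new advertisement features must then be extracted manually from new advertisement samples.
Because every update of the advertisement features involves manual extraction, on the one hand the low efficiency of manual extraction limits the efficiency of advertisement detection; on the other hand, advertisement detection lags considerably and cannot promptly detect new advertisements or advertisements whose features have been modified.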
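To address these problems, in the embodiments of the present invention, the features of each advertisement sample are automatically extracted from a set of advertisement samples carrying advertisements (hereinafter referred to as the sample set). An extracted feature falls into one of two cases: 1) it is an advertisement feature of an advertisement carried in the advertisement sample; 2) it is a feature of the program itself in the advertisement sample. In view of this, features are extracted from the advertisement samples and the feature value of each feature is determined, the feature value representing the probability that the feature belongs to an advertisement carried in the advertisement samples. The extracted features are screened based on their feature values to obtain the features of the different types of advertisements carried in the advertisement samples of the sample set, that is, the advertisement features. The advertisement features of the different types of advertisements are then compared with the features of the sample to be detected, and if they match, it is determined that the sample to be detected carries the corresponding type of advertisement.
Therefore, features (including both advertisement features and non-advertisement features) are extracted automatically and further screened, based on their feature values for the different types of advertisements, to obtain advertisement features. The whole process involves no manual extraction of advertisement features: simply adding advertisement samples containing new advertisements to the sample set is enough to determine the advertisement features of the new advertisements automatically, which achieves the technical effect of efficiently updating advertisement features, and the automatic, fast updating of advertisement features in turn enables accurate detection of samples to be detected that carry new advertisements.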
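Referring to FIG. 1, which shows an optional schematic flowchart of the advertisement detection method according to an embodiment of the present invention, the method includes steps 101 to 105, which are described below.
Step 101: obtain a sample set formed by advertisement samples carrying advertisements.
In one implementation, the advertisement samples in the sample set may be collected periodically from the developer side and the user side. For example, an advertisement sample may be obtained by a developer judging whether an unknown sample carries advertisements and of which types, or from feedback that terminal users submit about a sample (such as whether the sample carries advertisements and the types of those advertisements).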
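A specific example follows. Suppose the sample set includes advertisement samples 1, 2, and 3, which developers have determined through manual analysis to be applications carrying advertisements. Advertisement sample 1 carries two different types of advertisements, advertisement 1 and advertisement 2; advertisement sample 2 carries one type, advertisement 1; and advertisement sample 3 carries one type, advertisement 3. Advertisement sample 1 is recorded as <advertisement sample 1, advertisement 1, advertisement 2>, advertisement sample 2 as <advertisement sample 2, advertisement 1>, and advertisement sample 3 as <advertisement sample 3, advertisement 3>, as shown in Table 1 below:
Advertisement sample 1 | <advertisement sample 1, advertisement 1, advertisement 2>
Advertisement sample 2 | <advertisement sample 2, advertisement 1>
Advertisement sample 3 | <advertisement sample 3, advertisement 3>
Table 1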
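Step 102: extract the features of each advertisement sample in the sample set.
In one implementation, the code of each advertisement sample in the sample set is parsed to obtain, for each function in the code, features in the dimension of the binary code sequence. Of course, other approaches may also be used to obtain the features of an advertisement sample, for example statically analyzing the code of the advertisement sample to extract function flows as its features; the embodiments of the present invention do not exclude extracting features from advertisement samples in other ways.
Taking as another example an advertisement sample that is an application carrying advertisements on the Android platform: the application's installation package in Android Package (APK) format is unpacked to obtain a Dex (Dalvik Executable) file, that is, the executable program. The Dex file is parsed at the function level, for example by parsing all functions in the Dex file and extracting features of each function's bytecode (opcodes). An opcode is the part of a computer instruction (binary code) that specifies the operation to be performed. The operations indicated by advertisement bytecode differ markedly from those indicated by the application's own bytecode, so features at the bytecode level can effectively distinguish advertisements from the application.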
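A minimal Python sketch of turning per-function opcode sequences into features. It assumes the Dex file has already been parsed into a hypothetical mapping from method signature to opcode mnemonics (unpacking the APK and parsing the Dex file are not shown), and hashing each opcode sequence into a short ID is an illustrative choice, not necessarily how the back-end engine described here derives its features:

```python
import hashlib
from typing import Dict, List, Set


def function_features(functions: Dict[str, List[str]]) -> Set[str]:
    """Map each function's opcode sequence to a short feature ID.

    `functions` is a hypothetical, already-parsed view of a Dex file:
    method signature -> list of opcode mnemonics.
    """
    features: Set[str] = set()
    for opcodes in functions.values():
        if not opcodes:
            continue  # abstract/native methods carry no bytecode
        # Only the opcode sequence is hashed, so renaming a method or class
        # does not change the resulting feature.
        digest = hashlib.sha1(" ".join(opcodes).encode("utf-8")).hexdigest()
        features.add(digest[:16])
    return features


# Hypothetical parsed Dex content of one advertisement sample.
sample = {
    "Lcom/example/AdView;->show()V": ["invoke-virtual", "move-result-object", "return-void"],
    "Lcom/example/Main;->onCreate(Landroid/os/Bundle;)V": ["invoke-super", "return-void"],
    "Lcom/example/Base;->run()V": [],
}
print(len(function_features(sample)))  # 2
```

Keying features on opcodes alone, rather than on names, follows the text's point that bytecode-level features separate advertisement code from application code.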
The advertisement samples shown in Table 1 are described further with the specific example in Table 2.
Figure PCTCN2017082069-appb-000001
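Table 2
As shown in Table 2, suppose feature 1 is extracted from advertisement sample 1. Then feature 1 falls into one of the following cases: 1) feature 1 is a feature of one of advertisement 1 and advertisement 2 (that is, an advertisement feature), or a feature common to advertisement 1 and advertisement 2; 2) feature 1 is a feature of the application in advertisement sample 1 (that is, feature 1 is not an advertisement feature).
Suppose further that feature 1 is also extracted from advertisement sample 2. Then feature 1 falls into one of the following cases: 1) feature 1 is a feature of advertisement 1 (that is, feature 1 is an advertisement feature); 2) feature 1 is a feature of the application in advertisement sample 2 (in which case feature 1 is not an advertisement feature).
Suppose also that feature 2 is extracted from advertisement sample 3. Then feature 2 falls into one of the following cases: 1) feature 2 is a feature of advertisement 3 (that is, feature 2 is an advertisement feature); 2) feature 2 is a feature of the application in advertisement sample 3 (in which case feature 2 is not an advertisement feature).
The following steps describe how the features of advertisements 1 to 3, that is, the advertisement features, are screened out from feature 1 and feature 2.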
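Step 103: determine the feature values of each feature for the different types of advertisements, these being the advertisements carried in the advertisement samples that the feature matches.
The advertisement samples that a feature matches are the advertisement samples that contain the feature; they can be determined by comparing the feature with the features of the advertisement samples in the sample set. A feature's feature values correspond one-to-one to the different types of advertisements: if the advertisement samples that a feature matches carry several types of advertisements, the feature has a separate feature value for each type.
For example, feature 1 above matches advertisement samples 1 and 2; advertisement sample 1 carries two different types of advertisements, advertisement 1 and advertisement 2, and advertisement sample 2 carries advertisement 1. Feature 1 therefore has a feature value for advertisement 1 and a feature value for advertisement 2.
For the advertisement samples that a feature matches, there are two cases: 1) the feature is a feature of the application in the matched advertisement samples; 2) the feature is a feature of one advertisement in the matched advertisement samples, or a feature shared by several advertisements.
Again taking feature 1 as an example, for advertisement sample 1, which feature 1 matches, feature 1 may be an advertisement feature of at least one of advertisement 1 and advertisement 2, or a feature of the application in advertisement sample 1.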
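The different cases are described below.
1) The advertisement samples that the feature matches carry only one type of advertisement, and the feature satisfies exactly one of the following cases:
a) the feature is a feature of the single type of advertisement carried in the advertisement samples, that is, the feature matches that type of advertisement;
b) the feature is a feature of the application itself in the advertisement samples, that is, the feature cannot match any advertisement carried in the advertisement samples.
2) The advertisement samples that the feature matches include two or more types of advertisements, and the feature satisfies one of the following cases:
a) the feature is a feature of some, but not all, of the advertisements carried in the advertisement samples, that is, the feature can match some of the advertisements in the advertisement samples (the advertisement samples that a feature matches being those that contain the feature);
b) the feature is a feature shared by all the advertisements carried in the advertisement samples, that is, the feature can match every advertisement in the advertisement samples that contain it;
c) the feature is a feature of the application in the advertisement samples, that is, the feature cannot match any advertisement in the advertisement samples that contain it.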
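In the embodiments of the present invention, a feature value is established for each feature with respect to each type of advertisement; the feature value represents the probability that the feature belongs to the corresponding type of advertisement carried in the advertisement samples that the feature matches.
The calculation of feature values is described below, continuing with the preceding example.
As described above, feature 1 matches advertisement samples 1 and 2, so the advertisement samples that feature 1 matches carry two different types of advertisements, advertisement 1 and advertisement 2. The feature value of feature 1 for advertisement 1 represents the probability that feature 1 is an advertisement feature of advertisement 1, and the feature value of feature 1 for advertisement 2 represents the probability that feature 1 is an advertisement feature of advertisement 2, as shown in Table 3 below.
Feature value of feature 1 for advertisement 1 | Probability that feature 1 is an advertisement feature of advertisement 1
Feature value of feature 1 for advertisement 2 | Probability that feature 1 is an advertisement feature of advertisement 2
Table 3
The advertisement samples that feature 2 matches carry advertisement 3; the feature value of feature 2 for advertisement 3 represents the probability that feature 2 is an advertisement feature of advertisement 3, as shown in Table 4 below.
Feature value of feature 2 for advertisement 3 | Probability that feature 2 is an advertisement feature of advertisement 3
Table 4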
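Feature values can be determined in several different ways, which are described separately below.
Method 1) for determining feature values
In one implementation of determining feature values, the feature values for the different types of advertisements are determined for each extracted feature. Referring to FIG. 2-1, which shows an optional schematic flowchart for determining feature values, the process includes steps 1031a to 1033a, described below.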
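Step 1031a: determine the different types of advertisements carried in the advertisement samples that the feature matches.
A feature extracted from one advertisement sample may also match other advertisement samples in the sample set. In the preceding example, feature 1, extracted from advertisement sample 1, also matches advertisement sample 2, so the advertisement samples that feature 1 matches are advertisement samples 1 and 2; accordingly, it is determined that these samples carry two different types of advertisements, advertisement 1 and advertisement 2.
Continuing the example, feature 2, extracted from advertisement sample 3, matches only advertisement sample 3 in the sample set, so the only advertisement sample that feature 2 matches is advertisement sample 3, which carries only one type of advertisement, advertisement 3. The types of advertisements carried in the advertisement samples matched by feature 1 and feature 2 are shown in Table 5.
Feature 1 | Advertisement 1, advertisement 2
Feature 2 | Advertisement 3
Table 5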
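Step 1031a amounts to a set union over the samples a feature matches. A minimal Python sketch, assuming the sample set is held as two plain dictionaries (the names `SAMPLE_ADS` and `SAMPLE_FEATURES` are hypothetical, and `f1`/`f2` stand for feature 1 and feature 2 of the running example):

```python
from typing import Dict, Set

# Running example: advertisement types carried by each advertisement sample
# (Table 1) and the features extracted from it. "f1"/"f2" are hypothetical labels.
SAMPLE_ADS: Dict[str, Set[str]] = {
    "sample1": {"ad1", "ad2"},
    "sample2": {"ad1"},
    "sample3": {"ad3"},
}
SAMPLE_FEATURES: Dict[str, Set[str]] = {
    "sample1": {"f1"},
    "sample2": {"f1"},
    "sample3": {"f2"},
}


def matched_samples(feature: str, sample_features: Dict[str, Set[str]]) -> Set[str]:
    """Advertisement samples that contain the feature, i.e. that it matches."""
    return {name for name, feats in sample_features.items() if feature in feats}


def ad_types_for_feature(feature: str,
                         sample_features: Dict[str, Set[str]],
                         sample_ads: Dict[str, Set[str]]) -> Set[str]:
    """Step 1031a: union of the advertisement types carried by the matched samples."""
    types: Set[str] = set()
    for name in matched_samples(feature, sample_features):
        types |= sample_ads[name]
    return types


for f in ("f1", "f2"):
    print(f, sorted(ad_types_for_feature(f, SAMPLE_FEATURES, SAMPLE_ADS)))
# f1 ['ad1', 'ad2']
# f2 ['ad3']
```

The printed result reproduces Table 5.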
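Step 1032a: for each type of advertisement carried in the advertisement samples that the feature matches, determine the number of matched advertisement samples that carry that type.
If the advertisement samples that a feature matches carry several types of advertisements, then for each type, the number of advertisement samples that the feature matches and that include that type is determined.
For example, the advertisement samples matched by feature 1 carry two types of advertisements, advertisement 1 and advertisement 2, so the number of matched advertisement samples carrying advertisement 1 (2, namely advertisement samples 1 and 2) and the number of matched advertisement samples carrying advertisement 2 (1, namely advertisement sample 1) are determined separately.
If the advertisement samples that a feature matches include only one type of advertisement, the number of matched advertisement samples including that type is determined.
For example, the advertisement samples matched by feature 2 carry advertisement 3, so the number of matched advertisement samples carrying advertisement 3 (1) is determined.
Step 1033a: determine the number of matched advertisement samples carrying a given type of advertisement as the feature value of the feature for that type.
For a feature, its feature value for a type of advertisement is positively correlated with the probability that the feature is an advertisement feature of that type, and thus characterizes the probability that the feature matches that type of advertisement. The reason is that, for a given type of advertisement, the more advertisement samples carrying that type a feature matches, the more likely the feature is a feature of that type of advertisement. Therefore, in one implementation, the number of matched advertisement samples carrying the corresponding type of advertisement is used as the feature value of the feature for that type.
For example, referring to FIG. 3-1, feature 1 matches 2 advertisement samples carrying advertisement 1 and 1 advertisement sample carrying advertisement 2, so the feature value of feature 1 for advertisement 1 is 2 and for advertisement 2 is 1; feature 1 matches no advertisement sample carrying advertisement 3.
Similarly, referring to FIG. 3-2, feature 2 matches 1 advertisement sample carrying advertisement 3 and no advertisement samples carrying advertisement 1 or advertisement 2. The feature values of features 1 and 2 for each type of advertisement are shown in Table 6 below: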
Feature | Advertisement 1 | Advertisement 2 | Advertisement 3
Feature 1 | 2 | 1 | \
Feature 2 | \ | \ | 1
Table 6
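Method 1) is a per-type counter over the matched samples. A minimal Python sketch under the same hypothetical dictionary layout as the running example (the names are illustrative, not from the original):

```python
from collections import Counter
from typing import Dict, Set

# Running example (Table 1 / Table 2); "f1"/"f2" are hypothetical labels.
SAMPLE_ADS: Dict[str, Set[str]] = {"sample1": {"ad1", "ad2"}, "sample2": {"ad1"}, "sample3": {"ad3"}}
SAMPLE_FEATURES: Dict[str, Set[str]] = {"sample1": {"f1"}, "sample2": {"f1"}, "sample3": {"f2"}}


def count_feature_values(feature: str,
                         sample_features: Dict[str, Set[str]],
                         sample_ads: Dict[str, Set[str]]) -> Dict[str, int]:
    """Method 1): for each advertisement type, count the matched samples carrying it."""
    counts: Counter = Counter()
    for name, feats in sample_features.items():
        if feature in feats:
            counts.update(sample_ads[name])  # +1 for every ad type in this sample
    return dict(counts)


print(sorted(count_feature_values("f1", SAMPLE_FEATURES, SAMPLE_ADS).items()))  # [('ad1', 2), ('ad2', 1)]
print(sorted(count_feature_values("f2", SAMPLE_FEATURES, SAMPLE_ADS).items()))  # [('ad3', 1)]
```

Types a feature never co-occurs with simply do not appear, which corresponds to the "\" cells of Table 6.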
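Method 2)
In one implementation of determining feature values, the sample set used contains not only advertisement samples but also non-advertisement samples. A non-advertisement sample is an application that carries no advertisements, such as the various applications that can be installed on a terminal device (including social, game, and communication applications). Including non-advertisement samples in the sample set makes a feature's value for a type of advertisement express more precisely the probability that the feature belongs to that type. An example sample set is shown in Table 7:
Advertisement sample 1 | Advertisement sample 1 (advertisement 1, advertisement 2)
Advertisement sample 2 | Advertisement sample 2 (advertisement 1)
Advertisement sample 3 | Advertisement sample 3 (advertisement 3)
Non-advertisement sample 4 | \
Non-advertisement sample 5 | \
Table 7
In addition to advertisement samples 1 to 3, the sample set shown in Table 7 includes non-advertisement samples 4 and 5, which carry no advertisements.
Referring to FIG. 2-2, which shows an optional schematic flowchart for determining feature values, the process includes steps 1031b to 1033b, described below.
Step 1031b: determine the different types of advertisements carried in the advertisement samples that the feature matches.
A feature extracted from one advertisement sample may also match other advertisement samples in the sample set, so the types must be determined statistically from which features are extracted from which advertisement samples and which types of advertisements each advertisement sample includes.
In Table 7, feature 1, extracted from advertisement sample 1, also matches advertisement sample 2, so the advertisement samples that feature 1 matches are advertisement samples 1 and 2; accordingly, it is determined that these samples carry two different types of advertisements, advertisement 1 and advertisement 2.
Likewise, feature 2, extracted from advertisement sample 3, matches only advertisement sample 3 in the sample set, so the only advertisement sample that feature 2 matches is advertisement sample 3, which carries only one type of advertisement, advertisement 3. The types of advertisements included in the advertisement samples matched by features 1 and 2 are as shown in Table 5.
Step 1032b: in the sample set, determine the number of matched advertisement samples carrying each type of advertisement and the number of non-advertisement samples that the feature matches.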
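The numbers of advertisement samples carrying each type of advertisement that features 1 and 2 match are shown in Table 6. In addition, suppose feature 1 is extracted from non-advertisement sample 4 and feature 2 is extracted from non-advertisement sample 5. The samples (advertisement samples and non-advertisement samples) matched by features 1 and 2, and the advertisements they carry, are shown in Table 8 below.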
Figure PCTCN2017082069-appb-000002
Figure PCTCN2017082069-appb-000003
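Table 8
Step 1033b: determine the feature value of the feature for each type of advertisement based on the number of matched advertisement samples carrying that type and the number of matched non-advertisement samples.
For a feature, its feature value for any type of advertisement represents the probability that the feature is an advertisement feature of that type. The reason is that, for a given type of advertisement, the more advertisement samples carrying that type a feature matches, and the fewer non-advertisement samples containing the feature it matches, the more likely the feature is an advertisement feature of that type.
Based on the above analysis, in one implementation, the ratio of the number of matched advertisement samples carrying a given type of advertisement to the number of matched non-advertisement samples is used as the feature value of the feature for that type. It can be understood that when the ratio is greater than 1, it may be normalized.
By introducing non-advertisement samples into the sample set and using the number of matched non-advertisement samples containing the feature as an adjustment factor for the number of matched advertisement samples containing the feature, the feature value of the feature for the corresponding type of advertisement is obtained. Because this accounts for the possibility that the feature is a feature of the application itself, the resulting feature value represents more precisely the probability that the feature is an advertisement feature of each type.
For example, referring to FIG. 4-1, feature 1 matches 2 advertisement samples carrying advertisement 1, matches no advertisement sample carrying advertisement 3 (advertisement sample 3), and matches 1 non-advertisement sample containing feature 1 (non-advertisement sample 4), so the feature value of feature 1 for advertisement 1 is the ratio of the two (2/1 = 2).
As another example, feature 1 matches 1 advertisement sample carrying advertisement 2 (advertisement sample 1), matches no advertisement sample carrying advertisement 3 (advertisement sample 3), and matches 1 non-advertisement sample containing feature 1, so the feature value of feature 1 for advertisement 2 is the ratio of the two (1/1 = 1).
Similarly, referring to FIG. 4-2, feature 2 matches 1 advertisement sample carrying advertisement 3, matches no advertisement samples carrying advertisement 1 or advertisement 2, and matches 1 non-advertisement sample containing feature 2 (non-advertisement sample 5), so the feature value of feature 2 for advertisement 3 is the ratio of the two (1/1 = 1).
The feature values of features 1 and 2 for the corresponding types of advertisements are shown in Table 9 below:
Feature | Advertisement 1 | Advertisement 2 | Advertisement 3
Feature 1 | 2 | 1 | \
Feature 2 | \ | \ | 1
Table 9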
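Method 2) divides each per-type count by the number of matched non-advertisement samples. A minimal Python sketch, assuming the non-advertisement matches used in the worked figures (feature 1 in non-advertisement sample 4, feature 2 in non-advertisement sample 5) and hypothetical dictionary names. The text does not say what happens when a feature matches no non-advertisement sample; flooring the divisor at 1 is an assumption made here to avoid division by zero:

```python
from collections import Counter
from typing import Dict, Set

# Running example; "f1"/"f2" are hypothetical labels.
SAMPLE_ADS: Dict[str, Set[str]] = {"sample1": {"ad1", "ad2"}, "sample2": {"ad1"}, "sample3": {"ad3"}}
SAMPLE_FEATURES: Dict[str, Set[str]] = {"sample1": {"f1"}, "sample2": {"f1"}, "sample3": {"f2"}}
NON_AD_FEATURES: Dict[str, Set[str]] = {"sample4": {"f1"}, "sample5": {"f2"}}


def ratio_feature_values(feature: str,
                         sample_features: Dict[str, Set[str]],
                         sample_ads: Dict[str, Set[str]],
                         non_ad_features: Dict[str, Set[str]]) -> Dict[str, float]:
    """Method 2): (matched ad samples carrying the type) / (matched non-ad samples)."""
    ad_counts: Counter = Counter()
    for name, feats in sample_features.items():
        if feature in feats:
            ad_counts.update(sample_ads[name])
    non_ad = sum(1 for feats in non_ad_features.values() if feature in feats)
    # Assumption: divisor floored at 1 when the feature matches no non-ad sample.
    divisor = max(non_ad, 1)
    return {ad: n / divisor for ad, n in ad_counts.items()}


print(sorted(ratio_feature_values("f1", SAMPLE_FEATURES, SAMPLE_ADS, NON_AD_FEATURES).items()))
# [('ad1', 2.0), ('ad2', 1.0)]
```

As the text notes, values above 1 (such as 2.0 here) may be normalized afterwards; that step is not shown.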
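It can be seen that dividing the number of matched advertisement samples containing the feature by the number of matched non-advertisement samples containing the feature, and using the result as the feature value for the corresponding type of advertisement, has the following effect: for the same number of matched advertisement samples, a feature that matches more non-advertisement samples has a smaller feature value than one that matches fewer. Computing the feature value from matches in both dimensions, advertisement samples and non-advertisement samples, lets the feature value characterize more precisely the probability that the feature is an advertisement feature.
As another example of step 1033b, the ratio of the number of matched advertisement samples carrying a given type of advertisement to a sum is calculated and determined as the feature value of the feature for that type.
A feature has a feature value for each type of advertisement, that is, the feature's feature values correspond one-to-one to advertisement types. The feature value for one type of advertisement is obtained as the ratio of the number of matched samples carrying that type of advertisement to a sum of two numbers: the number of matched advertisement samples carrying that type of advertisement, that is, the number of advertisement samples in the sample set that contain the feature and carry that type; and the number of matched non-advertisement samples, which carry no advertisements.
It can be understood that because this feature value is the ratio of the number of matched advertisement samples carrying one type of advertisement to the number of all matched samples (advertisement samples and non-advertisement samples), it can also be viewed as the feature's hit rate on advertisements containing the feature. The feature value may, of course, also be normalized.
Taking the feature value of feature 1 for advertisement 1 as an example: first count the advertisement samples in the sample set that contain feature 1 and carry advertisement 1, and the non-advertisement samples in the sample set that contain feature 1; compute the sum of these two numbers; then compute the ratio of the number of advertisement samples containing feature 1 and carrying advertisement 1 to the sum.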
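For example, referring to FIG. 5-1, for the feature value of feature 1 for advertisement 1: feature 1 matches 2 advertisement samples carrying advertisement 1, matches no advertisement sample carrying advertisement 3 (advertisement sample 3), and matches 1 non-advertisement sample containing feature 1 (non-advertisement sample 4). The feature value of feature 1 for advertisement 1, that is, its hit rate, is denoted P(advertisement 1), and P(advertisement 1) = 2/(1+2) ≈ 67%.
As another example, for the feature value of feature 1 for advertisement 2: feature 1 matches 1 advertisement sample carrying advertisement 2 (advertisement sample 1), matches no advertisement sample carrying advertisement 3 (advertisement sample 3), and matches 1 non-advertisement sample containing feature 1 (non-advertisement sample 4), so the feature value of feature 1 for advertisement 2, that is, the hit rate P(advertisement 2), is 1/(1+1) = 50%.
Similarly, referring to FIG. 5-2, for the feature value of feature 2 for advertisement 3: feature 2 matches 1 advertisement sample carrying advertisement 3 (advertisement sample 3), matches no advertisement samples carrying advertisement 1 or advertisement 2, and matches 1 non-advertisement sample containing feature 2 (non-advertisement sample 5), so the feature value of feature 2 for advertisement 3, P(advertisement 3), is the ratio 1/(1+1) = 50%.
The feature values, that is, the hit rates, of features 1 and 2 for the corresponding types of advertisements are shown in Table 10 below:
Feature | Advertisement 1 | Advertisement 2 | Advertisement 3
Feature 1 | 67% | 50% | \
Feature 2 | \ | \ | 50%
Table 10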
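The hit-rate variant of step 1033b can be sketched in Python as follows, again assuming the hypothetical dictionary layout and the non-advertisement matches of the worked example. Because every advertisement type that appears in the counter has been matched at least once, n + m is never zero:

```python
from collections import Counter
from typing import Dict, Set

# Running example; "f1"/"f2" are hypothetical labels.
SAMPLE_ADS: Dict[str, Set[str]] = {"sample1": {"ad1", "ad2"}, "sample2": {"ad1"}, "sample3": {"ad3"}}
SAMPLE_FEATURES: Dict[str, Set[str]] = {"sample1": {"f1"}, "sample2": {"f1"}, "sample3": {"f2"}}
NON_AD_FEATURES: Dict[str, Set[str]] = {"sample4": {"f1"}, "sample5": {"f2"}}


def hit_rate_values(feature: str,
                    sample_features: Dict[str, Set[str]],
                    sample_ads: Dict[str, Set[str]],
                    non_ad_features: Dict[str, Set[str]]) -> Dict[str, float]:
    """Hit rate n / (n + m): n = matched ad samples carrying the type, m = matched non-ad samples."""
    ad_counts: Counter = Counter()
    for name, feats in sample_features.items():
        if feature in feats:
            ad_counts.update(sample_ads[name])
    m = sum(1 for feats in non_ad_features.values() if feature in feats)
    return {ad: n / (n + m) for ad, n in ad_counts.items()}


for f in ("f1", "f2"):
    rates = hit_rate_values(f, SAMPLE_FEATURES, SAMPLE_ADS, NON_AD_FEATURES)
    print(f, sorted((ad, round(100 * v)) for ad, v in rates.items()))
# f1 [('ad1', 67), ('ad2', 50)]
# f2 [('ad3', 50)]
```

The rounded percentages reproduce Table 10, and every value already lies between 0 and 1.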
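From the above it is easy to see that, for two features, say feature a and feature b, if they match the same number of advertisement samples but feature a matches more non-advertisement samples than feature b, then the probability that feature a is an advertisement feature is smaller than the probability that feature b is.
In the above way of calculating feature values, the number of matched advertisement samples containing the feature is divided by the sum (that is, the number of matched non-advertisement samples containing the feature plus the number of matched advertisement samples containing the feature). This takes into account both how many advertisement samples and how many non-advertisement samples the feature matches, and can therefore clearly distinguish the feature values of feature a and feature b. In other words, computing a feature's value from two dimensions, the number of matched advertisement samples containing the feature and the number of all matched samples (advertisement samples and non-advertisement samples) containing it, makes the feature value more precise.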
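The comparison of feature a and feature b can be written out explicitly. The symbols below are notation introduced here for illustration and do not appear in the original: for a feature x and an advertisement type t, n_{x,t} is the number of matched advertisement samples carrying t, and m_x is the number of matched non-advertisement samples.

```latex
P_x(t) = \frac{n_{x,t}}{n_{x,t} + m_x}, \qquad n_{x,t} \ge 1,\; m_x \ge 0

\text{If } n_{a,t} = n_{b,t} = n \text{ and } m_a > m_b, \text{ then }
n + m_a > n + m_b \;\Rightarrow\; P_a(t) = \frac{n}{n + m_a} < \frac{n}{n + m_b} = P_b(t).

\text{Running example: } P_1(\text{ad }1) = \tfrac{2}{2+1} \approx 0.67,\quad
P_1(\text{ad }2) = \tfrac{1}{1+1} = 0.5,\quad
P_2(\text{ad }3) = \tfrac{1}{1+1} = 0.5.
```

The same ordering holds for the plain ratio n/m of method 2) whenever m_a > m_b > 0.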
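Step 104: screen out the advertisement features of the different types of advertisements from the extracted features based on the features' values for the different types of advertisements.
Steps 101 to 103 yield a set of features (called the feature set) and the feature values of each feature for the different types of advertisements. As shown in FIG. 6, advertisement features for the different types of advertisements are screened out using different screening strategies to form an advertisement feature library. The screening strategies shown in FIG. 6 are described below.
Screening strategy 1): one-to-one correspondence between advertisements and features
For each type of advertisement, a feature that matches only advertisement samples carrying that type of advertisement is determined to be an advertisement feature of that type; when several features match only advertisement samples including that type, the feature with the largest feature value is selected as the advertisement feature of that type.
As described above, the advertisement samples that a feature matches may include several types of advertisements, in which case the feature has a feature value for each of those types; for example, feature 1 above has feature values for both advertisement 1 and advertisement 2. The advertisement samples that a feature matches may also include only one type of advertisement, in which case the feature has a feature value for that type only; for example, feature 2 above corresponds only to advertisement 3 and has a feature value for advertisement 3.
In strategy 1), for a type of advertisement, if the advertisement samples carrying that advertisement include several features, then among those features, a feature whose matched advertisement samples include only that type of advertisement is preferentially selected as the advertisement feature of that type; that is, a feature in one-to-one correspondence with the advertisement is preferentially selected as its advertisement feature.
For example, for advertisement 3 above: feature 2 matches only advertisement sample 3, which carries only advertisement 3, so feature 2 is in one-to-one correspondence with advertisement 3 and is preferentially selected as the advertisement feature of advertisement 3. Feature 1, by contrast, matches advertisement samples 1 and 2, which carry advertisement 1 and advertisement 2, so it is not in one-to-one correspondence with either type.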
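Screening strategy 1) can be sketched in Python as follows. The sketch assumes the advertisement types per feature (step 1031a) and per-type feature values (for example hit rates) have already been computed; the names are hypothetical, and `f3` is the hypothetical 30%-hit-rate feature used in a later example of this section:

```python
from typing import Dict, Set, Tuple


def one_to_one_ad_features(feature_ad_types: Dict[str, Set[str]],
                           feature_values: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Strategy 1): keep features whose matched samples carry exactly one ad type;
    among several such features for the same type, keep the largest value."""
    best: Dict[str, Tuple[str, float]] = {}
    for feature, types in feature_ad_types.items():
        if len(types) != 1:
            continue  # matched samples carry several ad types: not one-to-one
        (ad,) = types
        value = feature_values[feature][ad]
        if ad not in best or value > best[ad][1]:
            best[ad] = (feature, value)
    return {ad: feature for ad, (feature, _) in best.items()}


feature_ad_types = {"f1": {"ad1", "ad2"}, "f2": {"ad3"}, "f3": {"ad3"}}
feature_values = {"f1": {"ad1": 2 / 3, "ad2": 0.5}, "f2": {"ad3": 0.5}, "f3": {"ad3": 0.3}}
print(one_to_one_ad_features(feature_ad_types, feature_values))  # {'ad3': 'f2'}
```

Feature 1 is skipped because it co-occurs with two advertisement types; handling such features is the job of the other strategies.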
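Screening strategy 2): largest feature value
For each feature, when the feature matches advertisement samples including several types of advertisements, the feature is determined, based on its feature values for each of those types, to be the advertisement feature of the type for which its feature value is largest.
As described above, for each type of advertisement included in the advertisement samples of the sample set, at least one feature matches the advertisement samples that include that type; when a feature matches only such samples, the feature and the advertisement are in a one-to-one relationship. Strategy 2) handles the case where a feature and advertisements are in a one-to-many relationship, that is, where one feature matches advertisement samples that include several types of advertisements.
If a feature matches advertisement samples carrying several types of advertisements, its feature values for each of those types (that is, the different types of advertisements included in the advertisement samples it matches) are compared. The larger the feature's value for a type of advertisement, the more likely the feature is an advertisement feature of that type; therefore, the type for which the feature has the largest feature value is found by comparison, and the feature is taken as the advertisement feature of that type.
For example, feature 1 above matches advertisement samples carrying advertisement 1 and advertisement 2, so its feature value for advertisement 1 (67%) is compared with its feature value for advertisement 2 (50%). The largest feature value (67%) corresponds to advertisement 1, so feature 1 is taken as the advertisement feature of advertisement 1.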
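Strategy 2) is an argmax over each feature's per-type values. A minimal Python sketch with hypothetical names, using the hit rates of the running example; how ties are broken is not specified in the text, and here `max` simply keeps the first type encountered:

```python
from typing import Dict


def assign_by_max_value(feature_values: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Strategy 2): map each feature to the ad type for which its value is largest."""
    return {
        feature: max(values, key=values.get)  # ties: first type in iteration order
        for feature, values in feature_values.items()
        if values
    }


values = {"f1": {"ad1": 2 / 3, "ad2": 0.5}, "f2": {"ad3": 0.5}}
print(assign_by_max_value(values))  # {'f1': 'ad1', 'f2': 'ad3'}
```

For a feature with a single advertisement type, the argmax trivially returns that type.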
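Screening strategy 3): high-probability features
Among a feature's values for the different types of advertisements, a feature whose value exceeds a feature value threshold is selected as an advertisement feature of the corresponding type; that is, features with a high probability of matching a type of advertisement are selected as advertisement features of that type.
Because a feature's value for any type of advertisement is positively correlated with the probability that the feature matches that type, that is, with its hit rate, when several features match advertisement samples carrying a given type of advertisement, each feature's value for that type is checked against the feature value threshold, and each feature whose value exceeds the threshold is taken as an advertisement feature of that type. Accordingly, one or more advertisement features can be determined for each type of advertisement.
For example, feature 1 above matches advertisement samples carrying advertisement 1 and advertisement 2; its feature value for advertisement 1 is 67% and for advertisement 2 is 50%. If the feature value threshold is 60%, feature 1 is an advertisement feature of advertisement 1.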
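Strategy 3) reduces to a threshold filter. A minimal Python sketch with hypothetical names; "exceeds" is read here as strictly greater than the threshold:

```python
from typing import Dict, List


def select_by_threshold(feature_values: Dict[str, Dict[str, float]],
                        threshold: float) -> Dict[str, List[str]]:
    """Strategy 3): for each ad type, list the features whose value exceeds the threshold."""
    selected: Dict[str, List[str]] = {}
    for feature, values in feature_values.items():
        for ad, value in values.items():
            if value > threshold:  # "exceeds" read as strictly greater
                selected.setdefault(ad, []).append(feature)
    return selected


values = {"f1": {"ad1": 2 / 3, "ad2": 0.5}, "f2": {"ad3": 0.5}}
print(select_by_threshold(values, 0.6))  # {'ad1': ['f1']}
```

With the 60% threshold of the example, only feature 1 survives, and only for advertisement 1.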
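Screening strategy 4): whether the feature belongs to third-party plug-in code
Features corresponding to third-party plug-in code are filtered out of the extracted features, the third-party plug-in code including payment plug-ins and account login software development kits (SDKs). Filtering out the features of third-party plug-in code effectively prevents a sample from being falsely detected as carrying advertisements because of functions that the application needs, such as payment and login.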
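A minimal Python sketch of strategy 4). It assumes the extractor records, for each feature, the class it was extracted from, and it uses hypothetical package prefixes standing in for payment and login SDKs; a real deployment would maintain its own list, and neither the prefixes nor this mapping come from the original:

```python
from typing import Dict, Iterable, Set

# Hypothetical package prefixes of third-party payment and account-login SDKs.
THIRD_PARTY_PREFIXES = ("Lcom/example/paysdk/", "Lcom/example/loginsdk/")


def drop_third_party(feature_origin: Dict[str, str],
                     prefixes: Iterable[str] = THIRD_PARTY_PREFIXES) -> Set[str]:
    """Keep only features whose originating class is not third-party plug-in code.

    `feature_origin` maps feature ID -> class descriptor it was extracted from
    (an assumption: the extractor must record this for the filter to work).
    """
    prefix_tuple = tuple(prefixes)
    return {f for f, cls in feature_origin.items() if not cls.startswith(prefix_tuple)}


origins = {
    "f1": "Lcom/example/adlib/Banner;",
    "f2": "Lcom/example/paysdk/PayTask;",
    "f3": "Lcom/example/loginsdk/Auth;",
}
print(sorted(drop_third_party(origins)))  # ['f1']
```

Running this filter before feature values are computed keeps SDK code, which many legitimate applications share, out of the advertisement feature library.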
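The above screening strategies can be chosen flexibly in practice: a single strategy may be used, or several may be combined when they do not conflict. Taking the combination of strategy 1) and strategy 3) as an example, when several features match advertisement samples including only one type of advertisement, that is, when several features are all in one-to-one correspondence with the same type of advertisement, the feature with the largest feature value for that type is selected from among them as the advertisement feature of that type.
For example, for advertisement 3 above, feature 2 is in one-to-one correspondence with advertisement 3, and its hit rate on advertisement samples carrying advertisement 3 is 50%. If there is also a feature 3 in one-to-one correspondence with advertisement 3 whose hit rate on advertisement samples carrying advertisement 3 is 30%, feature 2 is preferentially selected as the advertisement feature of advertisement 3.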
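Step 105: compare the advertisement features of the different types of advertisements with the features extracted from the sample to be detected; if the comparison succeeds, determine that the sample to be detected carries an advertisement, namely an advertisement of the type corresponding to the successfully matched advertisement feature.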
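Step 105 can be sketched as a set intersection between the advertisement feature library and the features of the sample to be detected. The sketch assumes the library is a mapping from advertisement type to its advertisement features (hypothetical names), and it treats a single shared feature as a successful comparison; the text does not specify whether one or several matches are required:

```python
from typing import Dict, Set


def detect(ad_feature_library: Dict[str, Set[str]], sample_features: Set[str]) -> Set[str]:
    """Return the ad types whose ad features appear in the sample (empty set: no ads)."""
    # Assumption: one shared feature counts as a successful comparison.
    return {ad for ad, feats in ad_feature_library.items() if feats & sample_features}


library = {"ad1": {"f1"}, "ad3": {"f2"}}
print(sorted(detect(library, {"f2", "f9"})))  # ['ad3']
print(sorted(detect(library, {"f9"})))        # []
```

Returning the set of matched types covers both outcomes described in step 105: whether the sample carries advertisements, and of which types.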
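In one implementation, after a sample to be detected has been checked for advertisements, the checked sample and the types of advertisements it includes can be added to the sample set as a new advertisement sample, and a checked sample determined to carry no advertisements can be added to the sample set as a new non-advertisement sample. Features are then re-screened based on the new advertisement samples (or on the new advertisement samples and the new non-advertisement samples) to update the advertisement features in the advertisement feature library. In this way, the advertisement features are updated in step with the samples being detected, newly appearing advertisements can be detected accurately without manually judging whether new samples carry advertisements, and the efficiency of advertisement detection is also improved.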
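A minimal Python sketch of this iterative update. It assumes a hypothetical in-memory representation of the sample set and, as one possible choice among the strategies above, rebuilds the library with the hit rate n / (n + m) and a threshold; the function names and the 0.6 threshold are illustrative only:

```python
from collections import Counter
from typing import Dict, Set, Tuple

# name -> (features, ad types) for advertisement samples; name -> features for non-ad samples
AdSamples = Dict[str, Tuple[Set[str], Set[str]]]
NonAdSamples = Dict[str, Set[str]]


def add_checked_sample(ad_samples: AdSamples, non_ad_samples: NonAdSamples,
                       name: str, features: Set[str], detected_ads: Set[str]) -> None:
    """File a checked sample as a new ad sample or a new non-ad sample."""
    if detected_ads:
        ad_samples[name] = (set(features), set(detected_ads))
    else:
        non_ad_samples[name] = set(features)


def rebuild_library(ad_samples: AdSamples, non_ad_samples: NonAdSamples,
                    threshold: float) -> Dict[str, Set[str]]:
    """Re-screen ad features using the hit rate n / (n + m) and a threshold."""
    ad_counts: Dict[str, Counter] = {}
    for features, ads in ad_samples.values():
        for f in features:
            ad_counts.setdefault(f, Counter()).update(ads)
    library: Dict[str, Set[str]] = {}
    for f, counts in ad_counts.items():
        m = sum(1 for feats in non_ad_samples.values() if f in feats)
        for ad, n in counts.items():
            if n / (n + m) > threshold:
                library.setdefault(ad, set()).add(f)
    return library


ad_samples: AdSamples = {
    "sample1": ({"f1"}, {"ad1", "ad2"}),
    "sample2": ({"f1"}, {"ad1"}),
    "sample3": ({"f2"}, {"ad3"}),
}
non_ad_samples: NonAdSamples = {"sample4": {"f1"}, "sample5": {"f2"}}
print(rebuild_library(ad_samples, non_ad_samples, 0.6))  # {'ad1': {'f1'}}

# A newly checked sample found to carry ad 3 raises feature 2's hit rate to 2/3.
add_checked_sample(ad_samples, non_ad_samples, "sample6", {"f2"}, {"ad3"})
print(rebuild_library(ad_samples, non_ad_samples, 0.6))  # {'ad1': {'f1'}, 'ad3': {'f2'}}
```

The second rebuild shows the effect described above: feeding checked samples back into the sample set can promote a feature into the library without any manual extraction.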
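The advertisement detection apparatus that implements the above advertisement detection method is described next. The optional structure of the advertisement detection apparatus 100 shown in FIG. 7 includes an extraction module 101, a detection module 102, and an advertisement feature library module 103.
The extraction module 101 is configured to obtain a sample set composed of advertisement samples carrying advertisements, extract the features of the advertisement samples in the set (step 301), and report them to the detection module 102.
The detection module 102 determines the different types of advertisements carried in the advertisement samples that a feature matches and determines the feature values of the feature for the different types of advertisements, each feature value representing the probability that the feature is an advertisement feature of the corresponding type; based on the features' values for the different types of advertisements, it screens out the advertisement features of the different types of advertisements from the extracted features and stores them in the advertisement feature library module 103 (step 302).
In one implementation, referring to FIG. 8, the extraction module 101 determines the different types of advertisements included in the advertisement samples that a feature matches; for each type of advertisement, it determines the number of matched advertisement samples carrying that type and determines the feature value of the feature for that type based on the number. The feature value may be determined in any of the following ways:
Method 1): the number of matched advertisement samples carrying the corresponding type of advertisement is determined as the feature value of the feature for that type;
Method 2): the feature value of the feature for the corresponding type of advertisement is determined based on the number of non-advertisement samples that the feature matches in the sample set and the number of matched advertisement samples carrying that type of advertisement;
Method 3): the ratio of the number of matched advertisement samples including the corresponding type of advertisement to a sum is determined as the feature value of the feature for that type.
The sum is the number of non-advertisement samples in the sample set that the feature matches plus the number of matched advertisement samples including the corresponding type of advertisement; therefore, in method 3), the feature value for a type of advertisement can be regarded as the feature's hit rate for that type.
For example, in method 2), the ratio of the number of matched advertisement samples carrying the corresponding type of advertisement to the number of matched non-advertisement samples is calculated to obtain the feature value of the feature for that type; in method 3), the ratio of the number of matched advertisement samples carrying the corresponding type of advertisement to the sum is calculated instead, the sum being the number of matched non-advertisement samples plus the number of matched advertisement samples carrying that type.
When it is necessary to determine whether a sample to be detected carries advertisements, the advertisement features of the different types of advertisements are retrieved from the advertisement feature library module 103 (step 303) and compared with the features extracted from the sample to be detected; if the comparison succeeds, it is determined that the sample includes an advertisement of the type corresponding to the successfully matched advertisement feature, and a detection result for the sample is output (step 304).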
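Referring to FIG. 9, the detection module 102 can use several strategies to screen out the advertisement features of the different types of advertisements from the extracted features. These strategies are described below; they may be combined when they do not conflict.
1) For each type of advertisement, a feature that matches only advertisement samples carrying that type of advertisement is determined to be an advertisement feature of that type, where
when several features match only advertisement samples including that type of advertisement, the feature with the largest feature value is selected as the advertisement feature of that type.
2) For each feature, when the feature matches advertisement samples carrying several types of advertisements, the type for which the feature has the largest feature value is determined, and the feature is determined to be an advertisement feature of that type.
3) Based on the features' values for the different types of advertisements, features whose values exceed a feature value threshold are selected as advertisement features of the corresponding types.
When strategy 3) is combined with strategy 1), even if a feature is in one-to-one correspondence with a type of advertisement, a feature value for that type that does not exceed the threshold indicates a low probability that the feature belongs to that type, so the feature is not meaningful as an advertisement feature.
Each of the above three strategies can be combined with the following strategy 4): filtering out, from the extracted features, the features corresponding to third-party plug-in code, the third-party plug-in code including payment plug-ins and account login SDKs. This prevents an application from being falsely detected as having built-in advertisements because of necessary built-in functions.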
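In one implementation, after the detection module 102 finishes checking a sample to be detected, the extraction module 101 updates the sample set according to the detection result (whether the sample carries advertisements and, if so, of which types): a sample that carries advertisements becomes a new advertisement sample, and a sample that does not becomes a new non-advertisement sample. Features of the advertisement samples are extracted from the updated sample set, and the detection module 102 re-screens the advertisement features of the different types of advertisements from the extracted features and updates the advertisement feature library module 103. Because the advertisement features in the advertisement feature library module 103 are updated iteratively with samples to be detected, as soon as a sample to be detected includes a new advertisement, the advertisement features of the new advertisement can be extracted automatically, enabling accurate detection of samples that include new advertisements.
An optional flow by which the advertisement detection apparatus detects advertisements is shown in FIG. 11. The extraction module 101 obtains the sample set (step 401; the set contains advertisement samples and may also include non-advertisement samples). Referring to FIG. 8, the extraction module 101 extracts features of the advertisement samples in the sample set through a back-end engine (step 402), such as opcode features or function-flow features extracted from the code of the advertisement samples.
Referring to FIG. 9, the detection module 102 screens out, from the extracted features, the features corresponding to the different types of advertisements, that is, the advertisement features (step 403), and stores them in the advertisement feature library module 103.
Referring to FIG. 10, when it is necessary to determine whether a sample to be detected carries advertisements, the detection module 102 extracts features from the sample (step 404) and queries the advertisement feature library module 103 for corresponding advertisement features. If any are found, it determines that the sample includes the corresponding type of advertisement; otherwise it determines that the sample carries no advertisements. It then outputs the detection result for the sample (step 405).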
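In practice, the above modules can be implemented by a server or a server cluster running an executable program for advertisement detection. Taking a server cluster implementation as an example, in the optional topology of the advertisement detection apparatus shown in FIG. 12, the extraction module 101 is implemented as a collection server 200, the detection module 102 as a detection server 300, and the advertisement feature library module 103 as an advertisement feature library 400. The collection server extracts features from the advertisement samples in the sample set and reports the extracted features to the detection server (step 301).
In the application example shown in FIG. 13, where the advertisement detection apparatus is implemented as the advertisement feature library 400, the detection server 300, and the collection server 200, after an application (a sample to be detected) has been checked for advertisements, the detection result is published together with the application on an application platform 500. When a user of a terminal 600 requests to install the software from the application platform 500, the user is reminded that the application has built-in advertisements and is asked whether to continue the installation; if the user chooses not to install it, a version of the application without built-in advertisements can be recommended for convenient installation and use.
An optional structure of the servers shown in FIG. 13, such as the advertisement feature library 400, the detection server 300, and the collection server 200, includes a processor 410, an input/output interface 430 (for example, one or more of a display, a keyboard, a touchscreen, a speaker, and a microphone), a storage medium 440, and a network interface 420 (any form of interface that supports communication over a network protocol, such as an Ethernet interface); the components can be connected and communicate via a system bus 450.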
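The storage medium 440 may be a read-only memory (ROM), a flash memory, a removable device, a magnetic storage medium (for example, magnetic tape or a disk drive), an optical storage medium (for example, an optical disc), other media such as hard disks, punched cards, and paper tape, or another well-known type of program memory. The storage medium stores an executable program which, when executed, causes the processor 410 of the server to perform operations including: extracting the features of each advertisement sample in a sample set; determining the different types of advertisements carried in the advertisement samples that a feature matches, and determining the feature values of the feature for the different types of advertisements, each feature value representing the probability that the feature is an advertisement feature of the corresponding type of advertisement; screening out, from the extracted features and based on the features' values for the different types of advertisements, the advertisement features of the different types of advertisements; and comparing the advertisement features of the different types of advertisements with the features extracted from a sample to be detected, and, when the comparison succeeds, determining that the sample to be detected carries an advertisement, namely an advertisement of the type corresponding to the successfully matched advertisement feature.
In one implementation, when the executable program is executed, the processor 410 of the server is further caused to perform operations including: determining the different types of advertisements carried in the advertisement samples that a feature matches; and, for each type of advertisement, determining the number of matched advertisement samples carrying that type, and determining the feature value of the feature for that type based on the number.
In one implementation, the sample set further includes non-advertisement samples, and when the executable program is executed, the processor 410 of the server is further caused to perform operations including: determining the number of matched advertisement samples carrying the corresponding type of advertisement as the feature value of the feature for that type; or determining the feature value of the feature for that type based on the number of non-advertisement samples that the feature matches in the sample set and the number of matched advertisement samples carrying that type.
In one implementation, when the executable program is executed, the processor 410 of the server is further caused to perform operations including: calculating the ratio of the number of matched advertisement samples carrying the corresponding type of advertisement to the number of matched non-advertisement samples, to obtain the feature value of the feature for that type; or calculating the ratio of the number of matched advertisement samples carrying the corresponding type of advertisement to a sum, to obtain the feature value of the feature for that type, the sum being the number of matched non-advertisement samples plus the number of matched advertisement samples carrying that type.
In one implementation, when the executable program is executed, the processor 410 of the server is further caused to perform operations including: for each type of advertisement, determining a feature that matches only advertisement samples carrying that type of advertisement as an advertisement feature of that type, where, when several features match only advertisement samples including that type, the feature with the largest feature value is selected as the advertisement feature of that type.
In one implementation, when the executable program is executed, the processor 410 of the server is further caused to perform operations including: for each feature, when the feature matches advertisement samples carrying several types of advertisements, determining the type for which the feature has the largest feature value, and determining the feature as an advertisement feature of that type.
In one implementation, when the executable program is executed, the processor 410 of the server is further caused to perform operations including: selecting, based on the features' values for the different types of advertisements, features whose values exceed a feature value threshold as advertisement features of the corresponding types.
In one implementation, when the executable program is executed, the processor 410 of the server is further caused to perform operations including: before determining the features' values for the different types of advertisements, filtering out, from the extracted features, the features corresponding to third-party plug-in code, the third-party plug-in code including payment plug-ins and account login SDKs; and updating the sample set with the sample to be detected and re-determining the advertisement features of the different types of advertisements based on the updated sample set.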
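In summary, the embodiments of the present invention have the following beneficial effects:
Features (including both advertisement features and non-advertisement features) are extracted automatically and then further screened based on their feature values to obtain advertisement features, so the whole process involves no manual extraction of advertisement features. Simply adding advertisement samples containing new advertisements to the sample set is therefore enough to determine the advertisement features of the new advertisements automatically, which achieves the technical effect of efficiently updating advertisement features; and because the advertisement features are updated automatically and quickly, samples to be detected that include new advertisements can be detected accurately.
A person skilled in the art can understand that all or some of the steps of the above method embodiments can be completed by hardware instructed by a program. The program can be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes various media that can store program code, such as a removable storage device, a random access memory (RAM), a ROM, a magnetic disk, or an optical disc.
Alternatively, when the above integrated units of the present invention are implemented in the form of software functional modules and sold or used as independent products, they may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present invention, in essence or in the part contributing to the related art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the methods described in the embodiments of the present invention. The storage medium includes various media that can store program code, such as a removable storage device, a RAM, a ROM, a magnetic disk, or an optical disc.
The above are merely specific implementations of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.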

Claims (18)

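  1. An advertisement detection method, including:
    extracting the features of each advertisement sample in a sample set;
    determining the different types of advertisements carried in the advertisement samples that a feature matches, and determining the feature values of the feature for the different types of advertisements, each feature value representing the probability that the feature is an advertisement feature of the corresponding type of advertisement;
    screening out, from the extracted features and based on the features' values for the different types of advertisements, the advertisement features of the different types of advertisements; and
    comparing the advertisement features of the different types of advertisements with the features extracted from a sample to be detected, and, when the comparison succeeds, determining that the sample to be detected carries an advertisement, namely an advertisement of the type corresponding to the successfully matched advertisement feature.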
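  2. The method according to claim 1, wherein determining the feature values of the feature for the different types of advertisements includes:
    determining the different types of advertisements carried in the advertisement samples that the feature matches; and
    for each type of advertisement, determining the number of matched advertisement samples carrying the corresponding type of advertisement, and determining the feature value of the feature for the corresponding type of advertisement based on the number.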
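  3. The method according to claim 2, wherein the sample set further includes non-advertisement samples that carry no advertisements;
    determining the feature value of the feature for the corresponding type of advertisement based on the number includes:
    determining the number of matched advertisement samples carrying the corresponding type of advertisement as the feature value of the feature for the corresponding type of advertisement;
    or,
    determining the feature value of the feature for the corresponding type of advertisement based on the number of non-advertisement samples that the feature matches in the sample set and the number of matched advertisement samples carrying the corresponding type of advertisement.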
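  4. The method according to claim 3, wherein determining the feature value of the feature for the corresponding type of advertisement based on the number of non-advertisement samples that the feature matches in the sample set and the number of matched advertisement samples carrying the corresponding type of advertisement includes:
    calculating the ratio of the number of matched advertisement samples carrying the corresponding type of advertisement to the number of matched non-advertisement samples, to obtain the feature value of the feature for the corresponding type of advertisement;
    or,
    calculating the ratio of the number of matched advertisement samples carrying the corresponding type of advertisement to a sum, to obtain the feature value of the feature for the corresponding type of advertisement, the sum being the number of matched non-advertisement samples plus the number of matched advertisement samples carrying the corresponding type of advertisement.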
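  5. The method according to claim 1, wherein screening out, from the extracted features and based on the feature values, the advertisement features of the different types of advertisements carried in the advertisement samples includes:
    for each type of advertisement, determining a feature that matches only advertisement samples carrying the corresponding type of advertisement as an advertisement feature of the corresponding type of advertisement, wherein
    when several features match only advertisement samples including the corresponding type of advertisement, the feature with the largest feature value is selected as the advertisement feature of the corresponding type of advertisement.
  6. The method according to claim 1, wherein screening out, from the extracted features and based on the features' values for the different types of advertisements, the advertisement features of the different types of advertisements includes:
    for each feature, when the feature matches advertisement samples carrying several types of advertisements, determining the type of advertisement for which the feature has the largest feature value, and determining the feature as an advertisement feature of that type of advertisement.
  7. The method according to claim 1, wherein screening out, from the extracted features and based on the features' values for the different types of advertisements, the advertisement features of the different types of advertisements includes:
    selecting, based on the features' values for the different types of advertisements, features whose feature values exceed a feature value threshold as advertisement features of the corresponding types of advertisements.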
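  8. The method according to claim 1, further including:
    before determining the features' values for the different types of advertisements, filtering out, from the extracted features, the features corresponding to third-party plug-in code, wherein
    the third-party plug-in code includes payment plug-ins and account login software development kits;
    the method further including:
    updating the sample set with the sample to be detected, and re-determining the advertisement features of the different types of advertisements based on the updated sample set.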
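  9. An advertisement detection apparatus, including:
    an extraction module, configured to extract the features of each advertisement sample in a sample set;
    a detection module, configured to determine the different types of advertisements carried in the advertisement samples that a feature matches, and determine the feature values of the feature for the different types of advertisements, each feature value representing the probability that the feature is an advertisement feature of the corresponding type of advertisement;
    the detection module being further configured to screen out, from the extracted features and based on the features' values for the different types of advertisements, the advertisement features of the different types of advertisements; and
    the detection module being further configured to compare the advertisement features of the different types of advertisements with the features extracted from a sample to be detected, and, when the comparison succeeds, determine that the sample to be detected carries an advertisement, namely an advertisement of the type corresponding to the successfully matched advertisement feature.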
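  10. The advertisement detection apparatus according to claim 9, wherein
    the detection module is further configured to determine the different types of advertisements carried in the advertisement samples that the feature matches; and
    the detection module is further configured to, for each type of advertisement, determine the number of matched advertisement samples carrying the corresponding type of advertisement, and determine the feature value of the feature for the corresponding type of advertisement based on the number.
  11. The advertisement detection apparatus according to claim 10, wherein
    the sample set further includes non-advertisement samples that carry no advertisements;
    the detection module is further configured to determine the number of matched advertisement samples carrying the corresponding type of advertisement as the feature value of the feature for the corresponding type of advertisement;
    or,
    determine the feature value of the feature for the corresponding type of advertisement based on the number of non-advertisement samples that the feature matches in the sample set and the number of matched advertisement samples carrying the corresponding type of advertisement.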
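  12. The advertisement detection apparatus according to claim 11, wherein
    the detection module is further configured to calculate the ratio of the number of matched advertisement samples carrying the corresponding type of advertisement to the number of matched non-advertisement samples, to obtain the feature value of the feature for the corresponding type of advertisement;
    the detection module is further configured to calculate the ratio of the number of matched advertisement samples carrying the corresponding type of advertisement to a sum, to obtain the feature value of the feature for the corresponding type of advertisement, the sum being the number of matched non-advertisement samples plus the number of matched advertisement samples carrying the corresponding type of advertisement.
  13. The advertisement detection apparatus according to claim 9, wherein
    the detection module is further configured to, for each type of advertisement, determine a feature that matches only advertisement samples carrying the corresponding type of advertisement as an advertisement feature of the corresponding type of advertisement, wherein
    when several features match only advertisement samples including the corresponding type of advertisement, the feature with the largest feature value is selected as the advertisement feature of the corresponding type of advertisement.
  14. The advertisement detection apparatus according to claim 9, wherein
    the detection module is further configured to, for each feature, when the feature matches advertisement samples carrying several types of advertisements, determine the type of advertisement for which the feature has the largest feature value, and determine the feature as an advertisement feature of that type of advertisement.
  15. The advertisement detection apparatus according to claim 13 or 14, wherein
    the detection module is further configured to select, based on the features' values for the different types of advertisements, features whose feature values exceed a feature value threshold as advertisement features of the corresponding types of advertisements.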
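  16. The advertisement detection apparatus according to claim 9, wherein
    the detection module is further configured to, before determining the features' values for the different types of advertisements, filter out, from the extracted features, the features corresponding to third-party plug-in code, the third-party plug-in code including payment plug-ins and account login software development kits;
    the extraction module is further configured to update the sample set with the sample to be detected; and
    the detection module is further configured to re-determine the advertisement features of the different types of advertisements based on the updated sample set.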
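  17. An advertisement detection apparatus, including:
    a memory, configured to store an executable program; and
    a processor, configured to implement the following operations when executing the executable program stored in the memory:
    extracting the features of each advertisement sample in a sample set;
    determining the different types of advertisements carried in the advertisement samples that a feature matches, and determining the feature values of the feature for the different types of advertisements, each feature value representing the probability that the feature is an advertisement feature of the corresponding type of advertisement;
    screening out, from the extracted features and based on the features' values for the different types of advertisements, the advertisement features of the different types of advertisements; and
    comparing the advertisement features of the different types of advertisements with the features extracted from a sample to be detected, and, when the comparison succeeds, determining that the sample to be detected carries an advertisement, namely an advertisement of the type corresponding to the successfully matched advertisement feature.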
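  18. A storage medium storing an executable program, the executable program, when executed by a processor, implementing the following operations:
    extracting the features of each advertisement sample in a sample set;
    determining the different types of advertisements carried in the advertisement samples that a feature matches, and determining the feature values of the feature for the different types of advertisements, each feature value representing the probability that the feature is an advertisement feature of the corresponding type of advertisement;
    screening out, from the extracted features and based on the features' values for the different types of advertisements, the advertisement features of the different types of advertisements; and
    comparing the advertisement features of the different types of advertisements with the features extracted from a sample to be detected, and, when the comparison succeeds, determining that the sample to be detected carries an advertisement, namely an advertisement of the type corresponding to the successfully matched advertisement feature.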
PCT/CN2017/082069 2016-05-03 2017-04-26 Advertisement detection method, advertisement detection apparatus, and storage medium WO2017190617A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/030,749 US11334908B2 (en) 2016-05-03 2018-07-09 Advertisement detection method, advertisement detection apparatus, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610288674.5 2016-05-03
CN201610288674.5A CN105912935B (zh) 2016-05-03 2016-05-03 Advertisement detection method and advertisement detection apparatus

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/030,749 Continuation US11334908B2 (en) 2016-05-03 2018-07-09 Advertisement detection method, advertisement detection apparatus, and storage medium

Publications (1)

Publication Number Publication Date
WO2017190617A1 true WO2017190617A1 (zh) 2017-11-09

Family

ID=56753285

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/082069 WO2017190617A1 (zh) 2016-05-03 2017-04-26 Advertisement detection method, advertisement detection apparatus, and storage medium

Country Status (3)

Country Link
US (1) US11334908B2 (zh)
CN (1) CN105912935B (zh)
WO (1) WO2017190617A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
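CN105912935B (zh) * 2016-05-03 2019-06-14 Tencent Technology (Shenzhen) Company Limited Advertisement detection method and advertisement detection apparatus
CN108898165B (zh) * 2018-06-12 2021-11-30 Zhejiang University Method for identifying the style of print advertisements
CN112084502B (zh) * 2020-09-18 2024-06-21 Zhuhai Baoqu Technology Co., Ltd. Software identification method and apparatus, electronic device, and storage medium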

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
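CN101984450A (zh) * 2010-12-15 2011-03-09 Beijing Antiy Electronic Equipment Co., Ltd. Malicious code detection method and system
CN103226583A (zh) * 2013-04-08 2013-07-31 Beijing Qihoo Technology Co., Ltd. Method and apparatus for identifying advertisement plug-ins
US20150088662A1 (en) * 2012-10-10 2015-03-26 Nugg.Ad Ag Predictive Behavioural Targeting
CN105468975A (zh) * 2015-11-30 2016-04-06 Beijing Qihoo Technology Co., Ltd. Method, apparatus, and system for tracking false positives of malicious code
CN105512558A (zh) * 2016-01-07 2016-04-20 Beijing University of Posts and Telecommunications Android advertisement plug-in detection method based on decompiled module features
CN105912935A (zh) * 2016-05-03 2016-08-31 Tencent Technology (Shenzhen) Company Limited Advertisement detection method and advertisement detection apparatus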



Also Published As

Publication number Publication date
CN105912935B (zh) 2019-06-14
US20180322526A1 (en) 2018-11-08
CN105912935A (zh) 2016-08-31
US11334908B2 (en) 2022-05-17


Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17792443

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 17792443

Country of ref document: EP

Kind code of ref document: A1