WO2017186090A1 - Communication number processing method and apparatus - Google Patents

Communication number processing method and apparatus

Info

Publication number
WO2017186090A1
Authority
WO
WIPO (PCT)
Prior art keywords
communication
processed
initiation
cdr
bill
Prior art date
Application number
PCT/CN2017/081813
Other languages
English (en)
French (fr)
Inventor
林海雄
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2017186090A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M1/00: Substation equipment, e.g. for use by subscribers
    • H04M1/66: Substation equipment, e.g. for use by subscribers, with means for preventing unauthorised or fraudulent calling
    • H04M1/663: Preventing unauthorised calls to a telephone set

Definitions

  • The present application relates to data processing technologies in the field of communication technologies, and in particular, to a communication number processing method and apparatus.
  • Telecommunications fraud refers to the criminal act in which criminals fabricate false information and set up scams by telephone, internet, and SMS, defraud victims remotely and without contact, and induce the victims to pay or transfer money to the criminals.
  • The public security organs of the country filed a total of 590,000 telecom fraud cases, an increase of 32.5% year-on-year, causing economic losses of 22.2 billion yuan; behind each case there may be a family broken by fraud.
  • The prior art collects users' tag information through application software (app) on the mobile phone. If a number is found to be marked as fraudulent by multiple users, the number is considered a fraudulent number, and a user who is talking with that number is alerted to be vigilant to avoid being scammed.
  • The prior art needs to collect user tag information.
  • The probability that a user marks a number is relatively low: many users often do not mark the type of the number when they receive a call from an unfamiliar number, and the prior art can treat a number as fraudulent only after collecting enough user tags. Therefore, the prior art recognizes fraudulent numbers slowly and inefficiently.
  • Marking a number is a subjective act: when users receive harassing calls, such as advertisements and other malicious calls, they often mark the harassing number as a fraudulent number. Therefore, the prior art identifies fraudulent numbers with low accuracy.
  • The embodiments of the present application are expected to provide a communication number processing method and apparatus that can improve the speed and accuracy of number identification.
  • an embodiment of the present application provides a communication number processing method, where the method includes:
  • parsing the CDR to obtain the types of communication information included in the CDR, extracting at least one type of communication information of each communication number in the CDR, and combining it to form a pre-processed CDR includes:
  • the extracted communication records of the respective communication initiation numbers are combined to form the pre-processed bill.
  • parsing the at least one type of communication information of each communication number in the pre-processed bill to obtain the features of the corresponding type of communication information of each communication number in the pre-processed bill includes:
  • based on the ranking by similarity, the first proportion of communication initiation numbers with the highest similarity is extracted.
  • parsing the at least one type of communication information of each communication number in the pre-processed bill to obtain the features of the corresponding type of communication information of each communication number in the pre-processed bill includes:
  • the second proportion of communication initiation numbers with the highest number of communications is extracted.
  • parsing the at least one type of communication information of each communication number in the pre-processed bill to obtain the features of the corresponding type of communication information of each communication number in the pre-processed bill includes:
  • the third proportion of communication initiation numbers with the highest average communication duration is extracted.
  • parsing the at least one type of communication information of each communication number in the pre-processed bill to obtain the features of the corresponding type of communication information of each communication number in the pre-processed bill includes:
  • extracting, from the communication numbers included in the pre-processed bill, a target communication number that matches the preset feature includes:
  • extracting, from the communication numbers included in the pre-processed CDR, a target communication number that matches the preset feature includes:
  • a machine learning model is used to analyze the features of the corresponding type of communication information of each communication number in the pre-processed bill, and the target communication number matching the preset feature is extracted from the communication numbers included in the pre-processed bill.
  • the method further includes:
  • the machine learning model is retrained based on the communication record of the security number in the pre-processed bill.
  • the machine learning model is retrained based on the communication record of the security number in the pre-processed bill, including:
  • the method further includes:
  • responding to the communication behavior of the target communication number includes: issuing a danger reminder to the user of a communication response number that has a communication record with the target communication number; wherein the danger reminder includes a voice reminder and/or a text reminder;
  • the real-time level of response processing is positively correlated with the level of danger.
  • the embodiment of the present application provides a communication number processing apparatus, where the apparatus includes:
  • an obtaining module configured to acquire, from the communication service device, a CDR of a preset number of communication numbers in a first preset time;
  • a pre-processing module configured to parse the CDR to obtain the types of communication information included in the CDR, extract at least one type of communication information of each communication number in the CDR, and combine it to form a pre-processed CDR;
  • a parsing module configured to parse at least one type of communication information of each communication number in the pre-processed bill, and obtain a feature of the corresponding type of communication information of each communication number in the pre-processed bill;
  • an extracting module configured to extract, from the communication number included in the pre-processed bill, a target communication number that matches the preset feature.
  • the pre-processing module is specifically configured to:
  • the extracted communication records of the respective communication initiation numbers are combined to form the pre-processed bill.
  • the parsing module is specifically configured to: separately calculate the edit distance between each communication initiation number in the pre-processed bill and a yellow page number; and obtain, according to the edit distance, the similarity between each communication initiation number in the pre-processed bill and the yellow page number, wherein the edit distance indicates the number of operations required to turn the yellow page number into the communication initiation number;
  • the extracting module is specifically configured to: extract, from the communication initiation numbers included in the pre-processed CDR, each communication initiation number whose similarity to the yellow page number is greater than a first threshold; or, based on the ranking of the communication initiation numbers included in the pre-processed CDR by their similarity to the yellow page number, extract the first proportion of communication initiation numbers with the highest similarity.
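  • The edit-distance feature described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: it assumes the Levenshtein distance as the edit distance, and the normalization of distance into a similarity score (one minus distance divided by the longer length) is an assumption, since the text only states that similarity is obtained from the edit distance. The function names and the sample numbers are hypothetical.

```python
def edit_distance(source: str, target: str) -> int:
    """Levenshtein distance: the fewest single-character insertions,
    deletions, or substitutions that turn `source` into `target`."""
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(previous[j] + 1,          # delete s_char
                               current[j - 1] + 1,       # insert t_char
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]


def similarity(yellow_page_number: str, initiation_number: str) -> float:
    """Assumed normalization: 1.0 for identical numbers, lower when more
    edit operations are needed relative to the longer number."""
    longest = max(len(yellow_page_number), len(initiation_number))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(yellow_page_number, initiation_number) / longest


def extract_similar(initiation_numbers, yellow_page_numbers, first_threshold):
    """Keep initiation numbers whose best similarity to any yellow page
    number is greater than the first threshold."""
    return [
        number for number in initiation_numbers
        if max((similarity(y, number) for y in yellow_page_numbers),
               default=0.0) > first_threshold
    ]
```

  • For example, with the hypothetical yellow page number "4000001111", the initiation number "4000001117" differs by one substitution and would be extracted under a first threshold of 0.8, while an unrelated mobile number would not.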
  • the parsing module is specifically configured to: extract the communication start times of each communication number in the pre-processed bill acting as a communication initiation number; and calculate the number of communications of each communication initiation number in the pre-processed bill in a unit time;
  • the extracting module is specifically configured to: extract, from the communication initiation numbers included in the pre-processed CDR, each communication initiation number whose number of communications in a unit time is greater than a second threshold; or, based on the ranking of the communication initiation numbers included in the pre-processed CDR by number of communications in a unit time, extract the second proportion of communication initiation numbers with the highest number of communications.
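  • A minimal sketch of the communication-frequency feature follows. It assumes that "unit time" means fixed, non-overlapping windows (one hour by default) and that the feature of a number is its busiest window; both choices, along with the function names, are assumptions made for illustration only.

```python
from collections import Counter
from datetime import datetime

EPOCH = datetime(1970, 1, 1)


def peak_calls_per_unit(records, unit_seconds=3600):
    """Map each initiation number to its largest call count in any single
    fixed window of `unit_seconds`.

    `records` is an iterable of (initiation_number, start_time) pairs with
    start_time formatted as 'YYYY-MM-DD HH:MM:SS'.
    """
    per_bucket = Counter()
    for number, start_time in records:
        started = datetime.strptime(start_time, "%Y-%m-%d %H:%M:%S")
        bucket = int((started - EPOCH).total_seconds()) // unit_seconds
        per_bucket[(number, bucket)] += 1
    peak = {}
    for (number, _bucket), count in per_bucket.items():
        peak[number] = max(peak.get(number, 0), count)
    return peak


def extract_frequent(records, second_threshold, unit_seconds=3600):
    """Initiation numbers whose peak count exceeds the second threshold."""
    peak = peak_calls_per_unit(records, unit_seconds)
    return sorted(n for n, count in peak.items() if count > second_threshold)
```

  • A sliding window would catch bursts that straddle a window boundary; fixed windows are used here only to keep the sketch short.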
  • the parsing module is specifically configured to: extract the communication durations of each communication number in the pre-processed bill acting as a communication initiation number; and calculate the average communication duration of each communication initiation number in the pre-processed bill;
  • the extracting module is specifically configured to: extract, from the communication initiation numbers included in the pre-processed CDR, each communication initiation number whose average communication duration is greater than a third threshold; or, based on the ranking of the communication initiation numbers included in the pre-processed CDR by average communication duration, extract the third proportion of communication initiation numbers with the highest average communication duration.
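  • The ranking alternative ("extract the proportion with the highest value") can be sketched with the average-duration feature. The sketch assumes that records without a recorded duration are skipped and that the proportion is rounded up to a whole number of entries; neither detail is stated in the text, and the function names are hypothetical.

```python
import math


def average_durations(records):
    """Average call duration in seconds per initiation number.

    `records` is an iterable of (initiation_number, duration_seconds);
    a duration of None (no value recorded) is skipped, which is an
    assumption of this sketch.
    """
    totals = {}
    for number, duration in records:
        if duration is None:
            continue
        total, count = totals.get(number, (0, 0))
        totals[number] = (total + duration, count + 1)
    return {n: total / count for n, (total, count) in totals.items()}


def extract_top_ratio(scores, ratio):
    """Return the top `ratio` share of numbers ranked by score, highest
    first; the share is rounded up to a whole number of entries."""
    ranked = sorted(scores, key=lambda n: scores[n], reverse=True)
    keep = math.ceil(len(ranked) * ratio)
    return ranked[:keep]
```

  • The same `extract_top_ratio` helper would apply unchanged to the similarity, frequency, and home-location features, since each of them is described with the same threshold-or-proportion choice.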
  • the parsing module is specifically configured to: acquire the attribution (home location) of each communication response number corresponding to each communication number in the pre-processed bill acting as a communication initiation number; and calculate, for each communication initiation number in the pre-processed bill, the number of different attributions of its corresponding communication response numbers;
  • the extracting module is specifically configured to: extract, from the communication initiation numbers included in the pre-processed CDR, each communication initiation number whose corresponding communication response numbers have more different attributions than a fourth threshold; or, based on the ranking of the communication initiation numbers included in the pre-processed CDR by the number of different attributions of their corresponding communication response numbers, extract the fourth proportion of communication initiation numbers with the highest number of different attributions.
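  • A minimal sketch of the attribution feature is given below. How a response number is mapped to its attribution is not specified in the text, so the lookup is passed in as a hypothetical `home_location_of` callable; the function names are likewise illustrative.

```python
def distinct_home_locations(records, home_location_of):
    """Count, per initiation number, how many different attributions
    (home locations) its response numbers belong to.

    `records` is an iterable of (initiation_number, response_number);
    `home_location_of` is a caller-supplied lookup (hypothetical here)
    that returns a location label or None when the location is unknown.
    """
    locations = {}
    for initiation_number, response_number in records:
        place = home_location_of(response_number)
        if place is not None:
            locations.setdefault(initiation_number, set()).add(place)
    return {number: len(places) for number, places in locations.items()}


def extract_wide_reach(records, home_location_of, fourth_threshold):
    """Initiation numbers whose count exceeds the fourth threshold."""
    counts = distinct_home_locations(records, home_location_of)
    return sorted(n for n, count in counts.items() if count > fourth_threshold)
```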
  • the extracting module is specifically configured to: analyze, by using a machine learning model, a feature of the corresponding type of communication information of each communication number in the pre-processed bill, from the communication number included in the pre-processed bill Extract the target communication number that matches the preset feature.
  • the device further includes:
  • a training module configured to: receive feedback information from the user side for the target communication number and determine whether the target communication number is a security number; determine an error rate of the machine learning model based on the number of identified target communication numbers that the user side has fed back as security numbers; and, when the error rate of the machine learning model is greater than a fifth threshold, retrain the machine learning model based on the communication records of the security numbers in the pre-processed bill.
  • the training module is configured to: parse at least one type of communication information in the communication records of the security number in the pre-processed bill to obtain the features of that communication information; and update, based on those features, the threshold used by the machine learning model to identify the target communication number.
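  • The feedback loop described above can be sketched as follows. The text does not give a formula for the error rate or a rule for updating the threshold, so both are assumptions here: the error rate is taken as the share of identified target numbers that users reported as security numbers, and the threshold is raised just enough that no reported security number still exceeds it. All names are hypothetical.

```python
def error_rate(target_numbers, feedback):
    """Share of identified target numbers that users reported as safe.

    `feedback` maps a number to True when the user side reported it as a
    security (safe) number; numbers without feedback count as not safe.
    """
    if not target_numbers:
        return 0.0
    reported_safe = sum(1 for n in target_numbers if feedback.get(n) is True)
    return reported_safe / len(target_numbers)


def raised_threshold(current_threshold, safe_feature_values):
    """Assumed update rule: lift the threshold to the largest feature
    value seen on a reported security number, never lowering it."""
    return max([current_threshold, *safe_feature_values])


def maybe_retrain(target_numbers, feedback, features, threshold, fifth_threshold):
    """Return the threshold to use next: unchanged while the error rate
    stays within the fifth threshold, otherwise updated from the feature
    values of the numbers reported as safe."""
    if error_rate(target_numbers, feedback) <= fifth_threshold:
        return threshold
    safe_values = [features[n] for n in target_numbers if feedback.get(n) is True]
    return raised_threshold(threshold, safe_values)
```

  • Because extraction uses a strict "greater than" comparison, a security number whose feature value equals the raised threshold is no longer extracted on the next pass.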
  • the device further includes:
  • a response module configured to: determine the degree of matching between the features of the corresponding type of communication information of the target communication number and the preset feature; determine a risk level of the target communication number based on that degree of matching; and respond to the communication behavior of the target communication number based on its risk level.
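  • A minimal sketch of the response module follows. The text states only that the risk level is derived from the degree of matching and that more dangerous numbers receive a more real-time response, with voice and/or text reminders. The way the degree is computed (share of preset feature limits exceeded), the cut-off values, and the concrete response per level are all assumptions chosen for illustration.

```python
def matching_degree(features, preset_features):
    """Assumed measure: the share of preset feature limits that the
    number's feature values exceed."""
    if not preset_features:
        return 0.0
    matched = sum(
        1 for name, limit in preset_features.items()
        if features.get(name, 0) > limit
    )
    return matched / len(preset_features)


def risk_level(degree):
    """Map a matching degree in [0, 1] to a level (cut-offs are assumed)."""
    if degree >= 0.75:
        return "high"
    if degree >= 0.5:
        return "medium"
    if degree > 0:
        return "low"
    return "none"


# Higher risk gets a more immediate reminder (illustrative mapping).
RESPONSES = {
    "high": "voice and text reminder during the call",
    "medium": "text reminder during the call",
    "low": "text reminder after the call",
    "none": "no action",
}


def respond(features, preset_features):
    """Return the risk level and the response chosen for it."""
    level = risk_level(matching_degree(features, preset_features))
    return level, RESPONSES[level]
```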
  • The embodiment of the present application parses the CDRs of a preset number of communication numbers within the first preset time to obtain the features of the corresponding types of communication information of each communication number, and, based on those features, extracts from the communication numbers the target communication number that matches the preset feature.
  • The CDR of a communication number is objective data maintained by the operator, and can truly and completely reflect all communication records of the user within a certain time interval.
  • Using the CDR of the communication number as the processing basis can therefore improve the accuracy of number identification.
  • The generation and maintenance of the CDR generally do not require the direct participation of each user and are handled by the operator, so CDRs can be acquired quickly and efficiently. Therefore, the embodiment of the present application can improve the speed and accuracy of number identification.
  • FIG. 1 is a schematic diagram of an optional application scenario of a method for processing a communication number in an embodiment of the present application
  • FIG. 2 is an optional schematic flowchart of a method for processing a communication number according to Embodiment 1 of the present application;
  • FIG. 3 is an optional schematic flowchart of a method for processing a communication number according to Embodiment 2 of the present application;
  • FIG. 5 is an optional schematic flowchart of a method for processing a communication number according to Embodiment 4 of the present application.
  • FIG. 6 is an optional schematic flowchart of a method for processing a communication number according to Embodiment 5 of the present application.
  • FIG. 7 is an optional schematic flowchart of a method for processing a communication number according to Embodiment 6 of the present application.
  • FIG. 8 is an optional schematic flowchart of a method for processing a communication number according to Embodiment 7 of the present application.
  • FIG. 9 is an optional schematic flowchart of a method for processing a communication number according to Embodiment 8 of the present application.
  • FIG. 10 is an optional schematic flowchart of a method for processing a communication number in Embodiment 9 of the present application.
  • FIG. 11 is an optional schematic diagram of a user application running on a user equipment in a state of receiving a user indication according to an embodiment of the present disclosure
  • FIG. 11b is an optional schematic diagram of a user application running on a user equipment in a text reminding state according to an embodiment of the present application
  • FIG. 12 is an optional structural diagram of a communication number processing apparatus according to an embodiment of the present application.
  • FIG. 13 is another schematic structural diagram of a communication number processing apparatus according to an embodiment of the present application.
  • FIG. 14 is still another schematic structural diagram of a communication number processing apparatus according to an embodiment of the present application.
  • the embodiment of the present application describes a communication number processing method.
  • FIG. 1 shows an optional application scenario of the communication number processing method in the embodiment of the present application. The user equipment 11, the user equipment 12, the user equipment 13, the network device 14 (such as a carrier gateway or an enterprise gateway), the communication service device 15, and the application background server 16 respectively access a communication network (such as a wireless network or a wired network); the communication service device 15 is, for example, a business support system (BSS, Business Support System) / operation support system (OSS), or a telecommunication switch;
  • the communication service device 15 is configured to provide a bill for a communication number;
  • the network device 14 is configured to provide service support for each user equipment accessing the communication network;
  • the background server 16 of the application is used to provide service support for the application; correspondingly, the client of the application installed on the user equipment is also used to provide service support for the application;
  • the application may be a communication application, for example: Tencent Mobile phone housekeeper, WeChat, Tencent mailbox, etc.
  • Applications are not limited to communication applications, and the present application does not specifically limit this. In the above scenario, there is at least one user equipment, and each user equipment is associated with at least one distinct communication number; for example, in FIG. 1 the user equipment 11 is associated with at least one communication number A, the user equipment 12 with at least one communication number B, and the user equipment 13 with at least one communication number C.
  • The communication number A, the communication number B, and the communication number C are different from each other. In the above scenario, a communication number that satisfies a preset condition is identified from the plurality of communication numbers.
  • The embodiment of the present application further describes a communication number processing apparatus, which can be used to execute the communication number processing method of the embodiment of the present application. The communication number processing apparatus can be implemented in various ways. For example, all components of the apparatus may be implemented in a user device such as a smart phone, a fixed telephone, a tablet computer, a notebook computer, or a wearable device (such as smart glasses or a smart watch); all components of the apparatus may be implemented in a network device such as an enterprise gateway or a carrier gateway; or the components of the apparatus may be implemented in a coupled manner across the user device side and the network side. The communication number processing apparatus may also be the client or the background server of a user application. For example, when the user application is the Tencent mobile phone housekeeper, the corresponding communication number processing apparatus can be the client or the background server of the Tencent mobile phone housekeeper.
  • This embodiment provides a communication number processing method that can be applied to scenarios in which a communication number satisfying a preset condition needs to be identified from multiple communication numbers, for example, identification across all numbers in a communication network, identification of a communication number designated for identification, or identification of a communication number that is communicating with the current user;
  • the type of communication service includes but is not limited to any one or a combination of the following: voice call; short message; flash message; data service (such as WeChat). The present application is not limited in this respect.
  • the communication number processing method provided in this embodiment includes the following steps:
  • Step 201 Acquire, from the communication service device, a bill of a preset number of communication numbers in a first preset time.
  • The communication service device may include a telecommunication support system device, such as a BSS/OSS, or a telecommunication switch; the first preset time may be flexibly set by the user or the operator according to actual conditions such as service requirements; the communication number includes but is not limited to a mobile phone number, a fixed-line number, and the like; the communication numbers may include, for example, all communication numbers in the communication network, a communication number to be identified that is indicated by the user, or a communication number in a call with the current user; the communication number indicated by the user is, for example, a communication number to be identified that the user specifies in an application running on the user equipment (such as the Tencent mobile phone housekeeper), or one carried in an indication message that the user sends to the operator server.
  • the implementation manner of obtaining the CDR of the preset number of communication numbers in the first preset time from the communication service device may be at least one of the following manners:
  • when the communication number of the current user is detected, the CDR of that communication number within the first preset time is acquired from the communication service device;
  • the CDR of an unfamiliar communication number within the first preset time is acquired from the communication service device.
  • Step 202: Parse the CDR to obtain the types of communication information included in the CDR, extract at least one type of communication information of each communication number in the CDR, and combine it to form a pre-processed CDR.
  • The CDRs of the preset number of communication numbers obtained from the communication service device for the first preset time are generally unordered.
  • The pre-processed CDR is compiled with each communication number as a dimension.
  • The communication information includes at least one type of communication information corresponding to at least one of the following roles of a communication number: the communication number as the calling number (such as the calling number in the voice service), the communication number as the called number (such as the called number in the voice service), the communication number as the information sending number (such as the short message sending number, or the data sending number in the data service), and the communication number as the information receiving number (such as the short message receiving number, or the data receiving number in the data service).
  • The pre-processed CDR includes only the at least one type of communication information of each communication number extracted from the CDR; that is, the pre-processed CDR does not need to include all the information in the CDR. The data of the pre-processed CDR is organized with each communication number as an index; the data structure of the pre-processed CDR is, for example:
  • Calling number 1 in the voice service: communication information 1, communication information 2, ...;
  • Calling number 2 in the voice service: communication information 3, communication information 4, ...;
  • SMS sending number 3: communication information 5, communication information 6, ...;
  • Data sending number 4 in the data service: communication information 7, communication information 8, ....
  • The pre-processed bill indexed by each communication number acting as the calling number is shown in Table 1.
  • In the data structure example of Table 1, the calling number, the called number, the communication start time, and the communication duration (seconds) are partial examples of the types of communication information included in the CDR.
  • Step 203 Analyze at least one type of communication information of each communication number in the pre-processed bill, and obtain a feature of the corresponding type of communication information of each communication number in the pre-processed bill.
  • Step 204 Analyze characteristics of corresponding types of communication information of each communication number in the pre-processed bill, and determine whether the feature of the corresponding type of communication information of each communication number matches the preset feature, and if yes, go to step 205, otherwise The process ends.
  • Step 205 Extract a target communication number that matches the preset feature from the communication number included in the pre-processed bill.
  • The features of the corresponding types of communication information of each communication number in the pre-processed bill are analyzed, and the target communication number whose features match the preset feature is extracted from the communication numbers included in the pre-processed bill; the preset feature is, for example, a preset a priori value.
  • This embodiment parses the bill of each communication number to obtain the features of the corresponding types of communication information of that number, and, based on those features, identifies from the communication numbers the target communication number that matches the preset feature.
  • Because the CDR of a communication number is generally generated and maintained by the operator, the participation of individual users is not required, so the CDR can be acquired quickly and efficiently.
  • Because the CDR of a communication number is objective data maintained by the operator, it can truly and completely reflect all communication records of the user within a certain time interval.
  • Therefore, the technical solution provided by the embodiment of the present application, which is based on the CDR of the communication number, can improve the speed and accuracy of number identification.
  • This embodiment builds on the first embodiment and proposes a specific solution for how to parse the CDR to obtain the types of communication information included in it, and how to extract at least one type of communication information of each communication number in the CDR and combine it to form a pre-processed CDR.
  • the communication number processing method provided in this embodiment includes the following steps:
  • Step 301 Acquire, from the communication service device, a bill of a preset number of communication numbers in a first preset time.
  • Step 302 Parsing the CDR to obtain at least one of the following types of communication information included in the CDR: a communication initiation number, a communication response number corresponding to the communication initiation number, a communication start time, and a communication duration.
  • The communication initiation number may include a communication number acting as a calling number (such as the calling number in a voice service) and a communication number acting as an information sending number (such as a short message sending number, or a data sending number in a data service);
  • the communication response number corresponding to the communication initiation number may include a communication number acting as the called number (such as the called number in the voice service) and a communication number acting as the information receiving number (such as a short message receiving number, or a data receiving number in the data service);
  • the types of communication information included in the CDR are not limited to the communication initiation number, the communication response number corresponding to the communication initiation number, the communication start time, and the communication duration; they can also include data traffic (upstream traffic and/or downstream traffic), communication location, service type, long-distance type, and so on. The present application is not limited in this respect.
  • Step 303 Extract at least one type of communication information associated with each communication initiation number in the CDR to form a communication record of each communication initiation number.
  • Step 304 Combine the extracted communication records of each communication initiation number to form a pre-processed bill.
  • The pre-processed CDR includes only the at least one type of communication information of each communication number extracted from the CDR, rather than all the information in the CDR, which reduces the workload of communication number processing and improves its efficiency.
  • The CDRs of the preset number of communication numbers obtained from the communication service device for the first preset time are generally unordered.
  • Take the CDR shown in Table 2 as an example. Here, the communication start time, the service type, the communication initiation number, the communication response number, the communication place, the long-distance type, and the communication duration (seconds) are partial examples of the types of communication information included in the CDR.
  • The communication number processing apparatus parses the CDR shown in Table 2 to obtain at least one of the following types of communication information included in the CDR: the communication initiation number; the communication response number corresponding to the communication initiation number; the communication start time; and the communication duration;
  • the communication number processing apparatus extracts at least one type of communication information associated with each communication initiation number in the CDR to form a communication record of each communication initiation number, where the communication record of each communication initiation number includes at least one type of communication information of that number within the first preset time;
  • the extracted communication records of the communication initiation numbers are combined to form a pre-processed CDR. The pre-processed CDR is compiled with each communication number as a dimension, and its data structure (or display mode) is organized with each communication number as the index.
  • Assuming that the pre-processed bill is formed from at least one type of communication information in which each communication number acts as a communication initiation number, the data structure of the pre-processed bill can be:
  • Communication initiation number 1 communication information 1, communication information 2, ...;
  • Communication initiation number 2 communication information 1, communication information 2, ...;
  • The pre-processed bill shown in Table 3 is obtained by the communication number processing device performing steps 202-204 on the bill shown in Table 2.
  • The pre-processed CDR is organized with each communication initiation number as the index.
  | Communication initiation number | Communication response number | Communication start time | Communication duration (seconds) |
  | 158xxxx0001 | 186xxxx0002 | 2016-01-15 15:32:42 | 134 |
  | 158xxxx0001 | 186xxxx0007 | 2016-01-15 15:42:02 | 97 |
  | 158xxxx0001 | 139xxxx0006 | 2016-01-15 15:48:02 | 123 |
  | 158xxxx0001 | 187xxxx0002 | 2016-01-15 15:52:07 | 256 |
  | 170xxxx0001 | 186xxxx0001 | 2016-01-15 15:39:02 | 15 |
  | 170xxxx0001 | 180xxxx0007 | 2016-01-15 15:51:02 | 77 |
  | 170xxxx0001 | 139xxxx0002 | 2016-01-16 10:26:02 | -- |
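  • The pre-processing described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the field names (`caller`, `callee`, `start`, `duration`, `service`, `place`), the sample rows, and the function `preprocess` are assumptions introduced here. It groups unordered CDR rows by communication initiation number and keeps only the selected types of communication information.

```python
from collections import defaultdict

# Hypothetical raw CDR rows in arrival order (unordered). Field names and the
# "service"/"place" values are illustrative assumptions, not from the source.
RAW_CDR = [
    {"start": "2016-01-15 15:42:02", "service": "voice", "caller": "158xxxx0001",
     "callee": "186xxxx0007", "place": "A", "duration": 97},
    {"start": "2016-01-15 15:39:02", "service": "voice", "caller": "170xxxx0001",
     "callee": "186xxxx0001", "place": "B", "duration": 15},
    {"start": "2016-01-15 15:32:42", "service": "voice", "caller": "158xxxx0001",
     "callee": "186xxxx0002", "place": "A", "duration": 134},
]

# Types of communication information kept in the pre-processed CDR.
KEPT_FIELDS = ("callee", "start", "duration")


def preprocess(cdr_rows, kept_fields=KEPT_FIELDS):
    """Index the CDR by initiation number, keeping only the selected fields."""
    records = defaultdict(list)
    for row in cdr_rows:
        records[row["caller"]].append({f: row[f] for f in kept_fields})
    # Order each number's communication record by start time for readability.
    for recs in records.values():
        recs.sort(key=lambda r: r["start"])
    return dict(records)


preprocessed = preprocess(RAW_CDR)
```

  • Indexing by initiation number means every later feature (similarity, communication count, average duration, attribution count) can be computed from one number's record without rescanning the whole CDR.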
  • Step 305 Analyze at least one type of communication information of each communication number in the pre-processed bill, and obtain a feature of the corresponding type of communication information of each communication number in the pre-processed bill.
  • Step 306 Analyze the characteristics of the corresponding type of communication information of each communication number in the pre-processed bill, and determine whether the feature of the corresponding type of communication information of each communication number matches the preset feature, and if yes, go to step 307, otherwise The process ends.
  • Step 307 Extract a target communication number that matches the preset feature from the communication number included in the pre-processed bill.
  • This embodiment addresses how to parse a CDR to obtain the types of communication information it includes, and how to extract at least one type of communication information of each communication number in the CDR and combine it into a pre-processed CDR.
  • The CDR is parsed to obtain at least one type of communication information included in it; the at least one type of communication information associated with each communication initiation number in the CDR is extracted to form a communication record of each communication initiation number; and the extracted communication records are combined to form a pre-processed CDR. The pre-processed CDR includes only the at least one type of communication information of each communication number extracted from the CDR, not all the information in the CDR, which reduces the workload of number identification and improves its speed and efficiency.
  • This embodiment builds on the first embodiment. It takes the edit distance between the communication initiation number and a yellow page number as the feature of the communication number, and describes a technical solution for identifying, from a plurality of communication numbers, the communication numbers that meet a preset condition.
  • the communication number processing method includes the following steps:
  • There can be one or more yellow page numbers. The edit distance is the minimum number of editing operations required to convert the yellow page number into the communication initiation number, that is, the number of operations of adding, deleting, modifying, and moving digits needed to turn the yellow page number into the communication initiation number.
  • When there are multiple yellow page numbers, the edit distance between each communication initiation number in the pre-processed bill and each yellow page number needs to be calculated separately.
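  • As an illustration of the edit distance, the sketch below computes the classic Levenshtein distance (insertions, deletions, and substitutions of single digits). It is an assumption that this variant is close to what the text intends: the "moving" operation mentioned above is not modelled, and the function names and the sample numbers are hypothetical.

```python
def edit_distance(source: str, target: str) -> int:
    """Levenshtein distance: minimum inserts/deletes/substitutions to turn source into target."""
    # previous[j] holds the distance between the processed prefix of source and target[:j].
    previous = list(range(len(target) + 1))
    for i, s_ch in enumerate(source, start=1):
        current = [i]  # distance from source[:i] to the empty string
        for j, t_ch in enumerate(target, start=1):
            cost = 0 if s_ch == t_ch else 1
            current.append(min(
                previous[j] + 1,         # delete s_ch
                current[j - 1] + 1,      # insert t_ch
                previous[j - 1] + cost,  # substitute (or keep) the digit
            ))
        previous = current
    return previous[-1]


def distances_to_yellow_pages(initiation_number, yellow_page_numbers):
    """Edit distance from every yellow page number to one initiation number."""
    return {yp: edit_distance(yp, initiation_number) for yp in yellow_page_numbers}


# Hypothetical numbers, for illustration only.
example = distances_to_yellow_pages("4001234561", ["4001234567", "4009999999"])
```

  • A small distance to a well-known yellow page number suggests the initiation number may have been chosen to imitate it.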
  • At least one of the following methods may be used to obtain, based on the edit distance, the similarity between each communication initiation number in the pre-processed bill and the yellow page number.
  • Method 1: for each communication initiation number in the pre-processed bill, normalize the calculated edit distance between the communication initiation number and each yellow page number to obtain the similarity between the communication initiation number and each yellow page number; then sort the similarities between the communication initiation number and the yellow page numbers.
  • Method 2: for each communication initiation number in the pre-processed bill, calculate the ratio of the edit distance between the communication initiation number and the yellow page number to a preset distance, and obtain the similarity between the communication initiation number and the yellow page number from the calculated ratio. When there are multiple yellow page numbers, the ratio of the edit distance between the communication initiation number and each yellow page number to the preset distance needs to be calculated separately.
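  • The two methods can be sketched as follows. The text does not give the exact formulas, so both mappings below are assumptions: Method 1 is shown as normalization by the longer number's length, and Method 2 as one minus the ratio to the preset distance, clipped at zero. The function names are hypothetical.

```python
def similarity_normalized(distance: int, len_a: int, len_b: int) -> float:
    """Method 1 (assumed form): normalize the edit distance by the longer length."""
    longest = max(len_a, len_b)
    if longest == 0:
        return 1.0  # two empty strings are identical
    return 1.0 - distance / longest


def similarity_by_preset(distance: int, preset_distance: int) -> float:
    """Method 2 (assumed form): derive the similarity from the ratio to a preset distance."""
    if preset_distance <= 0:
        raise ValueError("preset_distance must be positive")
    ratio = distance / preset_distance
    return max(0.0, 1.0 - ratio)  # distances beyond the preset distance map to 0
```

  • With either mapping, a larger value means the initiation number is closer to the yellow page number, which is what the later comparison against the first threshold relies on.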
  • The initial value of the first threshold may be set manually or calculated by training. For example: determine, according to an a priori value, the target quantity of target communication numbers among the communication initiation numbers included in the pre-processed bill; sort the similarities between the communication initiation numbers and the yellow page number; select the target quantity of communication initiation numbers in descending order of similarity; and take the smallest similarity to the yellow page number among the selected communication initiation numbers as the initial value of the first threshold.
  • The first threshold can subsequently be updated through training according to actual needs.
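  • The threshold initialization described above can be sketched as below. The function name `initial_threshold` and the sample values are hypothetical; the a priori target quantity is simply passed in as `target_count`.

```python
def initial_threshold(feature_by_number, target_count):
    """Smallest feature value among the target_count highest-ranked numbers."""
    if not 0 < target_count <= len(feature_by_number):
        raise ValueError("target_count must be between 1 and the number of entries")
    ranked = sorted(feature_by_number.values(), reverse=True)  # descending order
    return ranked[target_count - 1]


# Hypothetical similarities between initiation numbers and a yellow page number.
similarities = {"a": 0.9, "b": 0.4, "c": 0.7, "d": 0.2}
first_threshold = initial_threshold(similarities, target_count=2)
```

  • Note that if extraction later uses a strict "greater than" comparison, the number that defines the threshold is itself excluded; whether that is intended is not stated in the text.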
  • Alternatively, the communication number processing device sorts the similarities between the communication initiation numbers included in the pre-processed bill and the yellow page number and, based on that ranking, extracts from the communication initiation numbers included in the pre-processed CDR a first proportion of communication initiation numbers with the highest similarity as the target communication numbers.
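  • The two extraction rules (comparison against a threshold, or taking the highest-ranked proportion) can be sketched as follows. The function names, the rounding rule (`ceil`), and the sample values are assumptions made for illustration.

```python
import math


def extract_above_threshold(feature_by_number, threshold):
    """Numbers whose feature value is strictly greater than the threshold."""
    return [n for n, value in feature_by_number.items() if value > threshold]


def extract_top_proportion(feature_by_number, proportion):
    """The given proportion of numbers with the highest feature value."""
    if not 0 < proportion <= 1:
        raise ValueError("proportion must be in (0, 1]")
    ranked = sorted(feature_by_number, key=feature_by_number.get, reverse=True)
    keep = math.ceil(len(ranked) * proportion)  # assumed rounding: round up
    return ranked[:keep]


# Hypothetical similarities between initiation numbers and a yellow page number.
sims = {"a": 0.9, "b": 0.4, "c": 0.7, "d": 0.2}
by_threshold = extract_above_threshold(sims, 0.5)
by_proportion = extract_top_proportion(sims, 0.25)
```

  • The proportion-based rule always returns some candidates, whereas the threshold-based rule can return none when no number is suspicious enough.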
  • Alternatively, the communication number processing device determines, according to the similarity between the communication initiation number and the yellow page number and the first threshold, the probability that the communication initiation number belongs to the target communication number class (such as the fraud number class) and the probability that it belongs to the normal number class, and takes the class with the larger probability as the class of the communication initiation number. If that class is the target communication number class, the communication initiation number is determined to be a target communication number; otherwise, it is determined to be a normal number.
  • The user equipment may be, for example, a smartphone, a fixed-line phone, a tablet computer, a notebook computer, or a wearable device.
  • The server may be, for example, a service server of an operator, an enterprise gateway, or a back-end server of an application installed on the user equipment.
  • The communication service device may be, for example, a BSS/OSS or a telecommunication switch.
  • The application may specifically be a communication application, for example, Tencent Mobile Manager, WeChat, or Tencent mailbox; of course, the application is not limited to communication applications, which is not specifically limited in the embodiments of the present application.
  • The user equipment, the server, and the communication service device cooperate with each other to implement the communication number processing method provided by this embodiment; an optional flow of the method includes:
  • Step 401 Based on a user indication, the user equipment sends an identification indication carrying the to-be-identified communication number to the server.
  • The user application running on the user equipment is in a state of receiving the user indication: in the display window of the application installed on the user equipment, the user enters the to-be-identified communication number at the designated location as prompted by the application.
  • There can be one or more to-be-identified communication numbers.
  • Step 402 The server receives the identification indication, and sends a CDR request carrying the to-be-identified communication number to the communication service device according to the identification indication.
  • the CDR request includes the to-be-identified communication number and the first preset time.
  • Step 403 The communication service device receives the CDR request, obtains the CDR of the to-be-identified communication number within the first preset time based on the CDR request, and sends the CDR to the server.
  • Step 404 The server receives the CDR of the to-be-identified communication number within the first preset time.
  • Step 405 Parse the CDR to obtain the types of communication information included in the CDR, and extract at least one type of communication information of each to-be-identified communication number in the CDR and combine it to form a pre-processed CDR.
  • Step 406 Calculate an edit distance of each to-be-identified communication number and a yellow page number in the pre-processed bill.
  • Step 407 Obtain a similarity between each to-be-identified communication number and the yellow page number in the pre-processed bill based on the edit distance.
  • Step 408 Determine whether the similarity between each to-be-identified communication number included in the pre-processed CDR and the yellow page number is greater than a first threshold. If yes, go to step 409; otherwise, the process terminates.
  • Step 409 Extract, from the to-be-identified communication numbers included in the pre-processed CDR, the communication initiation numbers whose similarity with the yellow page number is greater than the first threshold, as the target communication numbers.
  • Step 410 Based on the identified target communication number, the server sends an identification response carrying the target communication number to the user equipment. The identification response is used to give the user a danger reminder that the identified target communication number may be a fraudulent number.
  • Ways of giving the danger reminder include, but are not limited to, reminders through communication applications such as SMS, flash message, WeChat, and Tencent Mobile Manager; when the target communication number is identified, the server can also give the danger reminder to the user equipment directly through a customer service call.
  • The server may also give a danger reminder to a user whose number appears as a communication response number in the communication records of the identified target communication number, or to a user who is currently communicating with the identified target communication number, to prevent users from being defrauded.
  • After receiving the identification response carrying the target communication number sent by the server, the user equipment gives the user a danger reminder based on the target communication number. For example, referring to FIG. 11b, the user application running on the user equipment is in a text reminding state, and the display window of the application installed on the user equipment displays a text reminder such as "Please be vigilant! The target communication number is a fraudulent number".
  • The user applications here include, but are not limited to, communication applications such as SMS, flash message, WeChat, and Tencent Mobile Manager; of course, the application is not limited to communication applications, which is not specifically limited in the embodiments of the present application.
  • This embodiment addresses how to obtain the features of the corresponding type of communication information of each communication number in the pre-processed bill and how to extract, from the communication numbers included in the pre-processed bill, the target communication numbers that match the preset feature.
  • On the basis of parsing the CDR to obtain the pre-processed CDR, the edit distance between each communication initiation number in the pre-processed CDR and the yellow page number is calculated, and the similarity between each communication initiation number and the yellow page number is obtained based on the edit distance (that is, the similarity serves as one of the features of the communication initiation number). The communication initiation numbers whose similarity with the yellow page number is greater than the preset first threshold are extracted from the communication initiation numbers included in the pre-processed bill as the target communication numbers; alternatively, based on the ranking of the similarities between the communication initiation numbers included in the pre-processed CDR and the yellow page number, the first proportion of communication initiation numbers with the highest similarity is extracted as the target communication numbers.
  • The embodiment takes the similarity between each communication initiation number in the pre-processed bill and the yellow page number as the feature and the first threshold as the preset feature. By determining the relationship between that similarity and the first threshold, the target communication numbers that match the preset feature are extracted from the communication numbers included in the pre-processed CDR, which enables fast and accurate number identification.
  • This embodiment builds on the first embodiment. It takes the number of communications of the communication initiation number per unit time as the feature of the communication number, and describes a technical solution for identifying, from a plurality of communication numbers, the communication numbers that meet a preset condition.
  • the communication number processing method provided includes the following steps:
  • The number of communications of the communication initiation number per unit time may be any of the following:
  • Mode 1: the number of communications between the communication initiation number and the same number per unit time;
  • Mode 2: the number of communications between the communication initiation number and all the communication numbers with which it communicates per unit time.
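  • The two modes can be sketched as below, using the records of initiation number 158xxxx0001 from Table 3. The choice of one hour as the unit time, the use of the busiest unit as the feature value, and the function names are assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime

# (communication response number, communication start time) for 158xxxx0001, from Table 3.
RECORDS = [
    ("186xxxx0002", "2016-01-15 15:32:42"),
    ("186xxxx0007", "2016-01-15 15:42:02"),
    ("139xxxx0006", "2016-01-15 15:48:02"),
    ("187xxxx0002", "2016-01-15 15:52:07"),
]


def _hour_bucket(start: str) -> str:
    """Map a start time to its unit-time bucket (assumed unit: one hour)."""
    return datetime.strptime(start, "%Y-%m-%d %H:%M:%S").strftime("%Y-%m-%d %H")


def peak_calls_same_number(records) -> int:
    """Mode 1: most communications with any single number within one unit time."""
    counts = Counter((callee, _hour_bucket(start)) for callee, start in records)
    return max(counts.values(), default=0)


def peak_calls_all_numbers(records) -> int:
    """Mode 2: most communications with all numbers within one unit time."""
    counts = Counter(_hour_bucket(start) for _, start in records)
    return max(counts.values(), default=0)
```

  • For the sample records, Mode 2 counts four communications in the 15:00 hour, while Mode 1 counts one, since each response number is contacted only once.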
  • The initial value of the second threshold may be set manually or calculated by training. For example: determine, according to an a priori value, the target quantity of target communication numbers among the communication initiation numbers included in the pre-processed CDR; sort the numbers of communications of the communication initiation numbers per unit time; select the target quantity of communication initiation numbers in descending order of the number of communications per unit time; and take the smallest number of communications per unit time among the selected communication initiation numbers as the initial value of the second threshold.
  • The second threshold can subsequently be updated through training according to actual needs.
  • Alternatively, the communication number processing device sorts the numbers of communications per unit time of the communication initiation numbers included in the pre-processed bill and, based on that ranking, extracts from the communication initiation numbers included in the pre-processed CDR a second proportion of communication initiation numbers with the highest number of communications as the target communication numbers.
  • The user equipment may be, for example, a smartphone, a fixed-line phone, a tablet computer, a notebook computer, or a wearable device (such as smart glasses or a smart watch).
  • The server may be, for example, a service server of an operator, an enterprise gateway, or a back-end server of an application installed on the user equipment.
  • The communication service device may be, for example, a BSS/OSS or a telecommunication switch. The application may specifically be a communication application, for example, Tencent Mobile Manager, WeChat, or Tencent mailbox; of course, the application is not limited to communication applications, which is not specifically limited in the embodiments of the present application.
  • The user equipment, the server, and the communication service device shown in FIG. 5 cooperate with each other to implement the communication number processing method provided by this embodiment; an optional flow of the method includes:
  • Step 501 When detecting the communication number of the party communicating with the current user, the user equipment (or an application installed on the user equipment) sends an identification indication carrying the counterpart communication number to the server.
  • Step 502 The server receives the identification indication and, according to it, sends a CDR request carrying the counterpart communication number to the communication service device.
  • The CDR request includes the counterpart communication number and the first preset time.
  • Step 503 The communication service device receives the CDR request, obtains the CDR of the counterpart communication number within the first preset time based on the CDR request, and sends the CDR to the server.
  • Step 504 The server receives a bill of the communication number of the other party in the first preset time.
  • Step 505 Parse the CDR to obtain the types of communication information included in the CDR, and extract at least one type of communication information of the counterpart communication number in the CDR and combine it to form a pre-processed CDR.
  • Step 506 Extract, from the pre-processed bill, the communication start times of the counterpart communication number acting as the communication initiation number.
  • Step 507 Calculate the number of communications of the counterpart communication number per unit time in the pre-processed bill.
  • Step 508 Determine whether the number of communications per unit time of the counterpart communication number included in the pre-processed bill is greater than a second threshold. If yes, go to step 509; otherwise, the process terminates.
  • Step 509 Extract, from the counterpart communication numbers included in the pre-processed CDR, the communication initiation numbers whose number of communications per unit time is greater than the second threshold, as the target communication numbers.
  • Step 510 Based on the identified target communication number, the server gives the user a danger reminder that the identified target communication number may be a fraudulent number. Ways of giving the danger reminder include, but are not limited to, reminders through communication applications such as SMS, flash message, WeChat, and Tencent Mobile Manager; when the target communication number is identified, the server can also give the danger reminder to the user equipment directly through a customer service call.
  • The server may also give a danger reminder to a user whose number appears as a communication response number in the communication records of the identified target communication number, or to a user of a communication response number that is currently communicating with the identified target communication number, to prevent users from being defrauded.
  • After receiving the identification response carrying the target communication number sent by the server, the user equipment gives the user a danger reminder based on the target communication number. For example, referring to FIG. 11b, the user equipment displays, in the display window of an application installed on the user equipment, a text reminder such as "Please be vigilant! The target communication number is a fraudulent number". The user applications here include, but are not limited to, communication applications such as SMS, flash message, WeChat, and Tencent Mobile Manager; of course, the application is not limited to communication applications, which is not specifically limited in the embodiments of the present application.
  • This embodiment addresses how to obtain the features of the corresponding type of communication information of each communication number in the pre-processed bill and how to extract, from the communication numbers included in the pre-processed bill, the target communication numbers that match the preset feature.
  • The number of communications per unit time of each communication initiation number in the pre-processed CDR is calculated (that is, the number of communications serves as one of the features of the communication initiation number). The communication initiation numbers whose number of communications per unit time is greater than the preset second threshold are extracted from the communication initiation numbers included in the pre-processed bill as the target communication numbers; alternatively, based on the ranking of the numbers of communications per unit time of the communication initiation numbers included in the pre-processed CDR, the second proportion of communication initiation numbers with the highest number of communications is extracted as the target communication numbers.
  • the communication initiation number is characterized by the number
  • This embodiment builds on the first embodiment and proposes a technical solution for the scenario of how to analyze at least one type of communication information of each communication number in the pre-processed bill to obtain the features of the corresponding type of communication information, and how to extract, from the communication numbers included in the pre-processed CDR, the target communication numbers that match the preset feature.
  • the communication number processing method provided in this embodiment includes the following steps:
  • Step 601 Acquire, from the communication service device, a bill of a preset number of communication numbers in a first preset time.
  • Step 602 Parse the CDR to obtain the type of the communication information included in the CDR, and extract at least one type of communication information of each communication number in the CDR and combine to form a pre-processed CDR.
  • Step 603 Extract, from the pre-processed bill, the communication durations of each communication number acting as the communication initiation number.
  • Step 604 Calculate an average communication duration of each communication initiation number in the pre-processed bill.
  • the average communication duration of the communication initiation number may include any of the following:
  • Step 605 Determine whether the average communication duration of each communication initiation number included in the pre-processed CDR is greater than a third threshold. If yes, go to step 606, otherwise the process terminates.
  • the initial value of the third threshold can be calculated manually or by training, for example:
  • the average communication duration corresponding to the communication initiation number having the smallest average communication duration in the selected communication initiation number is determined as the initial value of the third threshold.
  • the third threshold can be continuously updated through training calculation according to actual needs.
  • Step 606 Extract, from each communication initiation number included in the pre-processed CDR, a communication initiation number whose average communication duration is greater than a third threshold, as the target communication number.
  • Alternatively, the communication number processing device sorts the average communication durations of the communication initiation numbers included in the pre-processed bill and, based on that ranking, extracts from the communication initiation numbers included in the pre-processed bill a third proportion of communication initiation numbers with the highest average communication duration as the target communication numbers.
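  • Steps 604 to 606 can be sketched as follows, using the durations from Table 3. Taking the plain mean over records with a known duration is an assumption (the text's own list of options for the average is not reproduced here), and the function names and the threshold value of 100 seconds are hypothetical.

```python
# Communication durations (seconds) per initiation number, taken from Table 3.
# None marks the record whose duration is shown as "--".
PREPROCESSED = {
    "158xxxx0001": [134, 97, 123, 256],
    "170xxxx0001": [15, 77, None],
}


def average_duration(durations) -> float:
    """Mean duration over the records with a known duration (assumed definition)."""
    known = [d for d in durations if d is not None]
    return sum(known) / len(known) if known else 0.0


def numbers_above(preprocessed, third_threshold):
    """Initiation numbers whose average duration exceeds the third threshold."""
    return [number for number, durations in preprocessed.items()
            if average_duration(durations) > third_threshold]


# Hypothetical third threshold of 100 seconds.
targets = numbers_above(PREPROCESSED, third_threshold=100)
```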
  • the embodiment is directed to how to obtain the characteristics of the corresponding type of communication information of each communication number in the pre-processed bill, and extract the scene of the target communication number that matches the preset feature from the communication number included in the pre-processed bill.
  • On the basis of parsing the CDR to obtain the pre-processed CDR, the average communication duration of each communication initiation number in the pre-processed CDR is calculated (that is, the average communication duration serves as one of the features of the communication initiation number). The communication initiation numbers whose average communication duration is greater than the preset third threshold are extracted from the communication initiation numbers included in the pre-processed CDR as the target communication numbers; alternatively, based on the ranking of the average communication durations of the communication initiation numbers included in the pre-processed CDR, the third proportion of communication initiation numbers with the highest average communication duration is extracted as the target communication numbers.
  • The embodiment of the present application takes the average communication duration of each communication initiation number in the pre-processed bill as the feature and the third threshold as the preset feature. By determining the relationship between the average communication duration of each communication initiation number included in the pre-processed bill and the third threshold, the target communication numbers that match the preset feature are extracted from the communication numbers included in the pre-processed bill, which enables fast and accurate number identification.
  • This embodiment builds on the first embodiment and proposes a technical solution for the scenario of how to analyze at least one type of communication information of each communication number in the pre-processed bill to obtain the features of the corresponding type of communication information, and how to extract, from the communication numbers included in the pre-processed CDR, the target communication numbers that match the preset feature.
  • the communication number processing method provided in this embodiment includes the following steps:
  • Step 701 Acquire, from the communication service device, a bill of a preset number of communication numbers in a first preset time.
  • Step 702 Parse the CDR to obtain the types of communication information included in the CDR, and extract at least one type of communication information of each communication number in the CDR and combine it to form a pre-processed CDR.
  • Step 703 Extract, from the pre-processed bill, the attributions (home locations) of the communication response numbers corresponding to each communication number acting as the communication initiation number.
  • Step 704 Calculate the number of different attributions of the communication response number corresponding to each communication initiation number in the pre-processed bill.
  • Step 705 Determine whether the number of different attributions of the communication response number corresponding to each communication initiation number included in the pre-processed CDR is greater than a fourth threshold. If yes, go to step 706, otherwise the process terminates.
  • the initial value of the fourth threshold can be calculated manually or by training, for example:
  • the number of different attributions of the communication response number corresponding to the communication initiation number having the smallest number of different attributions of the communication response number corresponding to the selected communication initiation number is determined as an initial value of the fourth threshold.
  • the fourth threshold can be continuously updated through training calculation according to actual needs.
  • Step 706 Extract, from each communication initiation number included in the pre-processed CDR, a communication initiation number whose number of different attributions of the corresponding communication response number is greater than a fourth threshold, as the target communication number.
  • Alternatively, the communication number processing device sorts the numbers of different attributions of the communication response numbers corresponding to the communication initiation numbers included in the pre-processed bill and, based on that ranking, extracts from the communication initiation numbers included in the pre-processed CDR a fourth proportion of communication initiation numbers with the highest number of different attributions of the corresponding communication response numbers as the target communication numbers.
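  • Step 704 can be sketched as below. The prefix-to-attribution table, the city names, and the function names are hypothetical; a real system would look up the attribution of each response number from an authoritative numbering database.

```python
# Hypothetical lookup from number prefix to attribution; illustration only.
ATTRIBUTION_BY_PREFIX = {
    "186": "City A",
    "180": "City A",
    "139": "City B",
    "187": "City C",
}


def attribution_of(number: str) -> str:
    """Attribution of a response number (assumed: keyed by its first three digits)."""
    return ATTRIBUTION_BY_PREFIX.get(number[:3], "unknown")


def distinct_attribution_count(response_numbers) -> int:
    """Number of different attributions among one initiation number's response numbers."""
    return len({attribution_of(n) for n in response_numbers})


# Response numbers per initiation number, taken from Table 3.
RESPONSES = {
    "158xxxx0001": ["186xxxx0002", "186xxxx0007", "139xxxx0006", "187xxxx0002"],
    "170xxxx0001": ["186xxxx0001", "180xxxx0007", "139xxxx0002"],
}
counts = {n: distinct_attribution_count(r) for n, r in RESPONSES.items()}
```

  • The resulting counts can then be compared with the fourth threshold, or ranked to take the fourth proportion, in the same way as the other features.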
  • the embodiment is directed to how to obtain the characteristics of the corresponding type of communication information of each communication number in the pre-processed bill, and extract the scene of the target communication number that matches the preset feature from the communication number included in the pre-processed bill.
  • The embodiment of the present application takes the number of different attributions of the communication response numbers corresponding to each communication initiation number in the pre-processed bill as the feature and the fourth threshold as the preset feature. By determining the relationship between the number of different attributions of the communication response numbers corresponding to each communication initiation number included in the pre-processed CDR and the fourth threshold, the target communication numbers that match the preset feature are extracted from the communication numbers included in the pre-processed CDR, which enables fast and accurate number identification.
  • This embodiment is based on the foregoing embodiment, and proposes a solution to solve the scenario of how to extract the target communication number that matches the preset feature from the communication number included in the pre-processed CDR.
  • the communication number processing method provided in this embodiment includes the following steps:
  • Step 801: Acquire, from the communication service device, the bills of a preset number of communication numbers within a first preset time.
  • Step 802: Parse the bills to obtain the types of communication information they include, extract at least one type of communication information of each communication number in the bills, and combine it to form a pre-processed bill.
  • Step 803: Parse the at least one type of communication information of each communication number in the pre-processed bill to obtain the features of the corresponding type of communication information of each communication number.
  • Step 804: Use a machine learning model to analyze the features of the corresponding type of communication information of each communication number in the pre-processed bill.
  • Step 805: Determine whether the features of the corresponding type of communication information of each communication number match the preset feature. If yes, go to step 806; otherwise, the process ends.
  • Step 806: Extract, from the communication numbers included in the pre-processed bill, the target communication number that matches the preset feature.
  • Analyzing the features of the corresponding type of communication information of each communication number in the pre-processed bill with the machine learning model includes identifying the target communication number by using the technical solution described in any one of Embodiments 3 to 6, or a combination of those technical solutions.
  • The machine learning model can adopt any one or a combination of the following models: a Bayesian classifier model; a Support Vector Machine (SVM) classifier model; a deep learning model; a logistic regression model. Those skilled in the art will understand that the machine learning model can also include other models not listed here, and the application is not limited in this respect.
  • This embodiment addresses how to extract, from the communication numbers included in the pre-processed bill, the target communication number that matches the preset feature. The features of the corresponding type of communication information of each communication number in the pre-processed bill are analyzed with a machine learning model, and the target communication number matching the preset feature is extracted from the communication numbers included in the pre-processed bill, which makes number identification fast and efficient.
  • This embodiment builds on the seventh embodiment and addresses how to train the machine learning model based on user-side feedback on the target communication number.
  • the communication number processing method provided in this embodiment includes the following steps:
  • Step 901: Acquire, from the communication service device, the bills of a preset number of communication numbers within a first preset time.
  • Step 902: Parse the bills to obtain the types of communication information they include, extract at least one type of communication information of each communication number in the bills, and combine it to form a pre-processed bill.
  • Step 903: Parse the at least one type of communication information of each communication number in the pre-processed bill to obtain the features of the corresponding type of communication information of each communication number.
  • Step 904: Analyze the features of the corresponding type of communication information of each communication number in the pre-processed bill, and determine whether they match the preset feature. If yes, go to step 905; otherwise, the process ends.
  • Step 905: Extract, from the communication numbers included in the pre-processed bill, the target communication number that matches the preset feature; and issue a danger reminder to the user of a communication response number that has a communication record with the identified target communication number, or to the user of a communication response number that is communicating with the target communication number.
  • Step 906: Receive the user side's feedback information on the target communication number.
  • Step 907: Determine, according to the user side's feedback information on the target communication number, whether the target communication number is a safe number. If yes, go to step 908; otherwise, the process ends.
  • Step 908: Determine the error rate of the machine learning model based on how many of the identified target communication numbers the user side has reported as safe numbers.
  • Step 909: Determine whether the error rate of the machine learning model is greater than a fifth threshold. If yes, go to step 910; otherwise, the process ends.
  • Step 910: Retrain the machine learning model based on the communication records of the safe numbers in the pre-processed bill.
  • A feasible way of retraining the machine learning model includes:
  • updating, based on the features of at least one type of communication information of the safe number, the threshold that the machine learning model uses to identify the target communication number.
  • In this embodiment, the error rate of the machine learning model is determined from how many of the identified target communication numbers the user side has reported as safe numbers, and when the error rate is greater than the fifth threshold the model is retrained on the communication records of the safe numbers in the pre-processed bill. Because retraining uses the communication records of numbers that were wrongly identified, the retrained model is more accurate, and using it to identify target communication numbers improves the speed and accuracy of number identification.
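  • Steps 906 to 910 amount to a simple feedback check, sketched below. The feedback encoding (a dictionary mapping a number to `"safe"` or `"fraud"`) and the threshold value are assumptions for illustration only.

```python
def model_error_rate(identified_targets, feedback):
    """Share of identified target numbers that users reported as safe (false positives)."""
    if not identified_targets:
        return 0.0
    reported_safe = [n for n in identified_targets if feedback.get(n) == "safe"]
    return len(reported_safe) / len(identified_targets)

def should_retrain(error_rate, fifth_threshold):
    """Retrain only when the error rate exceeds the (assumed) fifth threshold."""
    return error_rate > fifth_threshold

targets = ["1001", "1002", "1003", "1004"]
user_feedback = {"1001": "safe", "1002": "fraud", "1003": "safe"}  # no feedback for 1004
rate = model_error_rate(targets, user_feedback)
retrain = should_retrain(rate, fifth_threshold=0.2)
```

  • Numbers without any feedback are treated here as not reported safe; a real deployment might instead exclude them from the denominator.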
  • This embodiment is based on any of the foregoing embodiments, and proposes a solution to the response processing scenario when the target communication number is identified.
  • the communication number processing method provided in this embodiment includes the following steps:
  • Step 1001: Acquire, from the communication service device, the bills of a preset number of communication numbers within a first preset time.
  • Step 1002: Parse the bills to obtain the types of communication information they include, extract at least one type of communication information of each communication number in the bills, and combine it to form a pre-processed bill.
  • Step 1003: Parse the at least one type of communication information of each communication number in the pre-processed bill to obtain the features of the corresponding type of communication information of each communication number.
  • Step 1004: Analyze the features of the corresponding type of communication information of each communication number in the pre-processed bill, and determine whether they match the preset feature. If yes, go to step 1005; otherwise, the process ends.
  • Step 1005: Extract, from the communication numbers included in the pre-processed bill, the target communication number that matches the preset feature.
  • Step 1006: Determine the degree of matching between the features of the corresponding type of communication information of the target communication number and the preset feature.
  • The degree of matching can also be understood as the degree of difference between the features of the corresponding type of communication information of the target communication number and the preset feature. Taking the similarity between the target communication number and the yellow page number as an example: when that similarity is greater than the first threshold, the degree of matching is the size of the difference between the similarity and the first threshold.
  • Step 1007: Determine the danger level of the target communication number according to the degree of matching between the features of the corresponding type of communication information of the target communication number and the preset feature.
  • The degree of matching is positively correlated with the danger level; different danger levels can correspond to different value ranges of the degree of matching.
  • Step 1008: Respond to the communication behavior of the target communication number based on the danger level of the target communication number.
  • The timeliness of the response is positively correlated with the danger level. Assume that the defined danger levels are high risk and low risk. The danger level characterizes the probability that the target communication number is a communication number meeting a specific condition, for example the probability that the target communication number is a fraudulent number.
  • For a low-risk target communication number, the response to its communication behavior may include issuing a danger reminder to the user of a communication response number that has a communication record with the target communication number, reminding the user that the target communication number is a fraudulent number. The danger reminder includes a voice reminder and/or a text reminder; a voice reminder is, for example, a voice recording or a customer service call, and a text reminder is, for example, a short message or a flash message.
  • For example, the communication number processing device issues an after-the-fact danger reminder: on the user device of the communication response number that has a communication record with the target communication number, the display window of a user application shows the text reminder "Please be vigilant! The target communication number is a fraudulent number". The user applications here include but are not limited to SMS, flash message, WeChat, and Tencent Mobile Manager; the application is of course not limited to communication applications, which is not specifically limited in the embodiments of the present application.
  • For a high-risk target communication number, the response to its communication behavior may include issuing an immediate danger reminder (including but not limited to a text reminder such as a short message or a flash message, or a voice reminder such as a voice recording or a customer service call) to the user of the communication response number that is communicating with the target communication number, that is, reminding the user while the communication is in progress that the target communication number is a fraudulent number; or directly intercepting the ongoing communication with the target communication number and issuing a danger reminder to the user afterwards.
  • In this embodiment, the danger level of the target communication number is determined from the degree of matching between the features of the corresponding type of communication information of the target communication number and the preset feature, and the communication behavior of the target communication number is responded to according to that danger level, so that users who communicate with the target communication number are reminded to stay vigilant and avoid being defrauded.
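  • The chain from degree of matching to danger level to response (steps 1006 to 1008) can be sketched as follows. The two-level scale, the cutoff value, and the mapping of levels to responses are illustrative assumptions; the source only states that a higher degree of matching means a higher danger level and a more timely response.

```python
def matching_degree(similarity, first_threshold):
    """Difference between a number's yellow-page similarity and the first threshold."""
    return similarity - first_threshold

def danger_level(degree, high_risk_cutoff=0.2):
    """Map the degree of matching to a level; the cutoff is a hypothetical value."""
    return "high" if degree >= high_risk_cutoff else "low"

def choose_response(level, call_in_progress):
    """Pick a response whose timeliness grows with the danger level."""
    if level == "high" and call_in_progress:
        return "immediate_reminder_or_intercept"
    return "after_the_fact_reminder"

level = danger_level(matching_degree(similarity=0.95, first_threshold=0.7))
action = choose_response(level, call_in_progress=True)
```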
  • This embodiment, based on any of the above embodiments, applies to scenarios in which a communication number satisfying a preset condition must be identified among multiple communication numbers, for example identifying numbers across an entire communication network, identifying communication numbers indicated by a user, or identifying the communication numbers that communicate with the current user. The type of communication service includes but is not limited to any one or a combination of the following: voice call; short message; flash message; data service (such as WeChat). The application is not limited in this respect.
  • a communication number processing apparatus (a fraudulent number identification system based on bill analysis) provided in this embodiment includes: an online identification system and an offline training system.
  • The online identification system extracts features from the bill records collected by the operator, uses the machine learning model to determine whether a given phone number is a fraudulent number, and then reminds the user or makes a follow-up call to the user to prevent the user from being deceived. The results of the reminder or follow-up call are fed back to the offline training system, and the machine learning model is adjusted accordingly.
  • The offline training system extracts the corresponding features from historical bill data and from the reminder and follow-up results fed back by the online identification system, uses these features to retrain and adjust the machine learning model, and synchronizes the trained model to the fraudulent phone recognition engine in the online identification system.
  • the online identification system can identify the fraud number according to the user's call bill record; the online identification system can be further divided into three modules: a bill collection module, a fraudulent phone recognition engine, and a deceived user reminder system;
  • Bill collection module: mainly responsible for collecting users' call records and pre-processing the collected bills to obtain the following four columns of information: calling number, called number, call start time, and call duration.
  • Fraudulent phone recognition engine: the core of the online identification system. The collected bills are cleaned, features are extracted, and the trained machine learning model is applied to the extracted features to identify whether a number is a fraudulent number. The engine can be divided into three parts: bill cleaning, feature extraction, and fraud number identification.
  • Bill cleaning is to remove the "dirty" data in the bill.
  • the so-called "dirty” data is some abnormal data, such as missing content, abnormal values, and so on.
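  • As a rough illustration of bill cleaning, the sketch below drops records with missing content or abnormal values. The field names and the specific rules (non-empty fields, non-negative numeric duration) are assumptions; the source does not define what counts as abnormal.

```python
REQUIRED_FIELDS = ("caller", "callee", "start_time", "duration_s")

def is_clean(record):
    """A record is kept only if every field is present and the duration is a sane number."""
    if any(record.get(field) in (None, "") for field in REQUIRED_FIELDS):
        return False
    duration = record["duration_s"]
    return isinstance(duration, (int, float)) and duration >= 0

def clean_bill(records):
    """Return only the records that pass the cleaning rules."""
    return [r for r in records if is_clean(r)]

raw = [
    {"caller": "0810010", "callee": "139XXXX", "start_time": "2016-04-25 09:05:00", "duration_s": 12},
    {"caller": "0810010", "callee": "", "start_time": "2016-04-25 09:06:00", "duration_s": 8},
    {"caller": "0810010", "callee": "138XXXX", "start_time": "2016-04-25 09:07:00", "duration_s": -3},
]
cleaned = clean_bill(raw)
```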
  • Feature extraction: after the bills are cleaned, features are extracted in preparation for the fraud number identification that follows. The features include the similarity between the calling number and yellow page numbers, the average call duration, the call interval between adjacent bill records, and so on.
  • Similarity between the calling number and yellow page numbers (that is, the similarity between the communication initiation number and the yellow page number described above): fraud numbers are mostly calling numbers, and fraudsters use number-changing software to change the calling number into one that resembles a yellow page number, such as 001XX86, +0109XX88, or 08XXX10010 (China Unicom's customer service number is 10010). The feature is computed as the edit distance between substrings of these numbers and the yellow page numbers, where the edit distance is the number of operations (for example add, delete, modify, or move) needed to turn the yellow page number into the calling number.
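  • A minimal sketch of the similarity feature follows. It uses the standard Levenshtein distance (insert, delete, substitute), which is a swapped-in simplification: the text also mentions move operations and substring matching, which are omitted here. Normalizing by the longer length to get a similarity in [0, 1] is likewise an assumption.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]

def similarity(caller, yellow_page_number):
    """1.0 for identical numbers, approaching 0.0 as more edits are needed."""
    longest = max(len(caller), len(yellow_page_number))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(caller, yellow_page_number) / longest

score = similarity("0810010", "10010")  # spoofed prefix in front of a service number
```

  • A caller would then be flagged when `score` exceeds the first threshold, or when it ranks in the top first proportion of all callers.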
  • Number of calls per unit time (that is, the number of communications of the communication initiation number per unit time described above): fraudsters usually make a large number of calls every hour, and most of these calls fall within working hours, that is, Monday to Friday from 08:00:00 to 18:00:00, during which the number of calls is evenly distributed. Outside working hours, the number of calls made from the number is generally small, essentially zero.
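  • The call-frequency feature could be computed along these lines. The record layout `(caller, start_time)`, the timestamp format, and the choice of one hour as the unit of time are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format

def calls_per_hour(records):
    """Count calls in each (caller, date, hour) bucket."""
    buckets = Counter()
    for caller, start in records:
        t = datetime.strptime(start, TIME_FORMAT)
        buckets[(caller, t.date().isoformat(), t.hour)] += 1
    return buckets

def working_hours_share(records, caller):
    """Fraction of a caller's calls that start Monday to Friday between 08:00 and 18:00."""
    times = [datetime.strptime(s, TIME_FORMAT) for c, s in records if c == caller]
    if not times:
        return 0.0
    in_hours = [t for t in times if t.weekday() < 5 and 8 <= t.hour < 18]
    return len(in_hours) / len(times)

records = [
    ("A", "2016-04-25 09:05:00"),  # Monday
    ("A", "2016-04-25 09:40:00"),
    ("A", "2016-04-25 10:15:00"),
    ("A", "2016-04-24 21:00:00"),  # Sunday evening
]
hourly = calls_per_hour(records)
share = working_hours_share(records, "A")
```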
  • Average call duration (that is, the average communication duration described above): the average duration of each call made by the fraudulent number. Fraudulent calls generally have a short average duration, no more than 20 s.
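  • A short sketch of the average-duration feature, assuming records of the form `(caller, duration_in_seconds)`; the 20 s cutoff is taken from the description above and used only as an example filter.

```python
from collections import defaultdict

def average_call_duration(records):
    """Mean call duration in seconds for each calling number."""
    totals = defaultdict(lambda: [0.0, 0])  # caller -> [sum of durations, call count]
    for caller, duration_s in records:
        totals[caller][0] += duration_s
        totals[caller][1] += 1
    return {caller: total / count for caller, (total, count) in totals.items()}

records = [("A", 10), ("A", 20), ("B", 120)]
averages = average_call_duration(records)
short_callers = sorted(c for c, avg in averages.items() if avg <= 20)
```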
  • Distribution of the attribution (home location) of the called numbers over time, in units of days (that is, the number of different attributions of the communication response numbers corresponding to the communication initiation number described above): fraudsters usually commit fraud city by city, so the called numbers in these bills usually belong to a particular city. The number of cities to which the called numbers belong within a given period is taken as the feature.
  • Deceived-user reminder system: informs the victim that the call received is a fraudulent call so that the victim is not deceived, and submits the victim's feedback to the offline training system.
  • The offline training system extracts the features of the relevant historical bills, retrains the machine learning model, and adjusts the Bayesian classifier (other machine learning algorithms can also be used here, such as an SVM classifier, logistic regression, or deep learning). The offline training system can be divided into three parts:
  • a) Extract historical bills: extract the historical bills of the most recent period, especially those for which the feedback indicates a wrong result.
  • b) Feature extraction: extract features from the historical bills to provide data for the model retraining that follows.
  • c) Model retraining: use the features extracted in b) to train the Bayesian classifier and obtain new parameters, then update the trained machine learning model to the online identification system.
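  • The retraining step can be illustrated with a small Gaussian naive Bayes classifier written from scratch. Treating the "Bayesian classifier" as Gaussian naive Bayes, the two-feature layout `[average duration in seconds, calls per hour]`, and the sample values are all assumptions for this sketch, not details given in the source.

```python
import math
from collections import defaultdict

def train_gaussian_nb(samples, labels):
    """Fit a class prior plus per-feature mean and variance for every label."""
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    model = {}
    for y, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-6
                     for col, m in zip(columns, means)]
        model[y] = (n / len(samples), means, variances)
    return model

def predict(model, x):
    """Return the label with the highest log posterior for feature vector x."""
    def log_posterior(params):
        prior, means, variances = params
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=lambda y: log_posterior(model[y]))

# Hypothetical training data: historical features relabeled with user feedback.
features = [[12, 40], [15, 35], [10, 50], [120, 2], [90, 1], [200, 3]]
labels = ["fraud", "fraud", "fraud", "normal", "normal", "normal"]
model = train_gaussian_nb(features, labels)
```

  • After retraining, the returned `model` dictionary plays the role of the new parameters that would be pushed to the online identification system.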
  • the online identification system and the offline training system form a complete closed loop.
  • Based on the results of the voice follow-up calls, the offline training system decides whether to retrain the fraud number identification model and update it in the online identification system.
  • The communication number processing device provided in this embodiment has the following advantages: 1) it needs no user tag information, only bill records; 2) it improves the speed and accuracy of fraud number identification; 3) it identifies fraud numbers more accurately, enabling the operator to identify fraudulent calls while the user's call is in progress.
  • This embodiment further describes a communication number processing device that can be used to execute the communication number processing method in the embodiments of the present application. The device can be implemented in various ways, for example in a user device such as a smartphone, a landline phone, a tablet computer, a notebook computer, or a wearable device (such as smart glasses or a smart watch), or in a network device such as an enterprise gateway or a carrier gateway.
  • The communication number processing device may also be a client or a background server of a user application; for example, when the user application is Tencent Mobile Manager, the corresponding communication number processing device may be the client or the background server of Tencent Mobile Manager. Referring to FIG. 13, the communication number processing device includes:
  • the obtaining module 1301 is configured to acquire, from the communication service device, a bill of a preset number of communication numbers in a first preset time;
  • the pre-processing module 1302 is configured to parse the CDR to obtain the type of the communication information included in the CDR, and extract at least one type of communication information of each communication number in the CDR and combine to form a pre-processed CDR;
  • the parsing module 1303 is configured to parse at least one type of communication information of each communication number in the pre-processed bill, and obtain a feature of the corresponding type of communication information of each communication number in the pre-processed bill;
  • the extracting module 1304 is configured to extract, from the communication number included in the pre-processed bill, a target communication number that matches the preset feature.
  • This embodiment parses the bills of the communication numbers to obtain the features of the corresponding type of communication information of each communication number, and based on those features identifies, among the communication numbers, the target communication number that matches the preset feature. On the one hand, because bills are generally generated and maintained by the operator without requiring the participation of individual users, they can be acquired quickly and efficiently. On the other hand, because bills are objective data maintained by the operator, they truly and completely reflect all communication records of a user within a given time interval. The technical solution provided by the embodiments of the present application, which takes the bills of communication numbers as the basis for processing, can therefore improve the speed and accuracy of number identification.
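  • the pre-processing module 1302 is specifically configured to:
  • parse the bill to obtain at least one of the following types of communication information included in the bill: communication initiation number, communication response number corresponding to the communication initiation number, communication start time, and communication duration;
  • extract the at least one type of communication information associated with each communication initiation number in the bill to form a communication record of each communication initiation number;
  • combine the extracted communication records of the communication initiation numbers to form the pre-processed bill.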
  • the parsing module 1303 is specifically configured to: separately calculate the edit distance between each communication initiation number in the pre-processed bill and the yellow page number; and obtain, based on the edit distance, the similarity between each communication initiation number in the pre-processed bill and the yellow page number;
  • the extraction module 1304 is specifically configured to: extract, from the communication initiation numbers included in the pre-processed bill, a communication initiation number whose similarity with the yellow page number is greater than a first threshold; or, based on a ranking of the communication initiation numbers included in the pre-processed bill by their similarity with the yellow page number, extract the first proportion of communication initiation numbers with the highest similarity.
  • the parsing module 1303 is specifically configured to: extract the communication start time of each communication number that acts as a communication initiation number in the pre-processed bill; and calculate the number of communications per unit time of each communication initiation number in the pre-processed bill;
  • the extraction module 1304 is specifically configured to: extract, from the communication initiation numbers included in the pre-processed bill, a communication initiation number whose number of communications per unit time is greater than a second threshold; or, based on a ranking of the communication initiation numbers included in the pre-processed bill by their number of communications per unit time, extract the second proportion of communication initiation numbers with the highest number of communications.
  • the parsing module 1303 is specifically configured to: extract the communication duration of each communication number that acts as a communication initiation number in the pre-processed bill; and calculate the average communication duration of each communication initiation number in the pre-processed bill;
  • the extraction module 1304 is specifically configured to: extract, from the communication initiation numbers included in the pre-processed bill, a communication initiation number whose average communication duration is greater than a third threshold; or, based on a ranking of the communication initiation numbers included in the pre-processed bill by their average communication duration, extract the third proportion of communication initiation numbers with the highest average communication duration.
  • the parsing module 1303 is specifically configured to: obtain the attribution of the communication response number corresponding to each communication number that acts as a communication initiation number in the pre-processed bill; and calculate the number of different attributions of the communication response numbers corresponding to each communication initiation number in the pre-processed bill;
  • the extraction module 1304 is specifically configured to: extract, from the communication initiation numbers included in the pre-processed bill, a communication initiation number whose corresponding communication response numbers have a number of different attributions greater than a fourth threshold; or, based on a ranking of the communication initiation numbers included in the pre-processed bill by the number of different attributions of their corresponding communication response numbers, extract the fourth proportion of communication initiation numbers with the highest number of different attributions.
  • the extraction module 1304 is specifically configured to: use a machine learning model to analyze the features of the corresponding type of communication information of each communication number in the pre-processed bill, and extract, from the communication numbers included in the pre-processed bill, the target communication number that matches the preset feature.
  • the communication number processing apparatus of this embodiment also includes the obtaining module 1301, the preprocessing module 1302, the parsing module 1303, and the extracting module 1304 in FIG.
  • the communication number processing device of the embodiment further includes:
  • the training module 1305 is configured to: receive the user side's feedback information on the target communication number and determine whether the target communication number is a safe number; determine the error rate of the machine learning model based on how many of the identified target communication numbers the user side has reported as safe numbers; and, when the error rate of the machine learning model is greater than the fifth threshold, retrain the machine learning model based on the communication records of the safe numbers in the pre-processed bill.
  • the training module 1305 is specifically configured to: parse at least one type of communication information in the communication records of the safe number in the pre-processed bill to obtain the features of the at least one type of communication information of the safe number; and update, based on those features, the threshold that the machine learning model uses to identify the target communication number.
  • the device further includes:
  • the response module 1306 is configured to: determine the degree of matching between the features of the corresponding type of communication information of the target communication number and the preset feature; determine the danger level of the target communication number according to that degree of matching; and respond to the communication behavior of the target communication number based on the danger level of the target communication number.
  • the obtaining module 1301, the pre-processing module 1302, the parsing module 1303, the extracting module 1304, the training module 1305, and the response module 1306 may each be implemented by a central processing unit (CPU), a microprocessor (MPU), an application specific integrated circuit (ASIC), or a field programmable gate array (FPGA) located in the communication number processing device.
  • This embodiment describes a computer-readable medium, which may be a ROM (e.g., read-only memory, flash memory, or a transfer device), a magnetic storage medium (e.g., magnetic tape or a disk drive), an optical storage medium (e.g., CD-ROM, DVD-ROM, paper card, or paper tape), or another well-known type of program memory. The computer-readable medium stores computer-executable instructions (such as binary executable instructions for a projection application such as Tencent Video) that, when executed, cause at least one processor to perform the following operations:
  • the target communication number matching the preset feature is extracted from the communication number included in the pre-processed bill.
  • The communication number processing device parses the bills of the communication numbers to obtain the features of the corresponding type of communication information of each communication number, and based on those features identifies, among the communication numbers, the target communication number that matches the preset feature. On the one hand, because bills are generally generated and maintained by the operator without requiring the participation of individual users, they can be acquired quickly and efficiently. On the other hand, because bills are objective data maintained by the operator, they truly and completely reflect all communication records of a user within a given time interval. The technical solution provided by the embodiments of the present application, which takes the bills of communication numbers as the basis for processing, can therefore improve the speed and accuracy of number identification.
  • embodiments of the present application can be provided as a method, system, or computer program product.
  • the application can take the form of a hardware embodiment, a software embodiment or an embodiment in combination with software and hardware.
  • the application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage and optical storage, etc.) including computer usable program code.
  • The computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, and the instruction apparatus implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

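The present application discloses a communication number processing method and apparatus. The method includes: obtaining, from a communication service device, the call detail records (CDRs) of a preset quantity of communication numbers within a first preset time; parsing the CDRs to obtain the types of communication information they contain, extracting at least one type of communication information for each communication number in the CDRs, and combining the extracted information into a preprocessed CDR; parsing the at least one type of communication information of each communication number in the preprocessed CDR to obtain the features of the corresponding type of communication information of each communication number; and extracting, from the communication numbers included in the preprocessed CDR, a target communication number that matches a preset feature. The present application can improve the speed and accuracy of number identification.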

Description

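Communication number processing method and apparatus
This application claims priority to Chinese Patent Application No. 201610261923.1, entitled "Communication number processing method and apparatus", filed with the Chinese Patent Office on April 25, 2016, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to data processing technologies in the field of communication technologies, and in particular to a communication number processing method and apparatus.
Background
Telecom fraud is a crime in which offenders fabricate false information and set up scams by telephone, internet, or SMS, defraud victims remotely and without contact, and induce them to pay or transfer money. With the rise of the mobile internet, telecom fraud has become increasingly rampant. Data show that the amounts involved grow exponentially each year: in 2015 public security organs nationwide filed 590,000 telecom fraud cases, up 32.5% year on year, with economic losses of 22.2 billion yuan. Behind each case there may be a family broken by fraud.
To curb telecom fraud and protect users from fraudulent calls, the prior art uses application software (an app) on the mobile phone to collect users' tags on numbers. If a number is tagged as a fraud number by multiple users, it is treated as a fraud number, and users in a call with that number are warned to be vigilant so that they are not defrauded.
However, the prior art depends on collecting user tags, and in practice users rarely tag numbers: many users who receive a call from an unknown number do not tag its type. Moreover, the prior art must collect enough user tags before a number can be treated as a fraud number. Fraud numbers are therefore identified slowly and inefficiently. In addition, tagging is subjective: many users who receive nuisance calls, such as malicious advertising and sales calls, also tag those numbers as fraud numbers. The identification accuracy of the prior art is therefore low.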
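Summary
In view of this, the embodiments of the present application aim to provide a communication number processing method and apparatus that can improve the speed and accuracy of number identification.
To achieve the above objective, the technical solutions of the present application are implemented as follows:
In a first aspect, an embodiment of the present application provides a communication number processing method, the method including:
obtaining, from a communication service device, the CDRs of a preset quantity of communication numbers within a first preset time;
parsing the CDRs to obtain the types of communication information they contain, extracting at least one type of communication information for each communication number in the CDRs, and combining the extracted information into a preprocessed CDR;
parsing the at least one type of communication information of each communication number in the preprocessed CDR to obtain the features of the corresponding type of communication information of each communication number in the preprocessed CDR;
extracting, from the communication numbers included in the preprocessed CDR, a target communication number that matches a preset feature.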
可选的,所述解析所述话单得到所述话单中所包括的通信信息的类型,提取出所述话单中各通信号码的至少一种类型的通信信息并组合形成预处理话单,包括:
解析所述话单得到所述话单中所包括的以下类型的通信信息中的至少一种:通信发起号码、对应所述通信发起号码的通信响应号码、通信起始时间和通信时长;
提取出所述话单中各通信发起号码所关联的至少一种类型的通信信息形成各通信发起号码的通信记录;
将所提取的各通信发起号码的通信记录组合形成所述预处理话单。
可选的,所述解析所述预处理话单中各通信号码的至少一种类型的通信信息,得到所述预处理话单中各通信号码的相应类型通信信息所具有的特征,包括:
分别计算所述预处理话单中的各通信发起号码与黄页号码的编辑距离,其中,所述编辑距离表示黄页号码变成通信发起号码的操作次数;
基于所述编辑距离得到所述预处理话单中各通信发起号码与黄页号码的相似度;
从所述预处理话单包括的通信号码中提取出与预设特征匹配的目标通信号码,包括:
从所述预处理话单包括的各通信发起号码中提取出与所述黄页号码的相似度大于第一阈值的通信发起号码;
或者,基于所述预处理话单包括的各通信发起号码中与所述黄页号码的相 似度的排序,提取出相似度最高的第一比例的通信发起号码。
可选的,所述解析所述预处理话单中各通信号码的至少一种类型的通信信息,得到所述预处理话单中各通信号码的相应类型通信信息所具有的特征,包括:
提取所述预处理话单中各通信号码作为通信发起号码的通信起始时间;
计算所述预处理话单中各通信发起号码在单位时间内的通信次数;
从所述预处理话单包括的通信号码中提取出与预设特征匹配的目标通信号码,包括:
从所述预处理话单包括的各通信发起号码中提取出单位时间内通信次数大于第二阈值的通信发起号码;
或者,基于所述预处理话单包括的各通信发起号码在单位时间内的通信次数的排序,提取出通信次数最高的第二比例的通信发起号码。
可选的,所述解析所述预处理话单中各通信号码的至少一种类型的通信信息,得到所述预处理话单中各通信号码的相应类型通信信息所具有的特征,包括:
提取所述预处理话单中各通信号码作为通信发起号码的通信时长;
计算所述预处理话单中各通信发起号码的平均通信时长;
从所述预处理话单包括的通信号码中提取出与预设特征匹配的目标通信号码,包括:
从所述预处理话单包括的各通信发起号码中提取出平均通信时长大于第三阈值的通信发起号码;
或者,基于所述预处理话单包括的各通信发起号码的平均通信时长的排序,提取出平均通信时长最高的第三比例的通信发起号码。
可选的,所述解析所述预处理话单中各通信号码的至少一种类型的通信信息,得到所述预处理话单中各通信号码的相应类型通信信息所具有的特征,包括:
获取所述预处理话单中各通信号码作为通信发起号码时对应的通信响应号码的归属地;
计算所述预处理话单中各通信发起号码所对应的通信响应号码的不同归属地的数量;
从所述预处理话单包括的通信号码中提取出与预设特征匹配的目标通信号 码,包括:
从所述预处理话单包括的各通信发起号码中提取出所对应的通信响应号码的不同归属地的数量大于第四阈值的通信发起号码;
或者,基于所述预处理话单包括的各通信发起号码所对应的通信响应号码的不同归属地的数量的排序,提取出所对应的通信响应号码的不同归属地的数量最高的第四比例的通信发起号码。
可选的,从所述预处理话单包括的通信号码中提取出与预设特征匹配的目标通信号码,包括:
使用机器学习模型分析所述预处理话单中各通信号码的相应类型通信信息所具有的特征,从所述预处理话单包括的通信号码中提取出与预设特征匹配的目标通信号码。
可选的,所述方法还包括:
接收用户侧针对目标通信号码的反馈信息,确定所述目标通信号码是否为安全号码;
基于所述识别出的目标通信号码中被用户侧反馈为安全号码的目标通信号码的数量,确定所述机器学习模型的错误率;
机器学习模型的错误率大于第五阈值时,基于所述预处理话单中所述安全号码的通信记录,对所述机器学习模型进行重新训练。
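Optionally, retraining the machine learning model based on the communication records of the safe number in the preprocessed CDR includes:
parsing at least one type of communication information in the communication records of the safe number in the preprocessed CDR to obtain the features of the at least one type of communication information of the safe number;
updating, based on the features of the at least one type of communication information of the safe number, the threshold that the machine learning model uses to identify the target communication number.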
可选的,所述从预处理话单包括的通信号码中提取出与预设特征匹配的目标通信号码之后,所述方法还包括:
确定所述目标通信号码的相应类型通信信息所具有的特征与预设特征的匹配程度;
根据所述目标通信号码的相应类型通信信息所具有的特征与预设特征的匹配程度,确定所述目标通信号码的危险级别;
基于所述目标通信号码的危险级别对所述目标通信号码的通信行为进行响 应处理。
可选的,确定所述目标通信号码的危险级别为低危时,对所述目标通信号码的通信行为进行响应处理,包括:向具有与目标通信号码的通信记录的通信响应号码的用户进行危险提醒;其中,所述危险提醒包括语音提醒和/或文字提醒;
或者,确定所述目标通信号码的危险级别为高危时,对所述目标通信号码的通信行为进行响应处理,包括:向与目标通信号码正在进行通信的通信响应号码的用户进行即时的危险提醒;或者,直接拦截与目标通信号码正在进行的通信。
可选的,响应处理的实时程度与危险级别正相关。
第二方面,本申请实施例提供一种通信号码处理装置,所述装置包括:
获取模块,用于从通信业务设备获取第一预设时间内预设数量的通信号码的话单;
预处理模块,用于解析所述话单得到所述话单中所包括的通信信息的类型,提取出所述话单中各通信号码的至少一种类型的通信信息并组合形成预处理话单;
解析模块,用于解析所述预处理话单中各通信号码的至少一种类型的通信信息,得到所述预处理话单中各通信号码的相应类型通信信息所具有的特征;
提取模块,用于从所述预处理话单包括的通信号码中提取出与预设特征匹配的目标通信号码。
可选的,所述预处理模块,具体用于:
解析所述话单得到所述话单中所包括的以下类型的通信信息中的至少一种:通信发起号码、对应所述通信发起号码的通信响应号码、通信起始时间和通信时长;
提取出所述话单中各通信发起号码所关联的至少一种类型的通信信息形成各通信发起号码的通信记录;
将所提取的各通信发起号码的通信记录组合形成所述预处理话单。
可选的,所述解析模块,具体用于:分别计算所述预处理话单中的各通信发起号码与黄页号码的编辑距离;基于所述编辑距离得到所述预处理话单中各通信发起号码与黄页号码的相似度,其中,所述编辑距离表示黄页号码变成通信发起号码的操作次数;
所述提取模块,具体用于:从所述预处理话单包括的各通信发起号码中提取出与所述黄页号码的相似度大于第一阈值的通信发起号码;或者,基于所述预处理话单包括的各通信发起号码中与所述黄页号码的相似度的排序,提取出相似度最高的第一比例的通信发起号码。
可选的,所述解析模块,具体用于:提取所述预处理话单中各通信号码作为通信发起号码的通信起始时间;计算所述预处理话单中各通信发起号码在单位时间内的通信次数;
所述提取模块,具体用于:从所述预处理话单包括的各通信发起号码中提取出单位时间内通信次数大于第二阈值的通信发起号码;或者,基于所述预处理话单包括的各通信发起号码在单位时间内的通信次数的排序,提取出通信次数最高的第二比例的通信发起号码。
可选的,所述解析模块,具体用于:提取所述预处理话单中各通信号码作为通信发起号码的通信时长;计算所述预处理话单中各通信发起号码的平均通信时长;
所述提取模块,具体用于:从所述预处理话单包括的各通信发起号码中提取出平均通信时长大于第三阈值的通信发起号码;或者,基于所述预处理话单包括的各通信发起号码的平均通信时长的排序,提取出平均通信时长最高的第三比例的通信发起号码。
可选的,所述解析模块,具体用于:获取所述预处理话单中各通信号码作为通信发起号码时对应的通信响应号码的归属地;计算所述预处理话单中各通信发起号码所对应的通信响应号码的不同归属地的数量;
所述提取模块,具体用于:从所述预处理话单包括的各通信发起号码中提取出所对应的通信响应号码的不同归属地的数量大于第四阈值的通信发起号码;或者,基于所述预处理话单包括的各通信发起号码所对应的通信响应号码的不同归属地的数量的排序,提取出所对应的通信响应号码的不同归属地的数量最高的第四比例的通信发起号码。
可选的,所述提取模块,具体用于:使用机器学习模型分析所述预处理话单中各通信号码的相应类型通信信息所具有的特征,从所述预处理话单包括的通信号码中提取出与预设特征匹配的目标通信号码。
可选的,所述装置还包括:
训练模块,用于接收用户侧针对目标通信号码的反馈信息,确定所述目标 通信号码是否为安全号码;基于所述识别出的目标通信号码中被用户侧反馈为安全号码的目标通信号码的数量,确定所述机器学习模型的错误率;机器学习模型的错误率大于第五阈值时,基于所述预处理话单中所述安全号码的通信记录,对所述机器学习模型进行重新训练。
可选的,所述训练模块,具体用于:解析所述预处理话单中所述安全号码的通信记录的至少一种类型的通信信息,得到所述安全号码的至少一种类型的通信信息所具有的特征;基于所述安全号码的至少一种类型的通信信息所具有的特征更新所述机器学习模型识别所述目标通信号码所使用的阈值。
可选的,所述装置还包括:
响应模块,用于确定所述目标通信号码的相应类型通信信息所具有的特征与预设特征的匹配程度;根据所述目标通信号码的相应类型通信信息所具有的特征与预设特征的匹配程度,确定所述目标通信号码的危险级别;基于所述目标通信号码的危险级别对所述目标通信号码的通信行为进行响应处理。
相比于现有技术需要收集用户的标记信息,本申请实施例通过解析第一预设时间内预设数量的通信号码的话单得到各通信号码的相应类型通信信息所具有的特征,并基于各通信号码的相应类型通信信息所具有的特征从各通信号码中提取出与预设特征匹配的目标通信号码,一方面,通信号码话单是由运营商维护的客观数据,能够真实和完整地反映用户在一定时间间隔内的全部通信记录,本申请实施例以通信号码话单为处理依据,能够提高号码识别的准确性,另一方面,由于话单的生成及维护过程一般并不需要各用户的直接参与,而是由运营商负责,因而通信号码话单的获取速度和效率较高,如此,本申请实施例能够提高号码识别的速度和准确性。
附图说明
图1为本申请实施例中通信号码处理方法的一个可选的应用场景示意图;
图2为本申请实施例一中通信号码处理方法的一个可选的流程示意图;
图3为本申请实施例二中通信号码处理方法的一个可选的流程示意图;
图4为本申请实施例三中通信号码处理方法的一个可选的流程示意图;
图5为本申请实施例四中通信号码处理方法的一个可选的流程示意图;
图6为本申请实施例五中通信号码处理方法的一个可选的流程示意图;
图7为本申请实施例六中通信号码处理方法的一个可选的流程示意图;
图8为本申请实施例七中通信号码处理方法的一个可选的流程示意图;
图9为本申请实施例八中通信号码处理方法的一个可选的流程示意图;
图10为本申请实施例九中通信号码处理方法的一个可选的流程示意图;
图11a为本申请实施例中运行于用户设备上的用户应用处于接收用户指示状态的一个可选的示意图;
图11b为本申请实施例中运行于用户设备上的用户应用处于文字提醒状态的一个可选的示意图;
图12为本申请实施例中通信号码处理装置的一个可选的结构示意图;
图13为本申请实施例中通信号码处理装置的另一个可选的结构示意图;
图14为本申请实施例中通信号码处理装置的又一个可选的结构示意图。
具体实施方式
以下结合附图及实施例,对本申请进行进一步详细说明。应当理解,此处所描述的具体实施例仅仅用以解释本申请,并不用于限定本申请。
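The embodiments of the present application describe a communication number processing method. Referring to FIG. 1, which shows an optional application scenario of the method, user equipment 11, user equipment 12, user equipment 13, a network device 14 (such as an operator gateway or an enterprise gateway), a communication service device 15, and an application's background server 16 each access a communication network (such as a wireless or wired network). The communication service device 15 is, for example, a business support system (BSS)/operation support system (OSS), or a telecom switch, and provides the CDRs of communication numbers. The network device 14 provides service support for the user equipment accessing the communication network. The background server 16 provides service support for the application; correspondingly, the application's client installed on the user equipment also provides service support for the application. The application may be a communication application, for example Tencent Mobile Manager, WeChat, or Tencent Mail; the application is not limited to communication applications, and the embodiments of the present application do not specifically limit this. In this scenario there is at least one user equipment, and each user equipment is associated with at least one distinct communication number. For example, in FIG. 1, user equipment 11 is associated with at least one communication number A, user equipment 12 with at least one communication number B, and user equipment 13 with at least one communication number C, where A, B, and C are pairwise different. The communication number processing method of the embodiments can be applied in this scenario to identify, from multiple communication numbers, those that satisfy a preset condition.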
本申请实施例还记载一种通信号码处理装置,可以用于执行本申请实施例 的通信号码处理方法;通信号码处理装置可以采用各种方式来实施,例如在智能手机、固定电话、平板电脑、笔记本电脑、穿戴式设备(如智能眼镜、智能手表等)等用户设备中实施装置的全部组件,或者,在企业网关、运营商网关等网络设备中实施装置的全部组件,或者,在上述的用户设备侧或网络侧以耦合的方式实施装置中的组件,或者,通信号码处理装置还可以是用户应用的客户端或者后台服务器,例如,当用户应用为腾讯手机管家时,相应的通信号码处理装置可以为腾讯手机管家的客户端或者后台服务器。
基于上述记载的应用场景及通信号码处理装置,提出以下各具体实施例。
实施例一
本实施例提供一种通信号码处理方法,可以应用于需要从多个通信号码中识别出满足预设条件的通信号码的场景中,例如针对通信网络中全网号码的识别,或者,针对用户指示的待识别通信号码的识别,或者,针对与当前用户进行通信的通信号码的识别等场景中;通信的业务类型包括但不限于以下任意一种业务类型或组合:语音通话;短信;闪信;数据业务(如微信),本申请并不以此为限。
基于上述通信号码处理装置,参见图2,本实施例提供的通信号码处理方法,包括以下步骤:
步骤201、从通信业务设备获取第一预设时间内预设数量的通信号码的话单。
通信业务设备可以包括电信支撑***设备,例如BSS/OSS,或者电信交换机;第一预设时间可以由用户或运营商根据实际业务需求等实际情况灵活设定;通信号码并不限于手机号码、固定号码等;通信号码例如可以包括通信网络中的全部通信号码,或者,用户指示的待识别通信号码,或者,与当前用户进行通话的通信号码;其中,上述用户指示的待识别通信号码,例如用户在用户设备上运行的应用(如腾讯手机管家)中指定的待识别通信号码,或者,用户向运营商服务器发送携带待识别通信号码的指示消息。
上述从通信业务设备获取第一预设时间内预设数量的通信号码的话单的实现方式可以为以下方式至少之一:
1)从通信业务设备获取通信网络中的全部通信号码在第一预设时间内的话单;
2)根据当前用户指示的待识别通信号码,从通信业务设备获取待识别通信号码在第一预设时间内的话单;
3)检测到与当前用户进行通话的通信号码时,从通信业务设备获取与当前用户进行通话的通信号码在第一预设时间内的话单;
4)确定与当前用户进行通话的通信号码为陌生通信号码时,从通信业务设备获取陌生通信号码在第一预设时间内的话单。
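Step 202: Parse the CDRs to obtain the types of communication information they contain, extract at least one type of communication information for each communication number in the CDRs, and combine the extracted information into a preprocessed CDR.
The CDRs of the preset quantity of communication numbers within the first preset time, as obtained from the communication service device, are generally unordered. In this embodiment the preprocessed CDR is compiled per communication number. For each communication number it includes at least one type of communication information for at least one of the following cases: the communication number acts as a calling number (for example, the calling number in a voice service); as a called number (for example, the called number in a voice service); as a message-sending number (for example, an SMS sending number, or a data-sending number in a data service); or as a message-receiving number (for example, an SMS receiving number, or a data-receiving number in a data service).
The preprocessed CDR includes only the at least one type of communication information extracted from the CDRs for each communication number; that is, it need not include all the information in the CDRs. The data of the preprocessed CDR is indexed by communication number, and its data structure is, for example:
Calling number 1 in a voice service: communication information 1, communication information 2, ...;
Calling number 2 in a voice service: communication information 3, communication information 4, ...;
SMS sending number 3: communication information 5, communication information 6, ...;
Data-sending number 4 in a data service: communication information 7, communication information 8, ....
Take as an example the preprocessed CDR shown in Table 1, which is indexed by each communication number acting as a calling number. In the data structure of Table 1, the calling number, called number, communication start time, and communication duration (seconds) are some examples of the types of communication information included in the CDR.
Table 1
Calling number   Called number   Communication start time   Communication duration (s)
158xxxx0001      186xxxx0002     2016-01-15 15:32:42        134
158xxxx0001      139xxxx0001     2016-01-15 15:39:02        15
158xxxx0001      139xxxx0002     2016-01-15 15:48:02        123
170xxxx0001      186xxxx0001     2016-01-16 8:30:02         77
170xxxx0001      139xxxx0002     2016-01-17 9:26:02         256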
步骤203、解析预处理话单中各通信号码的至少一种类型的通信信息,得到预处理话单中各通信号码的相应类型通信信息所具有的特征。
步骤204、分析预处理话单中各通信号码的相应类型通信信息所具有的特征,判断各通信号码的相应类型通信信息所具有的特征是否与预设特征匹配,若是,转到步骤205,否则流程结束。
步骤205、从预处理话单包括的通信号码中提取出与预设特征匹配的目标通信号码。
对解析得到的预处理话单中各通信号码的相应类型通信信息所具有的特征进行分析,从预处理话单包括的通信号码中提取出相应类型通信信息所具有的特征与预设特征匹配的目标通信号码;预设特征例如是预先设置的先验值。
相比于需要在收集用户标记信息的基础上实施识别号码的现有技术,本实施例对通信号码的话单进行解析得到通信号码的相应类型通信信息所具有的特征,并基于通信号码的相应类型通信信息所具有的特征从各通信号码中识别出与预设特征匹配的目标通信号码,一方面,由于通信号码话单的生成及维护过程一般是由运营商负责,并不需要各个用户的参与,通信号码话单的获取速度和效率较高,另一方面,由于通信号码的话单是由运营商维护的客观数据,因而能够真实和完整地反映用户在一定时间间隔内的所有通信记录,如此,本申请实施例提供的技术方案以通信号码的话单为处理基础,能够提高号码识别的速度和准确性。
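Embodiment 2
Based on Embodiment 1, this embodiment provides a technical solution for how to parse the CDRs to obtain the types of communication information they contain, and how to extract at least one type of communication information for each communication number in the CDRs and combine it into a preprocessed CDR.
Referring to FIG. 3, the communication number processing method provided by this embodiment includes the following steps:
Step 301: Obtain, from the communication service device, the CDRs of a preset quantity of communication numbers within a first preset time.
Step 302: Parse the CDRs to obtain at least one of the following types of communication information included in the CDRs: communication initiating number, communication response number corresponding to the communication initiating number, communication start time, and communication duration.
The communication initiating number may include a communication number acting as a calling number (for example, the calling number in a voice service) and a communication number acting as a message-sending number (for example, an SMS sending number, or a data-sending number in a data service). The communication response number corresponding to the communication initiating number may include a communication number acting as a called number (for example, the called number in a voice service) and a communication number acting as a message-receiving number (for example, an SMS receiving number, or a data-receiving number in a data service). Those skilled in the art will understand that the types of communication information included in the CDRs are not limited to the communication initiating number, the corresponding communication response number, the communication start time, and the communication duration; they may also include data traffic (uplink and/or downlink), communication location, service type, long-distance type, and so on. The present application is not limited in this respect.
Step 303: Extract the at least one type of communication information associated with each communication initiating number in the CDRs to form the communication records of each communication initiating number.
Step 304: Combine the extracted communication records of the communication initiating numbers into the preprocessed CDR.
Here the preprocessed CDR includes only the at least one type of communication information extracted from the CDRs for each communication number, not all the information in the CDRs. This reduces the workload of communication number processing and improves its efficiency.
The CDRs of the preset quantity of communication numbers within the first preset time, as obtained from the communication service device, are generally unordered. Take the CDRs shown in Table 2 as an example: the communication start time, service type, communication initiating number, communication response number, communication location, long-distance type, and communication duration (seconds) are some examples of the types of communication information included in the CDRs.
Table 2
[Table 2 is provided as images in the source: Figure PCTCN2017081813-appb-000001 and Figure PCTCN2017081813-appb-000002]
The communication number processing apparatus parses the CDRs shown in Table 2 to obtain at least one of the following types of communication information included in the CDRs: communication initiating number; communication response number corresponding to the communication initiating number; communication start time; communication duration.
The communication number processing apparatus extracts the at least one type of communication information associated with each communication initiating number in the CDRs to form the communication records of each communication initiating number. Here the communication records of each communication initiating number include at least one type of communication information of that number within the first preset time.
The extracted communication records of the communication initiating numbers are combined into the preprocessed CDR. The preprocessed CDR is compiled per communication number, and its data structure (or display form) is organized with each communication number as the index. Assuming that the at least one type of communication information corresponding to each communication number acting as a communication initiating number is combined into the preprocessed CDR, the data structure of the preprocessed CDR may be:
Communication initiating number 1: communication information 1, communication information 2, ...;
Communication initiating number 2: communication information 1, communication information 2, ...; ....
Take the preprocessed CDR shown in Table 3 as an example. The communication number processing apparatus obtains it from the CDRs shown in Table 2 by performing steps 302 to 304; it is organized with each communication initiating number as the index.
Table 3
Communication initiating number   Communication response number   Communication start time   Communication duration (s)
158xxxx0001                       186xxxx0002                     2016-01-15 15:32:42        134
158xxxx0001                       186xxxx0007                     2016-01-15 15:42:02        97
158xxxx0001                       139xxxx0006                     2016-01-15 15:48:02        123
158xxxx0001                       187xxxx0002                     2016-01-15 15:52:07        256
170xxxx0001                       186xxxx0001                     2016-01-15 15:39:02        15
170xxxx0001                       180xxxx0007                     2016-01-15 15:51:02        77
170xxxx0001                       139xxxx0002                     2016-01-16 10:26:02        --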
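The preprocessing in steps 302 to 304 can be illustrated with a minimal Python sketch. It assumes the raw CDRs arrive as unordered dictionaries; the field names (`caller`, `callee`, `start`, `duration_s`, `service`) and the function name `build_preprocessed_cdr` are hypothetical and are not taken from the application or from any operator schema.

```python
from collections import defaultdict

# Raw CDR rows as they might arrive from the communication service device:
# unordered, and carrying more fields than the later analysis needs.
# Field names are illustrative assumptions, not a real operator schema.
RAW_CDRS = [
    {"start": "2016-01-15 15:39:02", "service": "voice",
     "caller": "170xxxx0001", "callee": "186xxxx0001", "duration_s": 15},
    {"start": "2016-01-15 15:48:02", "service": "voice",
     "caller": "158xxxx0001", "callee": "139xxxx0006", "duration_s": 123},
    {"start": "2016-01-15 15:32:42", "service": "voice",
     "caller": "158xxxx0001", "callee": "186xxxx0002", "duration_s": 134},
]

# Only these types of communication information are kept per record.
KEPT_FIELDS = ("callee", "start", "duration_s")


def build_preprocessed_cdr(raw_cdrs):
    """Group records by initiating number, keeping only the selected fields."""
    grouped = defaultdict(list)
    for row in raw_cdrs:
        grouped[row["caller"]].append({k: row[k] for k in KEPT_FIELDS})
    # Zero-padded "YYYY-MM-DD HH:MM:SS" strings sort chronologically as text.
    for records in grouped.values():
        records.sort(key=lambda r: r["start"])
    return dict(grouped)


preprocessed = build_preprocessed_cdr(RAW_CDRS)
for number, records in preprocessed.items():
    print(number, records)
```

Indexing by initiating number means every later feature (call count, average duration, distinct home locations) can be computed from one dictionary lookup per number.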
步骤305、解析预处理话单中各通信号码的至少一种类型的通信信息,得到预处理话单中各通信号码的相应类型通信信息所具有的特征。
步骤306、分析预处理话单中各通信号码的相应类型通信信息所具有的特征,判断各通信号码的相应类型通信信息所具有的特征是否与预设特征匹配,若是,转到步骤307,否则流程结束。
步骤307、从预处理话单包括的通信号码中提取出与预设特征匹配的目标通信号码。
本实施例针对具体如何解析话单得到话单中所包括的通信信息的类型,及提取出话单中各通信号码的至少一种类型的通信信息并组合形成预处理话单的场景,通过解析话单得到话单中所包括的至少一种类型的通信信息,提取出话单中各通信发起号码所关联的至少一种类型的通信信息形成各通信发起号码的通信记录,将所提取的各通信发起号码的通信记录组合形成预处理话单,所形成的预处理话单仅包括了从话单中提取的各通信号码的至少一种类型的通信信息,预处理话单并未包括话单中的全部信息,可以降低号码识别的工作量,提高号码识别的速度和效率。
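Embodiment 3
Based on Embodiment 1, this embodiment uses the edit distance between a communication initiating number and a yellow-pages number as the feature of the communication number, and describes how to identify, from multiple communication numbers, those that satisfy a preset condition. The communication number processing method provided by this embodiment includes the following steps:
1) Obtain, from the communication service device, the CDRs of a preset quantity of communication numbers within a first preset time.
2) Parse the CDRs to obtain the types of communication information they contain, extract at least one type of communication information for each communication number in the CDRs, and combine the extracted information into a preprocessed CDR.
3) Calculate the edit distance between each communication initiating number in the preprocessed CDR and the yellow-pages number.
There may be one or more yellow-pages numbers. The edit distance is the minimum number of edit operations needed to turn the yellow-pages number into the communication initiating number, that is, the number of operations such as adding, removing, modifying, or moving digits by which the yellow-pages number becomes the communication initiating number. When there are multiple yellow-pages numbers, the edit distance between each communication initiating number in the preprocessed CDR and every yellow-pages number must be calculated separately.
4) Obtain, based on the edit distance, the similarity between each communication initiating number in the preprocessed CDR and the yellow-pages number.
The similarity can be obtained from the edit distance in at least one of the following ways:
Way 1: For each communication initiating number in the preprocessed CDR, normalize the calculated edit distances between that number and each yellow-pages number to obtain the similarity between that number and each yellow-pages number; then sort those similarities.
Way 2: For each communication initiating number in the preprocessed CDR, calculate the ratio of the edit distance between that number and the yellow-pages number to a preset distance, and take the calculated ratio as the similarity between the communication initiating number and the yellow-pages number. When there are multiple yellow-pages numbers, this ratio must be calculated separately for each yellow-pages number.
5) Determine whether the similarity between each communication initiating number in the preprocessed CDR and the yellow-pages number is greater than a first threshold. If so, extract from the communication initiating numbers in the preprocessed CDR those whose similarity to the yellow-pages number is greater than the first threshold, as target communication numbers; otherwise the flow ends.
The initial value of the first threshold (the similarity threshold) can be set manually or obtained by training. For example: determine from a prior value the target quantity of target communication numbers among the communication initiating numbers in the preprocessed CDR; sort the similarities between the communication initiating numbers and the yellow-pages number; select the target quantity of communication initiating numbers in decreasing order of similarity; and take the similarity of the selected communication initiating number with the lowest similarity to the yellow-pages number as the initial value of the first threshold. The first threshold can continue to be updated by training as actually needed.
In one feasible implementation, the communication number processing apparatus sorts the communication initiating numbers in the preprocessed CDR by their similarity to the yellow-pages number and, based on that ordering, extracts the first proportion of communication initiating numbers with the highest similarity as target communication numbers.
In another feasible implementation, for any communication initiating number in the preprocessed CDR, the communication number processing apparatus determines, from that number's similarity to the yellow-pages number and the first threshold, the probability that the number belongs to the target communication number class (for example, fraud numbers) and the probability that it belongs to the normal number class, and assigns the number to the class with the larger probability. If that class is the target communication number class, the communication initiating number is determined to be a target communication number; otherwise it is determined to be a normal number.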
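The edit-distance feature can be sketched in Python as follows. This is an illustration under stated assumptions, not the application's implementation: it uses the standard Levenshtein distance (insert, delete, substitute) and does not model the "move" operation mentioned in the text, and the normalization `1 - distance / max(length)` is one assumed way of turning a distance into a similarity. The function names are hypothetical.

```python
def edit_distance(source, target):
    """Levenshtein distance: minimum insertions, deletions and substitutions
    needed to turn `source` into `target`."""
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(previous[j] + 1,          # delete from source
                               current[j - 1] + 1,       # insert into source
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]


def similarity(yellow_pages_number, initiating_number):
    """Assumed normalization: 1.0 for identical strings, 0.0 for fully different."""
    longest = max(len(yellow_pages_number), len(initiating_number))
    if longest == 0:
        return 1.0
    distance = edit_distance(yellow_pages_number, initiating_number)
    return 1.0 - distance / longest


def best_similarity(initiating_number, yellow_pages_numbers):
    """With several yellow-pages numbers, compare against each and keep the best."""
    return max(similarity(y, initiating_number) for y in yellow_pages_numbers)


if __name__ == "__main__":
    # "010010" differs from the yellow-pages number "10010" by one inserted digit.
    print(edit_distance("10010", "010010"))
    print(best_similarity("010010", ["95588", "10010"]))
```

A number whose best similarity exceeds the first threshold would then be treated as a candidate target communication number.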
本实施例的实施依赖于用户设备、服务器及通信业务设备的配合,这里,用户设备例如可以是智能手机、固定电话、平板电脑、笔记本电脑、穿戴式设 备(如智能眼镜、智能手表等)等;服务器例如可以是运营商的业务服务器、企业网关、安装于用户设备的应用的后台服务器等;通信业务设备例如可以是BSS/OSS或者电信交换机;应用具体可以为通信类应用,例如:腾讯手机管家、微信、腾讯邮箱等等,当然,应用不限于通信类应用,本申请实施例中并不对此进行具体限定;参见图4示出的用户设备、服务器及通信业务设备相互配合以实施本实施例提供的通信号码处理方法的一个可选的流程图,方法包括:
步骤401、基于用户指示,用户设备向服务器发送携带待识别通信号码的识别指示。
例如,参见图11a,运行于用户设备上的用户应用处于接收用户指示状态,用户在安装于用户设备的应用的显示窗口,按照应用的提示在指定位置输入待识别通信号码;这里,待识别通信号码可以为一个或多个。
步骤402、服务器接收识别指示,基于识别指示向通信业务设备发送携带待识别通信号码的话单请求;话单请求中包括待识别通信号码、及第一预设时间。
步骤403、通信业务设备接收话单请求,基于话单请求获取待识别通信号码在第一预设时间内的话单,并发送给服务器。
步骤404、服务器接收待识别通信号码在第一预设时间内的话单。
步骤405、解析话单得到话单中所包括的通信信息的类型,提取出话单中各待识别通信号码的至少一种类型的通信信息并组合形成预处理话单。
步骤406、分别计算预处理话单中的各待识别通信号码与黄页号码的编辑距离。
步骤407、基于编辑距离得到预处理话单中各待识别通信号码与黄页号码的相似度。
步骤408、判断预处理话单包括的各待识别通信号码与黄页号码的相似度是否大于第一阈值,若是,则转到步骤409,否则流程终止。
步骤409、从预处理话单包括的各待识别通信号码中提取出与黄页号码的相似度大于第一阈值的通信发起号码,作为目标通信号码。
步骤410、服务器基于识别到的目标通信号码向用户设备发送携带目标通信号码的识别响应,识别响应用于对用户进行危险提醒,提醒用户该识别到的目标通信号码可能为诈骗号码;危险提醒的实现方式包括但不限于通过短信、闪信、微信、腾讯手机管家等通信类应用进行提醒;服务器还可以在识别到目标通信号码时,直接通过客服电话向用户设备进行危险提醒。
同时,服务器基于识别到的目标通信号码,还可以向具有与识别出的目标通信号码的通信记录的通信响应号码的用户或者正在与识别出的目标通信号码通信的通信响应号码的用户进行危险提醒,以避免用户受骗。
用户设备接收到服务器发送的携带目标通信号码的识别响应后,基于目标通信号码对用户进行危险提醒;例如,参见图11b,运行于用户设备上的用户应用处于文字提醒状态,用户设备在安装于用户设备的应用的显示窗口显示例如以下文字提醒信息“请提高警惕!目标通信号码是诈骗号码”;这里的用户应用包括但不限于:短信、闪信、微信、腾讯手机管家等通信类应用;当然,应用不限于通信类应用,本申请实施例中并不对此进行具体限定。
本实施例针对具体如何得到预处理话单中各通信号码的相应类型通信信息所具有的特征,并从预处理话单包括的通信号码中提取出与预设特征匹配的目标通信号码的场景,通过在对话单进行解析的基础上得到预处理话单,分别计算预处理话单中的各通信发起号码与黄页号码的编辑距离,基于编辑距离得到预处理话单中各通信发起号码与黄页号码的相似度(即通信号码作为通信发起号码所具有的特征之一),从预处理话单包括的各通信发起号码中提取出与黄页号码的相似度大于预设第一阈值的通信发起号码作为目标通信号码,或者,基于预处理话单包括的各通信发起号码中与黄页号码的相似度的排序,提取出相似度最高的第一比例的通信发起号码作为目标通信号码;本申请实施例以预处理话单中各通信发起号码与黄页号码的相似度为特征,以第一阈值为预设特征,通过判断预处理话单包括的各通信发起号码与黄页号码的相似度与第一阈值的相对关系,从预处理话单包括的通信号码中提取出与预设特征匹配的目标通信号码,实现了快速和准确的号码识别。
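Embodiment 4
Based on Embodiment 1, this embodiment uses the number of communications of a communication initiating number per unit time as the feature of the communication number, and describes how to identify, from multiple communication numbers, those that satisfy a preset condition. The communication number processing method provided by this embodiment includes the following steps:
1) Obtain, from the communication service device, the CDRs of a preset quantity of communication numbers within a first preset time.
2) Parse the CDRs to obtain the types of communication information they contain, extract at least one type of communication information for each communication number in the CDRs, and combine the extracted information into a preprocessed CDR.
3) Extract the communication start times of each communication number in the preprocessed CDR acting as a communication initiating number.
4) Calculate the number of communications per unit time of each communication initiating number in the preprocessed CDR.
In practice, the number of communications of a communication initiating number per unit time can be either of the following:
Way 1: the number of communications per unit time between the communication initiating number and one and the same number;
Way 2: the number of communications per unit time between the communication initiating number and all the communication numbers it communicates with.
5) Determine whether the number of communications per unit time of each communication initiating number in the preprocessed CDR is greater than a second threshold. If so, extract from the communication initiating numbers in the preprocessed CDR those whose number of communications per unit time is greater than the second threshold, as target communication numbers; otherwise the flow ends.
The initial value of the second threshold can be set manually or obtained by training. For example: determine from a prior value the target quantity of target communication numbers among the communication initiating numbers in the preprocessed CDR; sort the communication initiating numbers by number of communications per unit time; select the target quantity of communication initiating numbers in decreasing order of that count; and take the count of the selected communication initiating number with the smallest number of communications per unit time as the initial value of the second threshold. The second threshold can continue to be updated by training as actually needed.
In one feasible implementation, the communication number processing apparatus sorts the communication initiating numbers in the preprocessed CDR by their number of communications per unit time and, based on that ordering, extracts the second proportion of communication initiating numbers with the highest counts as target communication numbers.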
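As a rough Python illustration of the per-unit-time count, the sketch below assumes the unit time is one clock hour and takes, for each initiating number, the busiest hour as its feature value (corresponding to Way 2, counting calls to all peers). Both choices, and the names `peak_calls_per_hour` and `numbers_above`, are assumptions made for the example; the sample rows reuse the start times shown in Table 3.

```python
from collections import Counter
from datetime import datetime

# (initiating number, communication start time), taken from Table 3.
RECORDS = [
    ("158xxxx0001", "2016-01-15 15:32:42"),
    ("158xxxx0001", "2016-01-15 15:42:02"),
    ("158xxxx0001", "2016-01-15 15:48:02"),
    ("158xxxx0001", "2016-01-15 15:52:07"),
    ("170xxxx0001", "2016-01-15 15:39:02"),
    ("170xxxx0001", "2016-01-15 15:51:02"),
    ("170xxxx0001", "2016-01-16 10:26:02"),
]


def peak_calls_per_hour(records):
    """Return, per initiating number, the largest call count in any clock hour."""
    buckets = Counter()
    for number, start in records:
        moment = datetime.strptime(start, "%Y-%m-%d %H:%M:%S")
        hour = moment.replace(minute=0, second=0)  # truncate to the hour
        buckets[(number, hour)] += 1
    peak = {}
    for (number, _hour), count in buckets.items():
        peak[number] = max(peak.get(number, 0), count)
    return peak


def numbers_above(feature_by_number, threshold):
    """Initiating numbers whose feature value is strictly greater than the threshold."""
    return sorted(n for n, value in feature_by_number.items() if value > threshold)


if __name__ == "__main__":
    peaks = peak_calls_per_hour(RECORDS)
    print(peaks)
    print(numbers_above(peaks, threshold=3))
```

The threshold of 3 in the usage lines is arbitrary; in the embodiment the second threshold is set manually or obtained by training.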
本实施例的实施依赖于用户设备、服务器及通信业务设备的配合,这里,用户设备例如可以是智能手机、固定电话、平板电脑、笔记本电脑、穿戴式设备(如智能眼镜、智能手表等)等;服务器例如可以是运营商的业务服务器、企业网关、安装于用户设备的应用的后台服务器等;通信业务设备例如可以是BSS/OSS或者电信交换机;应用具体可以为通信类应用,例如:腾讯手机管家、微信、腾讯邮箱等等,当然,应用不限于通信类应用,本申请实施例中并不对此进行具体限定;参见图5示出的用户设备、服务器及通信业务设备相互配合以实施本实施例提供的通信号码处理方法的一个可选的流程图,方法包括:
步骤501、当检测到与当前用户进行通话的对方通信号码时,用户设备(或安装于用户设备的应用)向服务器发送携带对方通信号码的识别指示。
步骤502、服务器接收识别指示,基于识别指示向通信业务设备发送携带对方通信号码的话单请求;话单请求中包括对方通信号码及第一预设时间。
步骤503、通信业务设备接收话单请求,基于话单请求获取对方通信号码在第一预设时间内的话单,并发送给服务器。
步骤504、服务器接收对方通信号码在第一预设时间内的话单。
步骤505、解析话单得到话单中所包括的通信信息的类型,提取出话单中对方通信号码的至少一种类型的通信信息并组合形成预处理话单。
步骤506、提取预处理话单中该对方通信号码作为通信发起号码的通信起始时间。
步骤507、计算预处理话单中该对方通信号码在单位时间内的通信次数。
步骤508、判断预处理话单包括的该对方通信号码在单位时间内的通信次数是否大于第二阈值,若是,则转到步骤509,否则流程终止。
步骤509、从预处理话单包括的该对方通信号码中提取出在单位时间内的通信次数大于第二阈值的通信发起号码,作为目标通信号码。
步骤510、服务器基于识别到的目标通信号码对用户进行危险提醒,提醒用户该识别到的目标通信号码可能为诈骗号码;危险提醒的实现方式包括但不限于通过短信、闪信、微信、腾讯手机管家等通信类应用进行提醒;服务器还可以在识别到目标通信号码时,直接通过客服电话向用户设备进行危险提醒。
同时,服务器基于识别到的目标通信号码,还可以向具有与识别出的目标通信号码的通信记录的通信响应号码的用户或者与识别出的目标通信号码正在通信的通信响应号码的用户进行危险提醒,以避免用户受骗。
用户设备接收到服务器发送的携带目标通信号码的识别响应后,基于目标通信号码对用户进行危险提醒;例如,参见图11b,用户设备在安装于用户设备的应用的显示窗口显示例如以下文字提醒信息“请提高警惕!目标通信号码是诈骗号码”;这里的用户应用包括但不限于:短信、闪信、微信、腾讯手机管家等通信类应用;当然,应用不限于通信类应用,本申请实施例中并不对此进行具体限定。
本实施例针对具体如何得到预处理话单中各通信号码的相应类型通信信息所具有的特征,并从预处理话单包括的通信号码中提取出与预设特征匹配的目标通信号码的场景,通过在对话单进行解析的基础上得到预处理话单,分别计算预处理话单中的各通信发起号码在单位时间内的通信次数(即通信号码作为 通信发起号码所具有的特征之一),从预处理话单包括的各通信发起号码中提取出在单位时间内的通信次数大于预设第二阈值的通信发起号码作为目标通信号码,或者,基于预处理话单包括的各通信发起号码中在单位时间内的通信次数的排序,提取出通信次数最高的第二比例的通信发起号码作为目标通信号码;本申请实施例以预处理话单中各通信发起号码在单位时间内的通信次数为特征,以第二阈值为预设特征,通过判断预处理话单包括的各通信发起号码在单位时间内的通信次数与第二阈值的相对关系,从预处理话单包括的通信号码中提取出与预设特征匹配的目标通信号码,实现了快速和准确的号码识别。
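Embodiment 5
Based on Embodiment 1, this embodiment provides a technical solution for how to parse the at least one type of communication information of each communication number in the preprocessed CDR to obtain the features of the corresponding type of communication information, and how to extract from the communication numbers included in the preprocessed CDR a target communication number that matches a preset feature.
Referring to FIG. 6, the communication number processing method provided by this embodiment includes the following steps:
Step 601: Obtain, from the communication service device, the CDRs of a preset quantity of communication numbers within a first preset time.
Step 602: Parse the CDRs to obtain the types of communication information they contain, extract at least one type of communication information for each communication number in the CDRs, and combine the extracted information into a preprocessed CDR.
Step 603: Extract the communication durations of each communication number in the preprocessed CDR acting as a communication initiating number.
Step 604: Calculate the average communication duration of each communication initiating number in the preprocessed CDR.
In practice, the average communication duration of a communication initiating number can be either of the following:
1) the average communication duration between the communication initiating number and one and the same number;
2) the average communication duration between the communication initiating number and all the communication numbers it communicates with.
Step 605: Determine whether the average communication duration of each communication initiating number in the preprocessed CDR is greater than a third threshold; if so, go to step 606; otherwise the flow ends.
The initial value of the third threshold can be set manually or obtained by training, for example:
determine from a prior value the target quantity of target communication numbers among the communication initiating numbers in the preprocessed CDR;
sort the communication initiating numbers by average communication duration;
select the target quantity of communication initiating numbers in decreasing order of average communication duration;
take the average communication duration of the selected communication initiating number with the smallest average communication duration as the initial value of the third threshold.
The third threshold can continue to be updated by training as actually needed.
Step 606: Extract from the communication initiating numbers in the preprocessed CDR those whose average communication duration is greater than the third threshold, as target communication numbers.
In one feasible implementation, the communication number processing apparatus sorts the communication initiating numbers in the preprocessed CDR by their average communication duration and, based on that ordering, extracts the third proportion of communication initiating numbers with the highest average communication duration as target communication numbers.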
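The average-duration feature and the procedure for deriving an initial threshold can be sketched in Python. The sketch follows the description literally: rank the feature values in decreasing order, take the prior target quantity of numbers, and use the smallest selected value as the initial threshold. The function names are hypothetical, and the sample durations are those of Table 3, with the row whose duration is missing ("--") assumed to have been filtered out beforehand.

```python
# (initiating number, communication duration in seconds), from Table 3.
DURATIONS = [
    ("158xxxx0001", 134), ("158xxxx0001", 97),
    ("158xxxx0001", 123), ("158xxxx0001", 256),
    ("170xxxx0001", 15), ("170xxxx0001", 77),
]


def average_duration(records):
    """Average communication duration per initiating number, over all peers."""
    totals = {}
    for number, duration_s in records:
        total, count = totals.get(number, (0, 0))
        totals[number] = (total + duration_s, count + 1)
    return {n: total / count for n, (total, count) in totals.items()}


def initial_threshold(feature_by_number, target_count):
    """Smallest feature value among the `target_count` highest-ranked numbers.

    `target_count` stands for the prior estimate of how many target
    communication numbers the preprocessed CDR contains.
    """
    ranked = sorted(feature_by_number.values(), reverse=True)
    if not 0 < target_count <= len(ranked):
        raise ValueError("target_count must be between 1 and the number of candidates")
    return ranked[target_count - 1]


if __name__ == "__main__":
    averages = average_duration(DURATIONS)
    print(averages)
    print(initial_threshold(averages, target_count=1))
```

The same `initial_threshold` helper applies unchanged to the other features (similarity, call count, distinct home locations), since the description derives every initial threshold by the same ranking procedure.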
本实施例针对具体如何得到预处理话单中各通信号码的相应类型通信信息所具有的特征,并从预处理话单包括的通信号码中提取出与预设特征匹配的目标通信号码的场景,通过在对话单进行解析的基础上得到预处理话单,分别计算预处理话单中的各通信发起号码的平均通信时长(即通信号码作为通信发起号码所具有的特征之一),从预处理话单包括的各通信发起号码中提取出平均通信时长大于预设第三阈值的通信发起号码作为目标通信号码,或者,基于预处理话单包括的各通信发起号码的平均通信时长的排序,提取出平均通信时长最高的第三比例的通信发起号码作为目标通信号码;本申请实施例以预处理话单中各通信发起号码的平均通信时长为特征,以第三阈值为预设特征,通过判断预处理话单包括的各通信发起号码的平均通信时长与第三阈值的相对关系,从预处理话单包括的通信号码中提取出与预设特征匹配的目标通信号码,实现了快速和准确的号码识别。
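Embodiment 6
Based on Embodiment 1, this embodiment provides a technical solution for how to parse the at least one type of communication information of each communication number in the preprocessed CDR to obtain the features of the corresponding type of communication information, and how to extract from the communication numbers included in the preprocessed CDR a target communication number that matches a preset feature.
Referring to FIG. 7, the communication number processing method provided by this embodiment includes the following steps:
Step 701: Obtain, from the communication service device, the CDRs of a preset quantity of communication numbers within a first preset time.
Step 702: Parse the CDRs to obtain the types of communication information they contain, extract at least one type of communication information for each communication number in the CDRs, and combine the extracted information into a preprocessed CDR.
Step 703: Extract the home locations of the communication response numbers corresponding to each communication number in the preprocessed CDR acting as a communication initiating number.
Step 704: Calculate the number of distinct home locations of the communication response numbers corresponding to each communication initiating number in the preprocessed CDR.
Step 705: Determine whether the number of distinct home locations of the communication response numbers corresponding to each communication initiating number in the preprocessed CDR is greater than a fourth threshold; if so, go to step 706; otherwise the flow ends.
The initial value of the fourth threshold can be set manually or obtained by training, for example:
determine from a prior value the target quantity of target communication numbers among the communication initiating numbers in the preprocessed CDR;
sort the communication initiating numbers by the number of distinct home locations of their corresponding communication response numbers;
select the target quantity of communication initiating numbers in decreasing order of the number of distinct home locations of their corresponding communication response numbers;
take the number of distinct home locations of the selected communication initiating number with the smallest such number as the initial value of the fourth threshold.
The fourth threshold can continue to be updated by training as actually needed.
Step 706: Extract from the communication initiating numbers in the preprocessed CDR those for which the number of distinct home locations of the corresponding communication response numbers is greater than the fourth threshold, as target communication numbers.
In one feasible implementation, the communication number processing apparatus sorts the communication initiating numbers in the preprocessed CDR by the number of distinct home locations of their corresponding communication response numbers and, based on that ordering, extracts the fourth proportion of communication initiating numbers with the highest such numbers as target communication numbers.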
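A small Python sketch of the home-location feature and of proportion-based extraction follows. The lookup table mapping response numbers to cities is entirely hypothetical (a real system would presumably resolve home locations from number-segment data), and rounding the proportion up with `math.ceil` is an assumed convention; neither is specified in the text.

```python
import math

# Hypothetical lookup: communication response number -> home location.
HOME_LOCATION = {
    "186xxxx0002": "Beijing",
    "186xxxx0007": "Beijing",
    "139xxxx0006": "Shanghai",
    "187xxxx0002": "Kunming",
    "186xxxx0001": "Beijing",
}

# (initiating number, response number) pairs.
PAIRS = [
    ("158xxxx0001", "186xxxx0002"),
    ("158xxxx0001", "186xxxx0007"),
    ("158xxxx0001", "139xxxx0006"),
    ("158xxxx0001", "187xxxx0002"),
    ("170xxxx0001", "186xxxx0001"),
]


def distinct_home_locations(pairs, home_location):
    """Count the distinct home locations reached by each initiating number."""
    seen = {}
    for caller, callee in pairs:
        location = home_location.get(callee)
        if location is not None:  # skip numbers that cannot be resolved
            seen.setdefault(caller, set()).add(location)
    return {caller: len(locations) for caller, locations in seen.items()}


def top_ratio(feature_by_number, ratio):
    """Numbers in the highest-ranked `ratio` share (rounded up), highest first."""
    ranked = sorted(feature_by_number,
                    key=lambda n: (-feature_by_number[n], n))
    keep = math.ceil(len(ranked) * ratio)
    return ranked[:keep]


if __name__ == "__main__":
    counts = distinct_home_locations(PAIRS, HOME_LOCATION)
    print(counts)
    print(top_ratio(counts, ratio=0.5))
```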
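This embodiment addresses how to obtain the features of the corresponding type of communication information of each communication number in the preprocessed CDR and how to extract from the communication numbers included in the preprocessed CDR a target communication number that matches a preset feature. The preprocessed CDR is obtained by parsing the CDRs, and the number of distinct home locations of the communication response numbers corresponding to each communication initiating number in the preprocessed CDR is calculated (this is one of the features a communication number has when acting as a communication initiating number). The communication initiating numbers for which this number is greater than the preset fourth threshold are extracted as target communication numbers; alternatively, based on a ranking of the communication initiating numbers by this number, the fourth proportion of communication initiating numbers with the highest numbers of distinct home locations is extracted as target communication numbers. In this embodiment the feature is the number of distinct home locations of the communication response numbers corresponding to each communication initiating number in the preprocessed CDR, and the preset feature is the fourth threshold. By comparing the feature with the fourth threshold, the target communication numbers matching the preset feature are extracted from the communication numbers included in the preprocessed CDR, achieving fast and accurate number identification.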
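Embodiment 7
Based on the above embodiments, this embodiment provides a technical solution for how to extract, from the communication numbers included in the preprocessed CDR, a target communication number that matches a preset feature.
Referring to FIG. 8, the communication number processing method provided by this embodiment includes the following steps:
Step 801: Obtain, from the communication service device, the CDRs of a preset quantity of communication numbers within a first preset time.
Step 802: Parse the CDRs to obtain the types of communication information they contain, extract at least one type of communication information for each communication number in the CDRs, and combine the extracted information into a preprocessed CDR.
Step 803: Parse the at least one type of communication information of each communication number in the preprocessed CDR to obtain the features of the corresponding type of communication information of each communication number.
Step 804: Use a machine learning model to analyze the features of the corresponding type of communication information of each communication number in the preprocessed CDR.
Step 805: Determine whether the features of the corresponding type of communication information of each communication number match the preset feature; if so, go to step 806; otherwise the flow ends.
Step 806: Extract, from the communication numbers included in the preprocessed CDR, the target communication number that matches the preset feature.
Here, using a machine learning model to analyze the features of the corresponding type of communication information of each communication number in the preprocessed CDR includes identifying the target communication number with the technical solution of any one of Embodiments 3 to 6, or a combination of those solutions.
The machine learning model can be any one or a combination of the following: a Bayes classifier model; a support vector machine (SVM) classifier model; a deep learning model; logistic regression. Those skilled in the art will understand that the machine learning model may also include other models not listed here; the present application is not limited in this respect.
This embodiment addresses how to extract, from the communication numbers included in the preprocessed CDR, a target communication number that matches a preset feature: a machine learning model analyzes the features of the corresponding type of communication information of each communication number in the preprocessed CDR, and the target communication numbers matching the preset feature are extracted, achieving fast and efficient number identification.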
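Since the text lists a Bayes classifier among the candidate models, the following Python sketch shows a minimal Gaussian naive Bayes classifier over three of the features discussed above (similarity to a yellow-pages number, calls per hour, distinct home locations). The class name, the choice of a Gaussian likelihood, and above all the training rows are invented for illustration; they are not data or design details from the application.

```python
import math


class GaussianNaiveBayes:
    """Minimal Gaussian naive Bayes: one mean/variance per class and feature."""

    def fit(self, rows, labels):
        self.priors = {}
        self.stats = {}
        for label in set(labels):
            members = [r for r, l in zip(rows, labels) if l == label]
            self.priors[label] = len(members) / len(rows)
            stats = []
            for column in zip(*members):
                mean = sum(column) / len(column)
                variance = sum((x - mean) ** 2 for x in column) / len(column)
                # A small floor keeps the variance strictly positive.
                stats.append((mean, max(variance, 1e-6)))
            self.stats[label] = stats
        return self

    def _log_posterior(self, row, label):
        score = math.log(self.priors[label])
        for x, (mean, variance) in zip(row, self.stats[label]):
            score += (-0.5 * math.log(2 * math.pi * variance)
                      - (x - mean) ** 2 / (2 * variance))
        return score

    def predict(self, row):
        """Return the class with the larger (log) posterior probability."""
        return max(self.priors, key=lambda label: self._log_posterior(row, label))


# Invented toy rows: (similarity, calls per hour, distinct home locations).
TRAIN_ROWS = [
    (0.90, 40, 1), (0.80, 55, 2), (0.85, 35, 1),   # labelled "fraud"
    (0.10, 2, 3), (0.20, 4, 5), (0.15, 1, 2),      # labelled "normal"
]
TRAIN_LABELS = ["fraud", "fraud", "fraud", "normal", "normal", "normal"]

if __name__ == "__main__":
    model = GaussianNaiveBayes().fit(TRAIN_ROWS, TRAIN_LABELS)
    print(model.predict((0.88, 45, 1)))
    print(model.predict((0.12, 3, 3)))
```

Picking the class with the larger posterior mirrors the "larger probability" rule described for Embodiment 3; a production system would more likely rely on an established library than on a hand-written classifier.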
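Embodiment 8
Based on Embodiment 7, this embodiment provides a technical solution for how to train the machine learning model based on user-side feedback on target communication numbers.
Referring to FIG. 9, the communication number processing method provided by this embodiment includes the following steps:
Step 901: Obtain, from the communication service device, the CDRs of a preset quantity of communication numbers within a first preset time.
Step 902: Parse the CDRs to obtain the types of communication information they contain, extract at least one type of communication information for each communication number in the CDRs, and combine the extracted information into a preprocessed CDR.
Step 903: Parse the at least one type of communication information of each communication number in the preprocessed CDR to obtain the features of the corresponding type of communication information of each communication number.
Step 904: Analyze the features of the corresponding type of communication information of each communication number in the preprocessed CDR and determine whether they match the preset feature; if so, go to step 905; otherwise the flow ends.
Step 905: Extract, from the communication numbers included in the preprocessed CDR, the target communication number that matches the preset feature; and send a danger reminder to the users of communication response numbers that have communication records with the identified target communication number, or that are currently communicating with it.
Step 906: Receive user-side feedback on the target communication number.
That is, receive the user-side feedback on the danger reminder that carried the identified target communication number.
Step 907: Determine from the user-side feedback on the target communication number whether the target communication number is a safe number; if so, go to step 908; otherwise the flow ends.
Step 908: Determine the error rate of the machine learning model based on the quantity of identified target communication numbers that the user side reports as safe numbers.
Step 909: Determine whether the error rate of the machine learning model is greater than a fifth threshold; if so, go to step 910; otherwise the flow ends.
Step 910: Retrain the machine learning model based on the communication records of the safe numbers in the preprocessed CDR.
Here, one feasible way of retraining the machine learning model based on the communication records of the safe numbers in the preprocessed CDR includes:
parsing at least one type of communication information in the communication records of the safe numbers in the preprocessed CDR to obtain the features of the at least one type of communication information of the safe numbers;
updating, based on those features, the threshold that the machine learning model uses to identify target communication numbers.
This embodiment addresses training the machine learning model based on user-side feedback on target communication numbers. The error rate of the model is determined from the quantity of target communication numbers that the user side reports as safe numbers, and when the error rate is greater than the fifth threshold the model is retrained on the communication records of the safe numbers in the preprocessed CDR. Because retraining draws on the communication records of safe numbers in the preprocessed CDR, the retrained model is more accurate, so using it to identify target communication numbers can improve the speed and accuracy of number identification.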
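The feedback loop of steps 906 to 910 can be sketched in Python for the simplest case, where the "model" is a single feature threshold. The update rule used here (raise the threshold to the largest feature value observed among numbers reported as safe) is an assumption made for the example; the text only says that the threshold is updated from the safe numbers' features. All names are hypothetical.

```python
def error_rate(target_numbers, safe_feedback):
    """Share of identified target numbers that users reported as safe."""
    if not target_numbers:
        return 0.0
    wrongly_flagged = [n for n in target_numbers if n in safe_feedback]
    return len(wrongly_flagged) / len(target_numbers)


def retrain_threshold(current_threshold, feature_by_number, safe_numbers):
    """Assumed update: lift the threshold so the safe numbers no longer exceed it."""
    safe_values = [feature_by_number[n] for n in safe_numbers
                   if n in feature_by_number]
    if not safe_values:
        return current_threshold
    return max(current_threshold, max(safe_values))


def maybe_retrain(current_threshold, feature_by_number,
                  target_numbers, safe_feedback, max_error_rate):
    """Retrain only when the error rate exceeds the fifth threshold."""
    rate = error_rate(target_numbers, safe_feedback)
    if rate > max_error_rate:
        new_threshold = retrain_threshold(
            current_threshold, feature_by_number, safe_feedback)
        return new_threshold, rate
    return current_threshold, rate


if __name__ == "__main__":
    calls_per_hour = {"a": 50, "b": 30, "c": 22, "d": 5}
    flagged = [n for n, v in calls_per_hour.items() if v > 20]
    print(maybe_retrain(20, calls_per_hour, flagged, {"c"}, max_error_rate=0.2))
```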
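Embodiment 9
Based on any of the above embodiments, this embodiment provides a technical solution for the response handling performed when a target communication number is identified.
Referring to FIG. 10, the communication number processing method provided by this embodiment includes the following steps:
Step 1001: Obtain, from the communication service device, the CDRs of a preset quantity of communication numbers within a first preset time.
Step 1002: Parse the CDRs to obtain the types of communication information they contain, extract at least one type of communication information for each communication number in the CDRs, and combine the extracted information into a preprocessed CDR.
Step 1003: Parse the at least one type of communication information of each communication number in the preprocessed CDR to obtain the features of the corresponding type of communication information of each communication number.
Step 1004: Analyze the features of the corresponding type of communication information of each communication number in the preprocessed CDR and determine whether they match the preset feature; if so, go to step 1005; otherwise the flow ends.
Step 1005: Extract, from the communication numbers included in the preprocessed CDR, the target communication number that matches the preset feature.
Step 1006: Determine the degree of match between the features of the corresponding type of communication information of the target communication number and the preset feature.
This degree of match can also be understood as the degree of difference between the features of the corresponding type of communication information of the target communication number and the preset feature. Take the similarity between the target communication number and the yellow-pages number as the feature: that similarity is greater than the first threshold, and the degree of match is the size of the difference between the similarity and the first threshold.
Step 1007: Determine the danger level of the target communication number according to the degree of match between the features of its corresponding type of communication information and the preset feature.
The degree of match is positively correlated with the danger level; different danger levels may correspond to degrees of match in different value ranges.
Step 1008: Perform response handling on the communication behavior of the target communication number based on its danger level.
The timeliness of the response handling is positively correlated with the danger level. Suppose the defined danger levels are high and low. The danger level can characterize the probability that the target communication number is a number satisfying a specific condition, for example the probability that it is a fraud number.
When the communication number processing apparatus determines that the danger level of the target communication number is low, the response handling may include sending a danger reminder to the users of communication response numbers that have communication records with the target communication number, warning them that the target communication number is a fraud number. The danger reminder includes a voice reminder and/or a text reminder: a voice reminder is, for example, a recorded voice message or a call from customer service; a text reminder is, for example, an SMS or a flash message.
Referring to FIG. 11b, the communication number processing apparatus sends an after-the-fact danger reminder to the users of communication response numbers that have communication records with the target communication number. On the user equipment of such a communication response number, the display window of the user application shows a text reminder such as "Stay alert! The target communication number is a fraud number". The user application here includes but is not limited to communication applications such as SMS, flash messages, WeChat, and Tencent Mobile Manager; the application is not limited to communication applications, and the embodiments of the present application do not specifically limit this.
When the communication number processing apparatus determines that the danger level of the target communication number is high, the response handling may include sending an immediate danger reminder to the user of a communication response number that is currently communicating with the target communication number (including but not limited to text reminders such as SMS or flash messages, or voice reminders such as a recorded voice message or a call from customer service), that is, warning the user during the communication that the target communication number is a fraud number; or directly intercepting the ongoing communication with the target communication number and sending the user a danger reminder afterwards.
This embodiment addresses the response handling performed when a target communication number is identified: the danger level of the target communication number is determined from the degree of match between the features of its corresponding type of communication information and the preset feature, and response handling is performed on its communication behavior based on that danger level, warning users communicating with the target communication number to be vigilant and avoid being defrauded.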
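The mapping from degree of match to danger level and response can be sketched in Python. The two-level scheme follows the text; the margin that separates "high" from "low", the action names, and the `intercept_enabled` switch are hypothetical parameters introduced only for this illustration.

```python
def match_degree(feature_value, threshold):
    """Degree of match: how far the feature value lies above the threshold."""
    return feature_value - threshold


def danger_level(degree, high_risk_margin):
    """Larger degrees of match map to the higher danger level."""
    return "high" if degree >= high_risk_margin else "low"


def respond(level, call_in_progress, intercept_enabled=False):
    """Choose a response; more urgent handling for the higher danger level."""
    if level == "high" and call_in_progress:
        # Either cut the ongoing call or warn the user during the call.
        return "intercept_call" if intercept_enabled else "realtime_reminder"
    # Otherwise remind users who have communication records with the number.
    return "post_call_reminder"


if __name__ == "__main__":
    level = danger_level(match_degree(0.95, threshold=0.7), high_risk_margin=0.2)
    print(level, respond(level, call_in_progress=True))
    level = danger_level(match_degree(0.75, threshold=0.7), high_risk_margin=0.2)
    print(level, respond(level, call_in_progress=True))
```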
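Embodiment 10
Based on any of the above embodiments, this embodiment can be applied in scenarios that require identifying, from multiple communication numbers, those that satisfy a preset condition, for example identifying numbers across the whole communication network, identifying a number that the user has indicated for identification, or identifying the number currently communicating with the user. The service type of the communication includes but is not limited to any one or a combination of the following: voice call; SMS; flash message; data service (such as WeChat). The present application is not limited in this respect.
Referring to FIG. 12, the communication number processing apparatus provided by this embodiment (a fraud number identification system based on CDR analysis) includes an online identification system and an offline training system.
The online identification system extracts features from the CDR records collected by the operator and uses a machine learning model to determine whether a telephone number is a fraud number. It then reminds or follows up with the affected users so that they are not defrauded, and feeds the results of the reminders and follow-ups back to the offline training system, which adjusts the machine learning model accordingly. The offline training system extracts the corresponding features from historical CDR data and from the reminder and follow-up feedback of the online identification system, and uses these features to retrain and adjust the machine learning model. The trained model is synchronized to the fraud call identification engine in the online identification system.
Specifically, the online identification system can identify fraud numbers from users' call CDR records. It can be divided into three modules: a CDR collection module, a fraud call identification engine, and a defrauded-user reminder system, where:
CDR collection module: mainly responsible for collecting users' call records and preprocessing the collected CDRs to obtain the information listed in Table 4 below:
Table 4
Calling number   Called number   Call time             Call duration (s)
158XXXX0001      186XXXX0002     2016-01-15 15:36:42   134
001XX86          139XXXX0001     2016-01-15 15:39:02   15
138XXXX0001      139XXXX0002     2016-01-15 15:38:02   123
Fraud call identification engine: this is the core of the online identification system. It cleans the collected CDRs, extracts features, and uses the trained machine learning model to evaluate the extracted features and determine whether a number is a fraud number. It can be divided into three parts: CDR cleaning, feature extraction, and fraud number identification, where:
1) CDR cleaning removes "dirty" data from the CDRs. "Dirty" data is abnormal data, such as records with missing content or abnormal values.
2) Feature extraction: features are extracted from the cleaned CDRs in preparation for the next step, fraud number identification. The features include the similarity of the calling number, the average call duration, the distance between the called numbers of adjacent CDRs, the call interval, and so on.
Similarity between the calling number and yellow-pages numbers (the similarity between the communication initiating number and the yellow-pages number described above): fraud numbers are mostly calling numbers. Using ***, fraudsters change the calling number to one that resembles a number in the yellow pages, such as 001XX86, +0109XX88, or 08XXX10010 (the customer service number of *** is 10010). The edit distance between substrings of these numbers and the yellow-pages numbers is calculated (the edit distance is the number of operations, such as adding, removing, modifying, or moving digits, by which a yellow-pages number becomes the calling number).
Number of calls per unit time (the number of communications of the communication initiating number per unit time described above): fraudsters generally make many calls every hour, mostly during working hours, that is, Monday to Friday, 08:00:00 to 18:00:00. Within this period the calls are evenly distributed; outside working hours there are generally very few calls, essentially none.
Average call duration (the average communication duration described above): the average duration of the calls made by a fraud number. Users who receive a fraud call generally hang up quickly, so the average call duration of fraud calls is short, no more than 20 s.
Distribution over time (unit: day) of the home locations of the called numbers (the number of distinct home locations of the communication response numbers corresponding to the communication initiating number described above): fraudsters usually work through one city at a time, so the called numbers in these CDRs usually belong to a single city. The number of home cities of the called numbers within a given period is taken as this feature.
3) Fraud call identification: the features extracted above are used by the machine learning model to identify fraud.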
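The CDR cleaning step can be illustrated with a short Python sketch. The text defines "dirty" data only as records with missing content or abnormal values, so the concrete checks below (required fields present, a parseable start time, a non-negative integer duration) and the field names are assumptions chosen for the example.

```python
from datetime import datetime

REQUIRED_FIELDS = ("caller", "callee", "start", "duration_s")
MISSING_MARKERS = (None, "", "--")  # "--" marks a missing value in Table 3


def is_clean(row):
    """True if the record has every required field and plausible values."""
    if any(row.get(field) in MISSING_MARKERS for field in REQUIRED_FIELDS):
        return False
    try:
        datetime.strptime(row["start"], "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return False  # start time is not a valid timestamp
    duration = row["duration_s"]
    return isinstance(duration, int) and duration >= 0


def clean_cdrs(rows):
    """Drop "dirty" records, keeping the original order of the clean ones."""
    return [row for row in rows if is_clean(row)]


SAMPLE = [
    {"caller": "158XXXX0001", "callee": "186XXXX0002",
     "start": "2016-01-15 15:36:42", "duration_s": 134},
    {"caller": "170xxxx0001", "callee": "139xxxx0002",
     "start": "2016-01-16 10:26:02", "duration_s": "--"},   # missing duration
    {"caller": "138XXXX0001", "callee": "",
     "start": "2016-01-15 15:38:02", "duration_s": 123},    # missing callee
    {"caller": "138XXXX0001", "callee": "139XXXX0002",
     "start": "2016-13-40 25:00:00", "duration_s": 10},     # invalid timestamp
    {"caller": "138XXXX0001", "callee": "139XXXX0002",
     "start": "2016-01-15 15:38:02", "duration_s": -5},     # negative duration
]

if __name__ == "__main__":
    print(clean_cdrs(SAMPLE))
```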
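Defrauded-user reminder system: informs the affected users named in fraud call CDRs that a call they received was a fraud call, to prevent them from being defrauded; it also submits the users' feedback, that is, whether the call was in fact a fraud call, to the offline training system.
2. Offline training system
When the error rate of the machine learning model, as reported by the defrauded-user reminder system, is found to exceed the threshold, the offline training system extracts the features of the relevant historical CDRs, retrains the machine learning model, and adjusts the Bayes classifier (other machine learning algorithms can also be used here, such as an SVM classifier, logistic regression, or deep learning). The offline training system can be divided into three main parts:
a) Historical CDR extraction: extract the historical CDRs of a recent period, in particular the CDRs for which the feedback indicated an incorrect result.
b) Feature extraction: extract features from the historical CDRs to provide data for the next step, model retraining.
c) Model retraining: use the features extracted in b) to train the Bayes classifier and obtain new parameters, and update the online identification system with the trained machine learning model.
In this way the online identification system and the offline training system form a complete closed loop: based on the results of the voice follow-ups, the offline training system decides whether to retrain and update the fraud number identification model in the online identification system.
The beneficial effects of the communication number processing apparatus provided by this embodiment are: 1) no user tagging information is needed, only CDR records; 2) the speed and accuracy of fraud number identification are improved; 3) fraud numbers can be identified more accurately, and the operator can identify fraud calls while the user's call is in progress.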
实施例十一
与前述实施例的记载相对应,本实施例还记载一种通信号码处理装置,通信号码处理装置可以用于执行本申请实施例的通信号码处理方法,通信号码处理装置可以采用各种方式来实施,例如在智能手机、固定电话、平板电脑、笔记本电脑、穿戴式设备(如智能眼镜、智能手表等)等用户设备中实施装置的全部组件,或者,在企业网关、运营商网关等网络设备中实施装置的全部组件,或者,在上述的用户设备侧或网络侧以耦合的方式实施装置中的组件,或者,通信号码处理装置还可以是用户应用的客户端或者后台服务器,例如,当用户应用为腾讯手机管家时,相应的通信号码处理装置可以为腾讯手机管家的客户端或者后台服务器;参见图13,通信号码处理装置包括:
获取模块1301,用于从通信业务设备获取第一预设时间内预设数量的通信号码的话单;
预处理模块1302,用于解析话单得到话单中所包括的通信信息的类型,提取出话单中各通信号码的至少一种类型的通信信息并组合形成预处理话单;
解析模块1303,用于解析预处理话单中各通信号码的至少一种类型的通信信息,得到预处理话单中各通信号码的相应类型通信信息所具有的特征;
提取模块1304,用于从预处理话单包括的通信号码中提取出与预设特征匹配的目标通信号码。
相比于需要在收集用户标记信息的基础上实施识别号码的现有技术,本实施例对通信号码的话单进行解析得到通信号码的相应类型通信信息所具有的特征,并基于通信号码的相应类型通信信息所具有的特征从各通信号码中识别出与预设特征匹配的目标通信号码,一方面,由于通信号码话单的生成及维护过程一般是由运营商负责,并不需要各个用户的参与,通信号码话单的获取速度和效率较高,另一方面,由于通信号码的话单是由运营商维护的客观数据,因而能够真实和完整地反映用户在一定时间间隔内的所有通信记录,如此,本申请实施例提供的技术方案以通信号码的话单为处理基础,能够提高号码识别的速度和准确性。
在上述实施例的基础上,预处理模块1302,具体用于:
解析话单得到话单中所包括的以下类型的通信信息中的至少一种:通信发起号码、对应通信发起号码的通信响应号码、通信起始时间和通信时长;
提取出话单中各通信发起号码所关联的至少一种类型的通信信息形成各通信发起号码的通信记录;
将所提取的各通信发起号码的通信记录组合形成预处理话单。
在上述实施例的基础上,解析模块1303,具体用于:分别计算预处理话单中的各通信发起号码与黄页号码的编辑距离;基于编辑距离得到预处理话单中各通信发起号码与黄页号码的相似度;
提取模块1304,具体用于:从预处理话单包括的各通信发起号码中提取出与黄页号码的相似度大于第一阈值的通信发起号码;或者,基于预处理话单包括的各通信发起号码中与黄页号码的相似度的排序,提取出相似度最高的第一比例的通信发起号码。
在上述实施例的基础上,解析模块1303,具体用于:提取预处理话单中各 通信号码作为通信发起号码的通信起始时间;计算预处理话单中各通信发起号码在单位时间内的通信次数;
提取模块1304,具体用于:从预处理话单包括的各通信发起号码中提取出单位时间内通信次数大于第二阈值的通信发起号码;或者,基于预处理话单包括的各通信发起号码在单位时间内的通信次数的排序,提取出通信次数最高的第二比例的通信发起号码。
在上述实施例的基础上,解析模块1303,具体用于:提取预处理话单中各通信号码作为通信发起号码的通信时长;计算预处理话单中各通信发起号码的平均通信时长;
提取模块1304,具体用于:从预处理话单包括的各通信发起号码中提取出平均通信时长大于第三阈值的通信发起号码;或者,基于预处理话单包括的各通信发起号码的平均通信时长的排序,提取出平均通信时长最高的第三比例的通信发起号码。
在上述实施例的基础上,解析模块1303,具体用于:获取预处理话单中各通信号码作为通信发起号码时对应的通信响应号码的归属地;计算预处理话单中各通信发起号码所对应的通信响应号码的不同归属地的数量;
提取模块1304,具体用于:从预处理话单包括的各通信发起号码中提取出所对应的通信响应号码的不同归属地的数量大于第四阈值的通信发起号码;或者,基于预处理话单包括的各通信发起号码所对应的通信响应号码的不同归属地的数量的排序,提取出所对应的通信响应号码的不同归属地的数量最高的第四比例的通信发起号码。
在上述实施例的基础上,提取模块1304,具体用于:使用机器学习模型分析预处理话单中各通信号码的相应类型通信信息所具有的特征,从预处理话单包括的通信号码中提取出与预设特征匹配的目标通信号码。
实施例十二
本实施例基于实施例十一,参见图14,本实施例记载的通信号码处理装置也包括图13中的获取模块1301、预处理模块1302、解析模块1303及提取模块1304,并且该些功能模块也具有实施例十一所记载的相应作用,在此基础上,本实施例记载的通信号码处理装置还包括:
训练模块1305,用于接收用户侧针对目标通信号码的反馈信息,确定目标 通信号码是否为安全号码;基于识别出的目标通信号码中被用户侧反馈为安全号码的目标通信号码的数量,确定机器学习模型的错误率;机器学习模型的错误率大于第五阈值时,基于预处理话单中安全号码的通信记录,对机器学习模型进行重新训练。
进一步,训练模块1305,具体用于:解析预处理话单中安全号码的通信记录的至少一种类型的通信信息,得到安全号码的至少一种类型的通信信息所具有的特征;基于安全号码的至少一种类型的通信信息所具有的特征更新机器学习模型识别目标通信号码所使用的阈值。
在上述实施例的基础上,装置还包括:
响应模块1306,用于确定目标通信号码的相应类型通信信息所具有的特征与预设特征的匹配程度;根据目标通信号码的相应类型通信信息所具有的特征与预设特征的匹配程度,确定目标通信号码的危险级别;基于目标通信号码的危险级别对目标通信号码的通信行为进行响应处理。
在实际应用中,获取模块1301、预处理模块1302、解析模块1303、提取模块1304、训练模块1305及响应模块1306,均可由位于通信号码处理装置的中央处理器(CPU)、微处理器(MPU)、专用集成电路(ASIC)或现场可编程门阵列(FPGA)等实现。
实施例十三
本实施例记载一种计算机可读介质,可以为ROM(例如,只读存储器、FLASH存储器、转移装置等)、磁存储介质(例如,磁带、磁盘驱动器等)、光学存储介质(例如,CD-ROM、DVD-ROM、纸卡、纸带等)以及其他熟知类型的程序存储器;计算机可读介质中存储有计算机可执行指令(例如腾讯视频等投射应用的二进制可执行指令),当执行指令时,引起至少一个处理器执行包括以下的操作:
从通信业务设备获取第一预设时间内预设数量的通信号码的话单;
解析话单得到话单中所包括的通信信息的类型,提取出话单中各通信号码的至少一种类型的通信信息并组合形成预处理话单;
解析预处理话单中各通信号码的至少一种类型的通信信息,得到预处理话单中各通信号码的相应类型通信信息所具有的特征;
从预处理话单包括的通信号码中提取出与预设特征匹配的目标通信号码。
综上,通信号码处理装置对通信号码的话单进行解析得到通信号码的相应类型通信信息所具有的特征,并基于通信号码的相应类型通信信息所具有的特征从各通信号码中识别出与预设特征匹配的目标通信号码,一方面,由于通信号码话单的生成及维护过程一般是由运营商负责,并不需要各个用户的参与,通信号码话单的获取速度和效率较高,另一方面,由于通信号码的话单是由运营商维护的客观数据,因而能够真实和完整地反映用户在一定时间间隔内的所有通信记录,如此,本申请实施例提供的技术方案以通信号码的话单为处理基础,能够提高号码识别的速度和准确性。
本领域内的技术人员应明白,本申请的实施例可提供为方法、***或计算机程序产品。因此,本申请可采用硬件实施例、软件实施例或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器和光学存储器等)上实施的计算机程序产品的形式。
本申请是参照根据本申请实施例的方法、设备(***)和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
以上所述,仅为本申请的较佳实施例而已,并非用于限定本申请的保护范围。

Claims (20)

  1. 一种通信号码处理方法,其特征在于,所述方法包括:
    从通信业务设备获取第一预设时间内预设数量的通信号码的话单;
    解析所述话单得到所述话单中所包括的通信信息的类型,提取出所述话单中各通信号码的至少一种类型的通信信息并组合形成预处理话单;
    解析所述预处理话单中各通信号码的至少一种类型的通信信息,得到所述预处理话单中各通信号码的相应类型通信信息所具有的特征;以及
    从所述预处理话单包括的通信号码中提取出与预设特征匹配的目标通信号码。
  2. 根据权利要求1所述的方法,其特征在于,所述解析所述话单得到所述话单中所包括的通信信息的类型,提取出所述话单中各通信号码的至少一种类型的通信信息并组合形成预处理话单,包括:
    解析所述话单得到所述话单中所包括的以下类型的通信信息中的至少一种:通信发起号码、对应所述通信发起号码的通信响应号码、通信起始时间和通信时长;
    提取出所述话单中各通信发起号码所关联的至少一种类型的通信信息形成各通信发起号码的通信记录;以及
    将所提取的各通信发起号码的通信记录组合形成所述预处理话单。
  3. 根据权利要求1所述的方法,其特征在于,所述解析所述预处理话单中各通信号码的至少一种类型的通信信息,得到所述预处理话单中各通信号码的相应类型通信信息所具有的特征,包括:
    分别计算所述预处理话单中的各通信发起号码与黄页号码的编辑距离;以及
    基于所述编辑距离得到所述预处理话单中各通信发起号码与黄页号码的相似度;
    从所述预处理话单包括的通信号码中提取出与预设特征匹配的目标通信号码,包括:
    从所述预处理话单包括的各通信发起号码中提取出与所述黄页号码的相似度大于第一阈值的通信发起号码;
    或者,基于所述预处理话单包括的各通信发起号码中与所述黄页号码的相似度的排序,提取出相似度最高的第一比例的通信发起号码,
    其中,所述编辑距离表示黄页号码变成通信发起号码的操作次数。
  4. 根据权利要求1所述的方法,其特征在于,所述解析所述预处理话单中各通信号码的至少一种类型的通信信息,得到所述预处理话单中各通信号码的相应类型通信信息所具有的特征,包括:
    提取所述预处理话单中各通信号码作为通信发起号码的通信起始时间;以及
    计算所述预处理话单中各通信发起号码在单位时间内的通信次数;
    从所述预处理话单包括的通信号码中提取出与预设特征匹配的目标通信号码,包括:
    从所述预处理话单包括的各通信发起号码中提取出单位时间内通信次数大于第二阈值的通信发起号码;
    或者,基于所述预处理话单包括的各通信发起号码在单位时间内的通信次数的排序,提取出通信次数最高的第二比例的通信发起号码。
  5. 根据权利要求1所述的方法,其特征在于,所述解析所述预处理话单中各通信号码的至少一种类型的通信信息,得到所述预处理话单中各通信号码的相应类型通信信息所具有的特征,包括:
    提取所述预处理话单中各通信号码作为通信发起号码的通信时长;以及
    计算所述预处理话单中各通信发起号码的平均通信时长;
    从所述预处理话单包括的通信号码中提取出与预设特征匹配的目标通信号码,包括:
    从所述预处理话单包括的各通信发起号码中提取出平均通信时长大于第三阈值的通信发起号码;
    或者,基于所述预处理话单包括的各通信发起号码的平均通信时长的排序,提取出平均通信时长最高的第三比例的通信发起号码。
  6. 根据权利要求1所述的方法,其特征在于,所述解析所述预处理话单中各通信号码的至少一种类型的通信信息,得到所述预处理话单中各通信号码的相应类型通信信息所具有的特征,包括:
    获取所述预处理话单中各通信号码作为通信发起号码时对应的通信响应号码的归属地;以及
    计算所述预处理话单中各通信发起号码所对应的通信响应号码的不同归属地的数量;
    从所述预处理话单包括的通信号码中提取出与预设特征匹配的目标通信号码,包括:
    从所述预处理话单包括的各通信发起号码中提取出所对应的通信响应号码的不同归属地的数量大于第四阈值的通信发起号码;
    或者,基于所述预处理话单包括的各通信发起号码所对应的通信响应号码的不同归属地的数量的排序,提取出所对应的通信响应号码的不同归属地的数量最高的第四比例的通信发起号码。
  7. 根据权利要求1所述的方法,其特征在于,从所述预处理话单包括的通信号码中提取出与预设特征匹配的目标通信号码,包括:
    使用机器学习模型分析所述预处理话单中各通信号码的相应类型通信信息所具有的特征,从所述预处理话单包括的通信号码中提取出与预设特征匹配的目标通信号码。
  8. 根据权利要求7所述的方法,其特征在于,所述方法还包括:
    接收用户侧针对目标通信号码的反馈信息,确定所述目标通信号码是否为安全号码;
    基于所述识别出的目标通信号码中被用户侧反馈为安全号码的目标通信号码的数量,确定所述机器学习模型的错误率;以及
    在机器学习模型的错误率大于第五阈值时,基于所述预处理话单中所述安全号码的通信记录,对所述机器学习模型进行重新训练。
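  9. The method according to claim 8, wherein retraining the machine learning model based on the communication records of the safe number in the preprocessed CDR comprises:
    parsing at least one type of communication information in the communication records of the safe number in the preprocessed CDR to obtain the features of the at least one type of communication information of the safe number; and
    updating, based on the features of the at least one type of communication information of the safe number, the threshold that the machine learning model uses to identify the target communication number.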
  10. 根据权利要求1所述的方法,其特征在于,所述从预处理话单包括的通信号码中提取出与预设特征匹配的目标通信号码之后,所述方法还包括:
    确定所述目标通信号码的相应类型通信信息所具有的特征与预设特征的匹配程度;
    根据所述目标通信号码的相应类型通信信息所具有的特征与预设特征的匹配程度,确定所述目标通信号码的危险级别;以及
    基于所述目标通信号码的危险级别对所述目标通信号码的通信行为进行响应处理。
  11. 一种通信号码处理装置,其特征在于,所述装置包括:
    获取模块,用于从通信业务设备获取第一预设时间内预设数量的通信号码的话单;
    预处理模块,用于解析所述话单得到所述话单中所包括的通信信息的类型,提取出所述话单中各通信号码的至少一种类型的通信信息并组合形成预处理话单;
    解析模块,用于解析所述预处理话单中各通信号码的至少一种类型的通信信息,得到所述预处理话单中各通信号码的相应类型通信信息所具有的特征;以及
    提取模块,用于从所述预处理话单包括的通信号码中提取出与预设特征匹配的目标通信号码。
  12. 根据权利要求11所述的装置,其特征在于,
    所述预处理模块,具体用于:
    解析所述话单得到所述话单中所包括的以下类型的通信信息中的至少一种:通信发起号码、对应所述通信发起号码的通信响应号码、通信起始时间和通信时长;
    提取出所述话单中各通信发起号码所关联的至少一种类型的通信信息形成各通信发起号码的通信记录;以及
    将所提取的各通信发起号码的通信记录组合形成所述预处理话单。
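  13. The apparatus according to claim 11, wherein
    the parsing module is specifically configured to:
    calculate an edit distance between each communication initiation number in the preprocessed CDR and a yellow-pages number; and
    obtain, based on the edit distance, a similarity between each communication initiation number in the preprocessed CDR and the yellow-pages number;
    the extraction module is specifically configured to:
    extract, from the communication initiation numbers included in the preprocessed CDR, communication initiation numbers whose similarity to the yellow-pages number is greater than a first threshold; or
    based on a ranking of the similarity between each communication initiation number included in the preprocessed CDR and the yellow-pages number, extract a first proportion of communication initiation numbers having the highest similarity,
    wherein the edit distance represents the number of operations required to change the yellow-pages number into the communication initiation number.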
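  14. The apparatus according to claim 11, wherein
    the parsing module is specifically configured to:
    extract the communication start time of each communication number in the preprocessed CDR when that number acts as a communication initiation number; and
    calculate the number of communications per unit time of each communication initiation number in the preprocessed CDR;
    the extraction module is specifically configured to:
    extract, from the communication initiation numbers included in the preprocessed CDR, communication initiation numbers whose number of communications per unit time is greater than a second threshold; or
    based on a ranking of the number of communications per unit time of the communication initiation numbers included in the preprocessed CDR, extract a second proportion of communication initiation numbers having the highest number of communications.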
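  15. The apparatus according to claim 11, wherein
    the parsing module is specifically configured to:
    extract the communication duration of each communication number in the preprocessed CDR when that number acts as a communication initiation number; and
    calculate an average communication duration of each communication initiation number in the preprocessed CDR;
    the extraction module is specifically configured to:
    extract, from the communication initiation numbers included in the preprocessed CDR, communication initiation numbers whose average communication duration is greater than a third threshold; or
    based on a ranking of the average communication durations of the communication initiation numbers included in the preprocessed CDR, extract a third proportion of communication initiation numbers having the highest average communication duration.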
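  16. The apparatus according to claim 11, wherein
    the parsing module is specifically configured to:
    obtain the home location of the communication response number corresponding to each communication number in the preprocessed CDR when that number acts as a communication initiation number; and
    calculate the number of distinct home locations of the communication response numbers corresponding to each communication initiation number in the preprocessed CDR;
    the extraction module is specifically configured to:
    extract, from the communication initiation numbers included in the preprocessed CDR, communication initiation numbers for which the number of distinct home locations of the corresponding communication response numbers is greater than a fourth threshold; or
    based on a ranking of the number of distinct home locations of the communication response numbers corresponding to each communication initiation number included in the preprocessed CDR, extract a fourth proportion of communication initiation numbers having the highest number of distinct home locations of the corresponding communication response numbers.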
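  17. The apparatus according to claim 11, wherein
    the extraction module is specifically configured to: analyze, by using a machine learning model, the features of the corresponding type of communication information of each communication number in the preprocessed CDR, and extract, from the communication numbers included in the preprocessed CDR, the target communication number matching the preset feature.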
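  18. The apparatus according to claim 17, wherein the apparatus further comprises:
    a training module, configured to:
    receive feedback information from a user side regarding the target communication number, and determine whether the target communication number is a safe number;
    determine an error rate of the machine learning model based on the number of identified target communication numbers that the user side has reported as safe numbers; and
    when the error rate of the machine learning model is greater than a fifth threshold, retrain the machine learning model based on the communication records of the safe numbers in the preprocessed CDR.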
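  19. The apparatus according to claim 18, wherein
    the training module is specifically configured to:
    parse at least one type of communication information in the communication records of the safe numbers in the preprocessed CDR to obtain features of the at least one type of communication information of the safe numbers; and
    update, based on the features of the at least one type of communication information of the safe numbers, the threshold used by the machine learning model to identify the target communication number.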
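  20. The apparatus according to claim 11, wherein the apparatus further comprises:
    a response module, configured to: determine a degree of matching between the features of the corresponding type of communication information of the target communication number and the preset feature; determine a danger level of the target communication number according to that degree of matching; and perform response processing on the communication behavior of the target communication number based on the danger level of the target communication number.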
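
The feature extraction and selection steps recited in the claims above (edit-distance similarity to a yellow-pages number, communications per unit time, average communication duration, number of distinct home locations, and selection by threshold or by top proportion) can be sketched roughly as follows. This is only an illustrative sketch, not the applicant's implementation: the `CallRecord` fields, the function names, the similarity normalisation (1 minus distance divided by the longer length), and the `window_units` parameter are all assumptions introduced here, since the claims do not fix any of them.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    """One row of a hypothetical preprocessed CDR (field names are assumed)."""
    caller: str       # communication initiation number
    callee: str       # communication response number
    start: float      # communication start time, seconds
    duration: float   # communication duration, seconds
    callee_home: str  # home location of the response number


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insert/delete/substitute operations turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def similarity(number: str, yellow_page: str) -> float:
    """Assumed normalisation: 1 - distance / length of the longer number."""
    longest = max(len(number), len(yellow_page)) or 1
    return 1.0 - edit_distance(yellow_page, number) / longest


def features(records, window_units: float):
    """Per-initiation-number features; window_units is the observed span in unit times."""
    by_caller = defaultdict(list)
    for rec in records:
        by_caller[rec.caller].append(rec)
    return {
        caller: {
            "calls_per_unit": len(recs) / window_units,
            "avg_duration": sum(r.duration for r in recs) / len(recs),
            "distinct_homes": len({r.callee_home for r in recs}),
        }
        for caller, recs in by_caller.items()
    }


def select(values, threshold=None, top_ratio=None):
    """Numbers whose value exceeds threshold, or the top_ratio fraction by value."""
    if threshold is not None:
        return {k for k, v in values.items() if v > threshold}
    if top_ratio is None:
        raise ValueError("give either threshold or top_ratio")
    ranked = sorted(values, key=values.get, reverse=True)
    return set(ranked[: max(1, int(len(ranked) * top_ratio))])
```

For instance, `select({n: f["distinct_homes"] for n, f in features(records, 1.0).items()}, threshold=2)` would return the initiation numbers that reached more than two distinct home locations, while passing `top_ratio=0.1` instead would return the top tenth of the ranking. Rounding the top proportion down, with a minimum of one number, is a design choice made for this sketch only.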
PCT/CN2017/081813 2016-04-25 2017-04-25 Communication number processing method and apparatus WO2017186090A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610261923.1A CN107306306B (zh) 2016-04-25 2016-04-25 Communication number processing method and apparatus
CN201610261923.1 2016-04-25

Publications (1)

Publication Number Publication Date
WO2017186090A1 (zh) 2017-11-02

Family

ID=60150219

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/081813 WO2017186090A1 (zh) 2016-04-25 2017-04-25 Communication number processing method and apparatus

Country Status (2)

Country Link
CN (1) CN107306306B (zh)
WO (1) WO2017186090A1 (zh)

Also Published As

Publication number Publication date
CN107306306A (zh) 2017-10-31
CN107306306B (zh) 2020-04-07

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17788742

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 17788742

Country of ref document: EP

Kind code of ref document: A1