CN109299084B - User portrait data filtering method and device - Google Patents

User portrait data filtering method and device

Publication number
CN109299084B
Authority
CN
China
Prior art keywords
imei
encrypted
data
user
batch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811246906.6A
Other languages
Chinese (zh)
Other versions
CN109299084A (en)
Inventor
钱佳
曹文博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiaomi Mobile Software Co Ltd
Original Assignee
Beijing Xiaomi Mobile Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiaomi Mobile Software Co Ltd
Priority to CN201811246906.6A
Publication of CN109299084A
Application granted
Publication of CN109299084B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Telephonic Communication Services (AREA)
  • Storage Device Security (AREA)

Abstract

The disclosure relates to a user portrait data filtering method and device. The method comprises the following steps: independently splitting and re-aggregating each dimension of to-be-processed user portrait data keyed by user identification, to generate first user portrait data keyed by device tag; generating, in batches, more than two batches of imei based on each TAC according to predetermined TACs of at least two different models, and searching the first user portrait data for the first encrypted imei corresponding to each batch of imei; screening out dirty data in each first encrypted imei according to the TAC used for generating each batch of imei and the model information associated with each first encrypted imei; and filtering the dirty data from the first user portrait data to obtain second user portrait data. The method and the device can improve data quality and thereby improve the accuracy and reliability of information pushing.

Description

User portrait data filtering method and device
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a user portrait data filtering method and apparatus.
Background
At present, in the field of data statistics, different statistical dimensions necessarily involve the association between two kinds of ids, namely the user identifier (id) and the device id. Information pushing faces massive user id data and complex statistical logic, so the different ids need to be mapped and integrated at the data cleaning and data analysis layer.
In the related art, data quality is improved by performing associated mapping and aggregation on various user ids and device ids.
Disclosure of Invention
To overcome the problems in the related art, embodiments of the present disclosure provide a method and an apparatus for filtering user portrait data. The technical scheme is as follows:
according to a first aspect of an embodiment of the present disclosure, there is provided a method for filtering user portrait data, the method including:
independently splitting and re-aggregating each dimension of the to-be-processed user portrait data, which takes the user identification as the keyword, to generate first user portrait data with the device tag as the keyword; the user portrait data to be processed comprises the user identification and the encrypted device identification;
generating, in batches, more than two batches of International Mobile Equipment Identity codes (imei) based on each TAC according to predetermined type approval codes (TACs) of at least two different models, and searching the first user portrait data for the first encrypted imei corresponding to each batch of imei;
screening out dirty data in each first encrypted imei according to TAC used for generating each batch of imei and model information associated with each first encrypted imei;
filtering the dirty data from the first user representation data to obtain second user representation data.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects: according to the technical scheme, batches of imei based on TAC of multiple models are generated in batches, imei MD5 corresponding to each batch of imei is found from first user portrait data, then dirty data in imei MD5 corresponding to each batch of imei are screened out according to the TAC used for generating each batch of imei and model information related to imei MD5 corresponding to each batch of imei, further dirty data are filtered from the first user portrait data, data quality is improved, and accuracy and reliability of information pushing are improved.
In one embodiment, said searching for a first encrypted imei in said first user portrait data corresponding to each of said batches of imei comprises:
encrypting each batch of imei respectively to obtain second encrypted imei corresponding to each batch of imei respectively;
matching the second encrypted imei with the encrypted device identification in the first user portrait data;
and determining the encrypted device identification in the first user portrait data that matches the second encrypted imei as the first encrypted imei in the first user portrait data that corresponds to each batch of imei.
In one embodiment, the screening out dirty data in each first encrypted imei according to the TAC used for generating each batch of imei and the model information associated with each first encrypted imei includes:
judging whether TAC used for generating each batch of imei is matched with model information associated with each first encrypted imei;
and determining, as dirty data in each first encrypted imei, the encrypted imei whose associated model information does not match the TAC used for generating each batch of imei.
In one embodiment, the splitting and re-aggregating each dimension of the to-be-processed user representation data with the user identification as a key separately to generate first user representation data with the device tag as a key comprises:
splitting and re-aggregating the user portrait data to be processed by respectively taking each encrypted equipment identifier as a keyword;
iteratively executing the steps of splitting and re-aggregating for multiple times to obtain respective corresponding values of the encrypted equipment identifications;
generating first user portrait data with an equipment label as a key word according to the respective corresponding value of each encrypted equipment identifier; the device tag corresponds to an encrypted device identification having the same value.
In one embodiment, the encrypted device identifier includes: encrypted imei, encrypted Mobile Equipment Identifier (meid), and encrypted media access control (MAC) address.
According to a second aspect of embodiments of the present disclosure, there is provided a user representation data filtering apparatus, the apparatus comprising:
the aggregation module is used for independently splitting and re-aggregating each dimension of the to-be-processed user portrait data, which takes the user identification as the keyword, to generate first user portrait data with the device tag as the keyword; the user portrait data to be processed comprises the user identification and the encrypted device identification;
the generation module is used for generating, in batches, more than two batches of International Mobile Equipment Identity codes (imei) based on each TAC according to predetermined type approval codes (TACs) of at least two different models, and searching the first user portrait data for the first encrypted imei corresponding to each batch of imei;
the screening module is used for screening out dirty data in each first encrypted imei according to TAC used for generating each batch of imei and model information associated with each first encrypted imei;
a filter module to filter the dirty data from the first user representation data to obtain second user representation data.
In one embodiment, the generating module includes:
the encryption submodule is used for encrypting each batch of imei respectively to obtain second encrypted imei corresponding to each batch of imei respectively;
a matching sub-module, configured to match the second encrypted imei with the encrypted device identifier in the first user portrait data;
a first determining sub-module configured to determine an encrypted device identifier in the first user portrait data that matches the second encrypted imei as a first encrypted imei in the first user portrait data that corresponds to each of the batches of imei.
In one embodiment, the screening module includes:
the judgment sub-module is used for judging whether TAC used for generating each batch of imei is matched with model information associated with each first encrypted imei or not;
and the second determining submodule is used for determining, as dirty data in each first encrypted imei, the encrypted imei whose associated model information does not match the TAC used for generating each batch of imei.
In one embodiment, the aggregation module includes:
the aggregation submodule is used for splitting and re-aggregating the user portrait data to be processed by respectively taking each encrypted equipment identifier as a keyword;
an iteration submodule, configured to iteratively execute the splitting and re-aggregating step multiple times to obtain a value corresponding to each encrypted device identifier;
a generation submodule, configured to generate first user portrait data using an apparatus tag as a keyword according to a value corresponding to each encrypted apparatus identifier; the device tag corresponds to an encrypted device identification having the same value.
According to a third aspect of the disclosed embodiments, there is provided a user representation data filtering apparatus comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
independently splitting and re-aggregating each dimension of the to-be-processed user portrait data, which takes the user identification as the keyword, to generate first user portrait data with the device tag as the keyword; the user portrait data to be processed comprises the user identification and the encrypted device identification;
generating, in batches, more than two batches of International Mobile Equipment Identity codes (imei) based on each TAC according to predetermined type approval codes (TACs) of at least two different models, and searching the first user portrait data for the first encrypted imei corresponding to each batch of imei;
screening out dirty data in each first encrypted imei according to TAC used for generating each batch of imei and model information associated with each first encrypted imei;
filtering the dirty data from the first user representation data to obtain second user representation data.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the method embodiments of any one of the above-mentioned first aspects.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
FIG. 1 is a flow diagram illustrating a method of user representation data filtering in accordance with an exemplary embodiment.
FIG. 2 is a flow diagram illustrating a method of user representation data filtering in accordance with an exemplary embodiment.
FIG. 3 is a flow diagram illustrating a method of user representation data filtering in accordance with an exemplary embodiment.
FIG. 4 is a block diagram illustrating a user representation data filtering apparatus in accordance with an exemplary embodiment.
FIG. 5 is a block diagram illustrating a user representation data filtering apparatus in accordance with an exemplary embodiment.
FIG. 6 is a block diagram illustrating a user representation data filtering apparatus in accordance with an exemplary embodiment.
FIG. 7 is a block diagram illustrating a user representation data filtering apparatus in accordance with an exemplary embodiment.
FIG. 8 is a block diagram illustrating a user representation data filtering apparatus in accordance with an exemplary embodiment.
FIG. 9 is a block diagram illustrating an apparatus according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
In the related art, various user ids and device ids are subjected to associated mapping and aggregation so as to improve data quality. However, the related art only performs this associated mapping and aggregation; it cannot guarantee that the aggregation is correct, and it cannot filter out erroneous Message-Digest Algorithm 5 (MD5) values of International Mobile Equipment Identity (imei) information. The data quality therefore remains poor, resulting in poor accuracy and reliability of information pushing.
In order to solve the above problem, an embodiment of the present disclosure provides a user portrait data filtering method, where the method includes: independently splitting and re-aggregating each dimension of the to-be-processed user portrait data, which takes the user identification as the keyword, to generate first user portrait data with the device tag as the keyword, where the user portrait data to be processed comprises a user identification and an encrypted device identification; generating, in batches, more than two batches of imei based on each TAC according to predetermined type approval codes (TACs) of at least two different models, and searching the first user portrait data for the first encrypted imei corresponding to each batch of imei; screening out dirty data in each first encrypted imei according to the TAC used for generating each batch of imei and the model information associated with each first encrypted imei; and filtering the dirty data from the first user portrait data to obtain second user portrait data. In the embodiments of the present disclosure, batches of imei based on the TACs of multiple models are generated in batches, the imei MD5 corresponding to each batch of imei is found in the first user portrait data, the dirty data in that imei MD5 is then screened out according to the TAC used for generating each batch of imei and the model information associated with the imei MD5, and the dirty data is filtered from the first user portrait data, which improves data quality and thus the accuracy and reliability of information pushing.
Based on the above analysis, embodiments of the disclosed method are described below.
FIG. 1 is a flow diagram illustrating a method of user representation data filtering in accordance with an exemplary embodiment; the execution subject of the method can be a server; as shown in fig. 1, the method comprises the following steps 101-104:
in step 101, each dimension of the user portrait data to be processed, which takes the user identifier as a keyword, is independently split and re-aggregated, so as to generate first user portrait data which takes the device tag as a keyword; the user portrait data to be processed comprises a user identification and an encrypted device identification.
For example, the encrypted device identification includes: encrypted imei, encrypted Mobile Equipment Identifier (meid), and encrypted media access control (MAC) address. The encrypted imei is, for example, imeiMd5; the encrypted meid is, for example, meidMd5; and the encrypted media access control is, for example, macMd5.
Exemplarily, the to-be-processed user portrait data includes newly added user portrait data and historical user portrait data. For example, the user portrait data newly added in the log may be stored every day with the user identification (UUID) as the key and then merged with the historical user portrait data to form a full user portrait keyed by UUID, where the user portrait data to be processed includes UUID, imeiMd5, meid, and macMd5. Optionally, the user portrait data to be processed may further include an International Mobile Subscriber Identity (IMSI). It should be noted that, to protect user privacy, the server cannot directly collect the imei of the device; it can only collect and store the encrypted imei, such as imeiMd5.
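The daily merge described above can be illustrated with a minimal Python sketch. The record layout (a dict per log record with a "uuid" key plus identifier fields) and the function name merge_portraits are assumptions made for illustration; the source does not specify a storage format.

```python
def merge_portraits(history, new_records):
    """Merge newly logged records into a UUID-keyed portrait.

    history:     dict mapping uuid -> {field name -> set of values}
    new_records: iterable of dicts, each with a "uuid" key and identifier
                 fields such as "imeiMd5" or "macMd5" (hypothetical layout)
    Returns a new dict; the history argument is not modified.
    """
    # Copy the historical portrait so the caller's data stays untouched.
    merged = {
        uuid: {field: set(values) for field, values in fields.items()}
        for uuid, fields in history.items()
    }
    for record in new_records:
        entry = merged.setdefault(record["uuid"], {})
        for field, value in record.items():
            if field == "uuid" or value is None:
                continue  # skip the key itself and missing identifiers
            entry.setdefault(field, set()).add(value)
    return merged
```

Each identifier field holds a set because one UUID may be seen with several encrypted device identifications over time.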
For example, the dimensions of the user portrait data to be processed may include user identification, encrypted mobile device identification, and encrypted media access control. In an example, the user portrait data to be processed is split and re-aggregated with each encrypted device identification as the keyword in turn; the splitting and re-aggregating step is iteratively executed multiple times to obtain the value corresponding to each encrypted device identification; and first user portrait data with the device tag as the keyword is generated according to the value corresponding to each encrypted device identification, where the device tag corresponds to the encrypted device identifications having the same value. In an embodiment, the number of times the splitting and re-aggregating step is iteratively executed is, for example, two, but the embodiments of the present disclosure are not limited thereto. After three rounds of scattering and aggregation, each encrypted device identification has an aggregated value, and a one-to-many mapping is established between the device tag and the encrypted device identifications having the same value; thus, the user portrait data to be processed is split and re-aggregated, and the first user portrait data with the device tag as the keyword is generated after three iterations.
It should be noted that, in general, a dual-card dual-standby device may have two imeiMd5 values and one meidMd5 value; in this case, at most three rounds of aggregation are required to convert the storage form of the user portrait data.
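As a rough illustration of re-keying the data by device, the sketch below groups encrypted device identifications that co-occur in any record and assigns each group one device tag. It uses a union-find structure rather than the iterated split/re-aggregate rounds the text describes, so it is a substitute technique that reaches the same kind of grouping, not the patented procedure. The record layout and the choice of the smallest identifier as the tag are assumptions.

```python
def build_device_tags(records):
    """Group co-occurring encrypted identifiers under one device tag.

    records: list of dicts with "uuid" and "ids" (list of encrypted
             identifiers such as imeiMd5 / meidMd5 / macMd5 values).
    Returns: dict mapping device tag -> {"ids": set, "uuids": set}.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller identifier as root so tags are deterministic.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    # Identifiers seen in the same record belong to the same device.
    for record in records:
        ids = record["ids"]
        for other in ids[1:]:
            union(ids[0], other)

    tags = {}
    for record in records:
        if not record["ids"]:
            continue
        tag = find(record["ids"][0])
        entry = tags.setdefault(tag, {"ids": set(), "uuids": set()})
        entry["ids"].update(record["ids"])
        entry["uuids"].add(record["uuid"])
    return tags
```

For a dual-card device, two imeiMd5 values and one meidMd5 value that appear together end up under a single tag, which matches the one-to-many mapping between device tag and encrypted identifications.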
In step 102, more than two batches of imei based on each TAC are generated in batches according to predetermined TACs of at least two different models, and the first encrypted imei corresponding to each batch of imei in the first user portrait data is searched.
For example, among the devices on the market, the top ten brands account for most of the newly added devices, so effectively cleaning the data corresponding to the devices of the top ten brands can greatly improve data quality. Since the device model is usually collected during data collection, information such as the associated model and brand can be found in the first user portrait data through imeiMd5. Meanwhile, the IMEI is composed of the TAC, the SNR, the SP, and the like: the TAC is the first 8 digits and uniquely identifies one device model; the SNR is the following 6 digits and identifies the production serial number; the SP is reserved. By parsing IMEIs obtained from public channels, the TACs corresponding to those IMEIs can be obtained, which yields the TACs in use on the market; these are used as the predetermined TACs of at least two different models. From these TACs, more than two batches of imei based on each TAC are generated in batches. Alternatively, the TACs of the models of the top ten brands may be obtained directly from the network or from a specific organization (for example, an organization responsible for allocating TACs).
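A minimal sketch of batch generation from a TAC follows. The text only says the final part of the IMEI is reserved; the sketch assumes the common 15-digit format in which the last digit is a Luhn check digit over the first 14 digits. The function names and the start/count batching parameters are hypothetical.

```python
def luhn_check_digit(digits14):
    """Return the Luhn check digit for a 14-digit TAC+SNR string."""
    total = 0
    # Walk from the rightmost payload digit; double every second digit,
    # starting with the rightmost one.
    for index, char in enumerate(reversed(digits14)):
        digit = int(char)
        if index % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return str((10 - total % 10) % 10)


def generate_imei_batch(tac, start, count):
    """Generate one batch of candidate imei values for an 8-digit TAC.

    start/count select a window of 6-digit serial numbers (SNR), so a
    TAC's 1,000,000 possible serials can be covered in several batches.
    """
    if len(tac) != 8 or not tac.isdigit():
        raise ValueError("TAC must be exactly 8 digits")
    end = min(start + count, 1_000_000)
    batch = []
    for snr in range(start, end):
        body = f"{tac}{snr:06d}"
        batch.append(body + luhn_check_digit(body))
    return batch


def split_imei(imei):
    """Split a 15-digit imei into (TAC, SNR, final digit)."""
    return imei[:8], imei[8:14], imei[14:]
```

Batching by serial-number window keeps memory bounded, since a single TAC can expand to a million candidate values.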
Then, the first user portrait data is searched for first encrypted imei corresponding to each batch of imei. For example, encrypting each batch of imei respectively to obtain a second encrypted imei corresponding to each batch of imei respectively; matching the second encrypted imei with the encrypted device identification in the first user portrait data; and determining the encrypted equipment identification matched with the second encrypted imei in the first user portrait data as the first encrypted imei corresponding to each batch of imei in the first user portrait data, and searching model information associated with each first encrypted imei in the first user portrait data.
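The encrypt-and-match step can be sketched as follows. The sketch assumes the stored imeiMd5 is the lowercase hexadecimal MD5 digest of the 15-digit imei string with no salt, and that the first user portrait data can be viewed as a lookup from encrypted identifier to model information; neither detail is stated in the source.

```python
import hashlib


def md5_hex(imei):
    """Second encrypted imei: MD5 of the imei string (assumed encoding)."""
    return hashlib.md5(imei.encode("ascii")).hexdigest()


def find_first_encrypted_imei(batch, portrait_models):
    """Match one generated batch against the first user portrait data.

    batch:           list of generated imei strings
    portrait_models: dict mapping encrypted device identification ->
                     model information (hypothetical lookup view)
    Returns: dict mapping each matched imeiMd5 -> its model information.
    """
    matches = {}
    for imei in batch:
        digest = md5_hex(imei)
        if digest in portrait_models:
            matches[digest] = portrait_models[digest]
    return matches
```

Because MD5 cannot be reversed, hashing every candidate imei of a known TAC and looking the digest up is what links a stored imeiMd5 back to its TAC.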
In step 103, dirty data in each first encrypted imei is screened out according to the TAC used for generating each batch of imei and the model information associated with each first encrypted imei.
Exemplarily, it is judged whether the TAC used for generating each batch of imei matches the model information associated with each first encrypted imei; the encrypted imei whose associated model information does not match the TAC used for generating each batch of imei is determined as dirty data in each first encrypted imei. For example, each TAC uniquely identifies one model. Taking TAC1 as an example, more than two batches of imei based on TAC1 are generated in batches, and the imeiMd5 and model information corresponding to these batches of imei are searched for in the first user portrait data. Considering that dirty data is a minority, model information 1 corresponding to TAC1 is determined as the model information shared by most of the imei based on TAC1. If the model information corresponding to imei1 based on TAC1 is not model information 1, it may be determined that the imeiMd5 corresponding to imei1 in the first user portrait data is incorrect.
An example is as follows: assume that most of the devices corresponding to TAC 35226005 are model A of device provider A. If the model associated with a certain number of imeiMd5 values corresponding to TAC 35226005 is found to be model B of device provider B, this indicates that the imeiMd5 of those machines is problematic (the devices may, for example, have been flashed), and these values need to be treated as dirty data.
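The majority-based screening in this example can be sketched as below. The input layout and the min_share threshold are assumptions added for illustration: the text only states that dirty data is a minority, so the sketch skips any TAC whose most common model does not exceed that share.

```python
from collections import Counter


def screen_dirty(matches_by_tac, min_share=0.5):
    """Return the set of imeiMd5 values judged to be dirty data.

    matches_by_tac: dict mapping TAC -> {imeiMd5 -> model information}
    min_share:      hypothetical threshold; a TAC is only screened when
                    its most common model covers more than this share.
    """
    dirty = set()
    for tac, matches in matches_by_tac.items():
        if not matches:
            continue
        counts = Counter(matches.values())
        majority_model, majority_count = counts.most_common(1)[0]
        if majority_count / len(matches) <= min_share:
            continue  # no clear majority model, so nothing is flagged
        # Every imeiMd5 whose model differs from the majority is dirty.
        dirty.update(
            digest for digest, model in matches.items()
            if model != majority_model
        )
    return dirty
```

With TAC 35226005 mapped mostly to model A, any matched imeiMd5 reporting model B would be returned as dirty.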
In step 104, dirty data is filtered from the first user representation data to obtain second user representation data.
Optionally, the dirty data may be extracted and stored separately. On the one hand, it can be used for later problem investigation and backtracking; on the other hand, the accumulated dirty data can be applied in fields such as identifying black-market (illicit industry) activity.
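A minimal sketch of the filtering step, with the dirty entries kept aside for later investigation, might look like this. The portrait layout (device tag mapped to a dict containing an "ids" set) and the shape of the quarantine list are assumptions; the source does not say how the data is stored.

```python
def filter_dirty(first_portrait, dirty_ids):
    """Remove dirty encrypted identifiers from device-keyed portrait data.

    first_portrait: dict mapping device tag -> dict with an "ids" set of
                    encrypted identifiers plus any other fields
    dirty_ids:      set of imeiMd5 values judged to be dirty
    Returns (second_portrait, quarantine); inputs are not modified.
    """
    second_portrait = {}
    quarantine = []
    for tag, entry in first_portrait.items():
        kept = entry["ids"] - dirty_ids
        removed = entry["ids"] & dirty_ids
        # Record what was removed so it can be traced back later.
        quarantine.extend(
            {"device_tag": tag, "encrypted_id": digest}
            for digest in sorted(removed)
        )
        if kept:
            second_portrait[tag] = {**entry, "ids": kept}
    return second_portrait, quarantine
```

Dropping a device tag only when none of its identifiers survive is one possible policy; a deployment could equally keep the entry and flag it.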
According to the technical scheme provided by the embodiment of the disclosure, by generating batches of imei based on TAC of multiple models in batches, finding imei MD5 corresponding to each batch of imei from first user portrait data, screening out dirty data in imei MD5 corresponding to each batch of imei according to the TAC used for generating each batch of imei and model information associated with imei MD5 corresponding to each batch of imei, and further filtering out dirty data from the first user portrait data, the data quality can be improved, and therefore, the accuracy and reliability of information pushing can be improved.
FIG. 2 is a flow diagram illustrating a method of user portrait data filtering in accordance with an exemplary embodiment. As shown in FIG. 2, on the basis of the embodiment shown in FIG. 1, the user portrait data filtering method according to the present disclosure may include the following steps 201-205:
in step 201, each dimension of the user portrait data to be processed, which takes the user identifier as a keyword, is independently split and re-aggregated, so as to generate first user portrait data which takes the device tag as a keyword; the user portrait data to be processed comprises a user identification and an encrypted device identification.
In step 202, more than two batches of imei based on each TAC are generated in batches according to at least two predetermined TACs of different models, and the first encrypted imei corresponding to each batch of imei in the first user portrait data is searched.
In step 203, it is determined whether the TAC used to generate each batch of imei matches the model information associated with each first encrypted imei.
In step 204, the encrypted imei in which the model information associated with each first encrypted imei and the TAC used for generating each batch of imei do not match is determined as dirty data in each first encrypted imei.
In step 205, dirty data is filtered from the first user representation data to obtain second user representation data.
According to the technical scheme provided by the embodiment of the disclosure, the TAC used for generating each batch of imei is matched with the model information associated with the imei MD5 corresponding to each batch of imei, dirty data in the imei MD5 corresponding to each batch of imei is screened out, the dirty data is filtered from the first user portrait data, and the data quality is improved.
FIG. 3 is a flow diagram illustrating a method of user portrait data filtering in accordance with an exemplary embodiment. As shown in FIG. 3, on the basis of the embodiment shown in FIG. 1, the user portrait data filtering method according to the present disclosure may include the following steps 301-307:
in step 301, each dimension of the user portrait data to be processed, which takes the user identifier as a keyword, is independently split and re-aggregated, so as to generate first user portrait data which takes the device tag as a keyword; the user portrait data to be processed comprises a user identification and an encrypted device identification.
In step 302, more than two batches imei based on each TAC are generated in batches according to predetermined TACs of at least two different models.
In step 303, each batch of imei is encrypted, so as to obtain a second encrypted imei corresponding to each batch of imei.
In step 304, matching the second encrypted imei with the encrypted device identification in the first user representation data; and determining the encrypted equipment identification matched with the second encrypted imei in the first user portrait data as the first encrypted imei corresponding to each batch of imei in the first user portrait data.
In step 305, it is determined whether the TAC used to generate each batch of imei matches the model information associated with each first encrypted imei.
In step 306, the encrypted imei in which the model information associated with each first encrypted imei and the TAC used for generating each batch of imei do not match is determined as dirty data in each first encrypted imei.
In step 307, dirty data is filtered from the first user representation data to obtain second user representation data.
According to the technical scheme provided by the embodiment of the disclosure, the batches of imei generated in batches are respectively encrypted to obtain second encrypted imei corresponding to the batches of imei, the second encrypted imei is matched with the encrypted equipment identifier in the first user portrait data, so that a mapping relation of TAC-batch imei-second encrypted imei-first encrypted imei-machine type can be established, and further the wrong imei MD5 in the first user portrait data is identified and filtered by using the mapping relation of TAC and machine type, so that the data quality is improved, and the accuracy and reliability of information pushing can be improved.
The following are embodiments of the disclosed apparatus that may be used to perform embodiments of the disclosed methods.
FIG. 4 is a block diagram illustrating a user representation data filtering apparatus in accordance with an exemplary embodiment; the apparatus may be implemented in various ways, for example with all components of the apparatus being implemented in a server or with components of the apparatus being implemented in a coupled manner on the server side; the apparatus may implement the method related to the present disclosure through software, hardware or a combination of the two;
as shown in FIG. 4, the user representation data filtering apparatus includes: an aggregation module 401, a generation module 402, a screening module 403, and a filtering module 404, wherein:
the aggregation module 401 is configured to separately split and re-aggregate each dimension of the to-be-processed user representation data with the user identification as a keyword, generating first user representation data with the device tag as a keyword; the user portrait data to be processed comprises a user identification and an encrypted equipment identification;
the generating module 402 is configured to generate more than two batches of imei based on each TAC in batches according to predetermined TACs of at least two different models, and search for a first encrypted imei corresponding to each batch of imei in the first user portrait data;
the screening module 403 is configured to screen out dirty data in each first encrypted imei according to the TAC used for generating each batch of imei and the model information associated with each first encrypted imei;
filter module 404 is configured to filter dirty data from the first user representation data to obtain second user representation data.
The device provided by the embodiment of the disclosure can be used for executing the technical scheme of the embodiment shown in fig. 1, and the execution mode and the beneficial effect are similar, and are not described again here.
In one possible implementation, as shown in FIG. 5, in the user portrait data filtering apparatus shown in FIG. 4, the generating module 402 may be configured to include: an encryption submodule 501, a matching submodule 502 and a first determination submodule 503, wherein:
the encryption submodule 501 is configured to encrypt each batch of imei respectively to obtain second encrypted imei corresponding to each batch of imei respectively;
matching sub-module 502 is configured to match the second encrypted imei with the encrypted device identification in the first user portrait data;
the first determination submodule 503 is configured to determine the encrypted device identification in the first user representation data that matches the second encrypted imei as a first encrypted imei in the first user representation data corresponding to each batch of imei.
In one possible implementation, as shown in FIG. 6, the screening module 403 of the user portrait data filtering apparatus shown in FIG. 4 may include: a judgment submodule 601 and a second determination submodule 602, wherein:
the judgment submodule 601 is configured to judge whether the TAC used for generating each batch of imei matches the model information associated with each first encrypted imei;
the second determination submodule 602 is configured to determine, as the dirty data in the first encrypted imei, each first encrypted imei whose associated model information does not match the TAC used for generating the corresponding batch of imei.
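The judgment and determination steps can be sketched as below. This is a minimal sketch, not the disclosed implementation: the three input mappings, their names and the sample values are assumptions made for illustration, since the disclosure does not specify how the TAC-to-model relation or the associated model information is stored.

```python
def screen_dirty(matched, tac_to_model, associated_model):
    """Return the first encrypted imei values judged to be dirty data.

    matched:          first encrypted imei -> TAC of the batch that matched it
    tac_to_model:     predetermined TAC -> model it was allocated to
    associated_model: first encrypted imei -> model recorded in the portrait data
    (all three structures are hypothetical)
    """
    dirty = set()
    for enc_imei, tac in matched.items():
        expected_model = tac_to_model[tac]
        # A mismatch between the recorded model and the TAC's model marks
        # the identifier as dirty.
        if associated_model.get(enc_imei) != expected_model:
            dirty.add(enc_imei)
    return dirty


tac_to_model = {"11111111": "Model A", "22222222": "Model B"}
matched = {"h1": "11111111", "h2": "11111111", "h3": "22222222"}
associated_model = {"h1": "Model A", "h2": "Model B", "h3": "Model B"}
dirty = screen_dirty(matched, tac_to_model, associated_model)
```

In this sample, `h2` was found through a batch generated from the TAC of Model A but is recorded with Model B, so it is the only identifier flagged.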
In one possible implementation, as shown in FIG. 7, the aggregation module 401 of the user portrait data filtering apparatus shown in FIG. 4 may include: an aggregation submodule 701, an iteration submodule 702, and a generation submodule 703, wherein:
the aggregation submodule 701 is configured to split and re-aggregate the user portrait data to be processed by respectively taking each encrypted device identifier as a keyword;
the iteration submodule 702 is configured to perform iteration of the splitting and re-aggregating steps multiple times to obtain respective values corresponding to the encrypted device identifiers;
the generation submodule 703 is configured to generate first user portrait data with the device tag as a keyword according to a value corresponding to each encrypted device identifier; the device tag corresponds to an encrypted device identification having the same value.
In an embodiment, the splitting and re-aggregating steps are iterated, for example, twice, but the embodiments of the present disclosure are not limited thereto.
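One possible reading of the splitting, re-aggregating and iterating steps is sketched below. The disclosure does not define how the "value" of an encrypted device identifier is computed, so this sketch assumes a simple label propagation in which identifiers linked through shared users converge on the smallest user identification; the record layout, the function names and the `device-` tag prefix are likewise hypothetical.

```python
from collections import defaultdict


def split_by_device(records):
    """Split user-keyed records and re-aggregate them by encrypted device id."""
    by_device = defaultdict(set)
    for user_id, dims in records.items():
        for ids in dims.values():  # each dimension (imei, meid, mac) separately
            for enc_id in ids:
                by_device[enc_id].add(user_id)
    return by_device


def device_values(records, rounds=2):
    """Iterate the re-aggregation to give each encrypted device id a value."""
    by_device = split_by_device(records)
    user_value = {u: u for u in records}  # each user starts with its own id
    value = {}
    for _ in range(rounds):
        # Each device id takes the smallest value among its users ...
        value = {d: min(user_value[u] for u in users)
                 for d, users in by_device.items()}
        # ... and each user takes the smallest value among its device ids.
        for d, users in by_device.items():
            for u in users:
                user_value[u] = min(user_value[u], value[d])
    return value


def first_portrait(records, rounds=2):
    """Group encrypted device ids that share a value under one device tag."""
    tagged = defaultdict(set)
    for enc_id, v in device_values(records, rounds).items():
        tagged[f"device-{v}"].add(enc_id)
    return dict(tagged)


records = {
    "u1": {"imei": ["a"], "mac": ["m1"]},
    "u2": {"imei": ["a"], "meid": ["e2"]},
    "u3": {"imei": ["c"]},
}
portrait = first_portrait(records)
```

With two rounds, identifiers `a`, `m1` and `e2` end up under one device tag because users `u1` and `u2` share the encrypted imei `a`, while `c` keeps its own tag.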
FIG. 8 is a block diagram illustrating a user portrait data filtering apparatus 800 according to an exemplary embodiment. The user portrait data filtering apparatus 800 is adapted for use with a server and comprises:
a processor 801;
a memory 802 for storing processor-executable instructions;
wherein the processor 801 is configured to:
splitting and re-aggregating each dimension of the user portrait data to be processed with the user identification as the keyword independently to generate first user portrait data with the equipment tag as the keyword; the user portrait data to be processed comprises a user identification and an encrypted equipment identification;
generating more than two batches of imei based on each TAC in batches according to at least two predetermined TACs of different models, and searching first encrypted imei corresponding to each batch of imei in first user portrait data;
screening out dirty data in each first encrypted imei according to TAC used for generating each batch of imei and model information associated with each first encrypted imei;
filtering the dirty data from the first user portrait data to obtain second user portrait data.
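The final filtering step can be sketched as follows. This is an illustrative sketch only: the record layout (a device tag mapped to a dictionary with an `encrypted_ids` list) is hypothetical, and dropping a record once all of its identifiers are dirty is an assumption, since the disclosure does not state how such records are handled.

```python
def filter_dirty(first_portrait, dirty):
    """Return second user portrait data with dirty identifiers removed."""
    second = {}
    for tag, record in first_portrait.items():
        kept = [e for e in record["encrypted_ids"] if e not in dirty]
        # ASSUMPTION: a record with no clean identifier left is dropped.
        if kept:
            second[tag] = {**record, "encrypted_ids": kept}
    return second


first = {
    "device-1": {"encrypted_ids": ["h1", "h2"], "model": "Model A"},
    "device-2": {"encrypted_ids": ["h3"], "model": "Model B"},
}
second = filter_dirty(first, {"h2", "h3"})
```

The function builds new records rather than editing the input, so the first user portrait data remains available for comparison or auditing.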
In one embodiment, the processor 801 may be further configured to:
encrypting each batch of imei respectively to obtain second encrypted imei corresponding to each batch of imei respectively;
matching the second encrypted imei with the encrypted device identification in the first user portrait data;
and determining the encrypted equipment identification matched with the second encrypted imei in the first user portrait data as the first encrypted imei corresponding to each batch of imei in the first user portrait data.
In one embodiment, the processor 801 may be further configured to:
judging whether TAC used for generating each batch of imei is matched with model information associated with each first encrypted imei;
and determining, as the dirty data in the first encrypted imei, each first encrypted imei whose associated model information does not match the TAC used for generating the corresponding batch of imei.
In one embodiment, the processor 801 may be further configured to:
splitting and re-aggregating the user portrait data to be processed by respectively taking each encrypted equipment identifier as a keyword;
iteratively executing the steps of splitting and re-aggregating multiple times to obtain the value corresponding to each encrypted device identification;
generating first user portrait data with the equipment label as a keyword according to the respective corresponding value of each encrypted equipment identifier; the device tag corresponds to an encrypted device identification having the same value.
In an embodiment, the splitting and re-aggregating steps are iterated, for example, twice, but the embodiments of the present disclosure are not limited thereto.
In one embodiment, the encrypted device identification includes: an encrypted imei, an encrypted mobile equipment identifier (meid) and an encrypted media access control (mac) address.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
FIG. 9 is a block diagram illustrating an apparatus in accordance with an example embodiment. For example, the apparatus 900 may be provided as a server. The apparatus 900 comprises a processing component 902, which in turn comprises one or more processors, and memory resources, represented by a memory 903, for storing instructions executable by the processing component 902, such as application programs. The application programs stored in the memory 903 may include one or more modules that each correspond to a set of instructions. Further, the processing component 902 is configured to execute the instructions to perform the above-described methods.
The apparatus 900 may also include a power component 906 configured to perform power management of the apparatus 900, a wired or wireless network interface 905 configured to connect the apparatus 900 to a network, and an input/output (I/O) interface 908. The apparatus 900 may operate based on an operating system stored in the memory 903, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
A non-transitory computer-readable storage medium is also provided. Instructions in the storage medium, when executed by a processor of the apparatus 900, enable the apparatus 900 to perform a method comprising:
splitting and re-aggregating each dimension of the user portrait data to be processed with the user identification as the keyword independently to generate first user portrait data with the equipment tag as the keyword; the user portrait data to be processed comprises a user identification and an encrypted equipment identification;
generating more than two batches of imei based on each TAC in batches according to at least two predetermined TACs of different models, and searching first encrypted imei corresponding to each batch of imei in first user portrait data;
screening out dirty data in each first encrypted imei according to TAC used for generating each batch of imei and model information associated with each first encrypted imei;
the dirty data is filtered from the first user representation data to obtain second user representation data.
In one embodiment, searching for a first encrypted imei in the first user representation data corresponding to each batch of imei comprises:
encrypting each batch of imei respectively to obtain second encrypted imei corresponding to each batch of imei respectively;
matching the second encrypted imei with the encrypted device identification in the first user portrait data;
and determining the encrypted equipment identification matched with the second encrypted imei in the first user portrait data as the first encrypted imei corresponding to each batch of imei in the first user portrait data.
In one embodiment, screening out dirty data in each first encrypted imei according to the TAC used for generating each batch of imei and the model information associated with each first encrypted imei includes:
judging whether TAC used for generating each batch of imei is matched with model information associated with each first encrypted imei;
and determining, as the dirty data in the first encrypted imei, each first encrypted imei whose associated model information does not match the TAC used for generating the corresponding batch of imei.
In one embodiment, splitting and re-aggregating each dimension of the to-be-processed user representation data keyed to the user identification separately to generate first user representation data keyed to the device tag, comprises:
splitting and re-aggregating the user portrait data to be processed by respectively taking each encrypted equipment identifier as a keyword;
iteratively executing the steps of splitting and re-aggregating multiple times to obtain the value corresponding to each encrypted device identification;
generating first user portrait data with the equipment label as a keyword according to the respective corresponding value of each encrypted equipment identifier; the device tag corresponds to an encrypted device identification having the same value.
In an embodiment, the splitting and re-aggregating steps are iterated, for example, twice, but the embodiments of the present disclosure are not limited thereto.
In one embodiment, the encrypted device identification includes: an encrypted imei, an encrypted mobile equipment identifier (meid) and an encrypted media access control (mac) address.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A method for filtering user portrait data, comprising:
splitting and re-aggregating each dimension of the user portrait data to be processed with the user identification as the keyword independently to generate first user portrait data with the device tag as the keyword; the user portrait data to be processed comprises the user identification and the encrypted device identification;
generating, in batches, more than two batches of international mobile equipment identity codes (imei) based on each TAC according to predetermined type allocation codes (TAC) of at least two different models, and searching the first user portrait data for first encrypted imei corresponding to each batch of imei;
screening out dirty data in each first encrypted imei according to the TAC used for generating each batch of imei and model information associated with each first encrypted imei;
filtering the dirty data from the first user portrait data to obtain second user portrait data,
wherein the screening out of dirty data in each first encrypted imei according to the TAC used for generating each batch of imei and the model information associated with each first encrypted imei comprises:
judging whether the TAC used for generating each batch of imei matches the model information associated with each first encrypted imei;
and determining, as the dirty data in the first encrypted imei, each first encrypted imei whose associated model information does not match the TAC used for generating the corresponding batch of imei.
2. The method of claim 1, wherein said searching for a first encrypted imei in said first user portrait data corresponding to each said batch of imei, comprises:
encrypting each batch of imei respectively to obtain second encrypted imei corresponding to each batch of imei respectively;
matching the second encrypted imei with the encrypted device identification in the first user portrait data;
and determining the encrypted device identification in the first user portrait data that matches the second encrypted imei as the first encrypted imei in the first user portrait data that corresponds to each batch of imei.
3. The method of claim 1, wherein the splitting and re-aggregating of each dimension of the user portrait data to be processed with the user identification as the keyword independently to generate first user portrait data with the device tag as the keyword comprises:
splitting and re-aggregating the user portrait data to be processed by respectively taking each encrypted equipment identifier as a keyword;
iteratively executing the steps of splitting and re-aggregating multiple times to obtain the value corresponding to each encrypted device identification;
generating first user portrait data with an equipment label as a key word according to the respective corresponding value of each encrypted equipment identifier; the device tag corresponds to an encrypted device identification having the same value.
4. The method of claim 1, wherein the encrypted device identification comprises: an encrypted imei, an encrypted mobile equipment identifier (meid) and an encrypted media access control (mac) address.
5. A user portrait data filtering apparatus, comprising:
an aggregation module, configured to independently split and re-aggregate each dimension of the to-be-processed user portrait data with the user identification as the keyword to generate first user portrait data with the device tag as the keyword; the user portrait data to be processed comprises the user identification and the encrypted device identification;
a generation module, configured to generate, in batches, more than two batches of international mobile equipment identity codes (imei) based on each TAC according to predetermined type allocation codes (TAC) of at least two different models, and to search the first user portrait data for first encrypted imei corresponding to each batch of imei;
a screening module, configured to screen out dirty data in each first encrypted imei according to the TAC used for generating each batch of imei and model information associated with each first encrypted imei;
a filter module, configured to filter the dirty data from the first user portrait data to obtain second user portrait data,
wherein the screening out of dirty data in each first encrypted imei according to the TAC used for generating each batch of imei and the model information associated with each first encrypted imei comprises:
judging whether the TAC used for generating each batch of imei matches the model information associated with each first encrypted imei;
and determining, as the dirty data in the first encrypted imei, each first encrypted imei whose associated model information does not match the TAC used for generating the corresponding batch of imei.
6. The apparatus of claim 5, wherein the generating module comprises:
the encryption submodule is used for encrypting each batch of imei respectively to obtain second encrypted imei corresponding to each batch of imei respectively;
a matching sub-module, configured to match the second encrypted imei with the encrypted device identifier in the first user portrait data;
a first determining sub-module configured to determine an encrypted device identifier in the first user portrait data that matches the second encrypted imei as a first encrypted imei in the first user portrait data that corresponds to each of the batches of imei.
7. The apparatus of claim 5, wherein the screening module comprises:
the judgment sub-module is used for judging whether TAC used for generating each batch of imei is matched with model information associated with each first encrypted imei or not;
and a second determining submodule, configured to determine, as the dirty data in the first encrypted imei, each first encrypted imei whose associated model information does not match the TAC used for generating the corresponding batch of imei.
8. The apparatus of claim 5, wherein the aggregation module comprises:
the aggregation submodule is used for splitting and re-aggregating the user portrait data to be processed by respectively taking each encrypted equipment identifier as a keyword;
an iteration submodule, configured to iteratively execute the splitting and re-aggregating step multiple times to obtain a value corresponding to each encrypted device identifier;
a generation submodule, configured to generate first user portrait data using an apparatus tag as a keyword according to a value corresponding to each encrypted apparatus identifier; the device tag corresponds to an encrypted device identification having the same value.
9. A user portrait data filtering apparatus, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
splitting and re-aggregating each dimension of the user portrait data to be processed with the user identification as the keyword independently to generate first user portrait data with the device tag as the keyword; the user portrait data to be processed comprises the user identification and the encrypted device identification;
generating, in batches, more than two batches of international mobile equipment identity codes (imei) based on each TAC according to predetermined type allocation codes (TAC) of at least two different models, and searching the first user portrait data for first encrypted imei corresponding to each batch of imei;
screening out dirty data in each first encrypted imei according to the TAC used for generating each batch of imei and model information associated with each first encrypted imei;
filtering the dirty data from the first user portrait data to obtain second user portrait data,
wherein the screening out of dirty data in each first encrypted imei according to the TAC used for generating each batch of imei and the model information associated with each first encrypted imei comprises:
judging whether the TAC used for generating each batch of imei matches the model information associated with each first encrypted imei;
and determining, as the dirty data in the first encrypted imei, each first encrypted imei whose associated model information does not match the TAC used for generating the corresponding batch of imei.
10. A computer-readable storage medium having stored thereon computer instructions, which, when executed by a processor, carry out the steps of the method according to any one of claims 1 to 4.
CN201811246906.6A 2018-10-24 2018-10-24 User portrait data filtering method and device Active CN109299084B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811246906.6A CN109299084B (en) 2018-10-24 2018-10-24 User portrait data filtering method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811246906.6A CN109299084B (en) 2018-10-24 2018-10-24 User portrait data filtering method and device

Publications (2)

Publication Number Publication Date
CN109299084A CN109299084A (en) 2019-02-01
CN109299084B true CN109299084B (en) 2022-04-01

Family

ID=65157832

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811246906.6A Active CN109299084B (en) 2018-10-24 2018-10-24 User portrait data filtering method and device

Country Status (1)

Country Link
CN (1) CN109299084B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102469442A (en) * 2010-11-15 2012-05-23 ***通信集团山东有限公司 Method and device for determining service supported by user terminal
CN103246980A (en) * 2012-02-02 2013-08-14 阿里巴巴集团控股有限公司 Information output method and server
CN104750752A (en) * 2013-12-31 2015-07-01 ***通信集团公司 Determination method and device of user community with internet-surfing preference
CN105677723A (en) * 2015-12-30 2016-06-15 合肥城市云数据中心股份有限公司 Method for establishing and searching data labels for industrial signal source
CN106227748A (en) * 2016-07-14 2016-12-14 上海超橙科技有限公司 A kind of information generating method and equipment
CN106960143A (en) * 2017-03-23 2017-07-18 网易(杭州)网络有限公司 The recognition methods of user account and device, storage medium, electronic equipment

Also Published As

Publication number Publication date
CN109299084A (en) 2019-02-01

Similar Documents

Publication Publication Date Title
CN109257764B (en) User portrait data processing method and device
CN106326219B (en) Method, device and system for checking business system data
US9781109B2 (en) Method, terminal device, and network device for improving information security
US9066226B2 (en) Initialization of embedded secure elements
US20180324735A1 (en) Bluetooth automatic connection method, and master device, slave device, and system
CN109657107B (en) Terminal matching method and device based on third-party application
CN112602304A (en) Identifying device types based on behavioral attributes
US9928055B1 (en) Validating development software by comparing results from processing historic data sets
CN107294924B (en) Vulnerability detection method, device and system
CN111177481B (en) User identifier mapping method and device
CN104408118A (en) Database establishing method and device
CN111815467A (en) Auditing method and device
CN114218322A (en) Data display method, device, equipment and medium based on ciphertext transmission
CN109299084B (en) User portrait data filtering method and device
CN114363029A (en) Differentiated network access authentication method, device, equipment and medium
CN105207829B (en) Intrusion detection data processing method, device and system
CN110968572B (en) User portrait data cleaning method and device
CN116567609A (en) User information association backfill method, device, equipment and storage medium
US10003492B2 (en) Systems and methods for managing data related to network elements from multiple sources
CN110968573B (en) User portrait data cleaning method and device
CN107548058B (en) Equipment access method and intelligent terminal
CN105893445A (en) Data processing method, server and terminal device
CN108737350B (en) Information processing method and client
CN113297583B (en) Vulnerability risk analysis method, device, equipment and storage medium
CN112148724B (en) Equipment identification processing method and system, computer equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant