CN113468389B - User portrait establishment method and device based on feature sequence comparison - Google Patents

User portrait establishment method and device based on feature sequence comparison

Info

Publication number
CN113468389B
CN113468389B (application CN202010237633.XA)
Authority
CN
China
Prior art keywords
sequence
feature
structural feature
structural
sequences
Prior art date
Legal status
Active
Application number
CN202010237633.XA
Other languages
Chinese (zh)
Other versions
CN113468389A (en)
Inventor
刘毅
董云龙
曹英卓
陈广
Current Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Hebei Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Hebei Co Ltd
Priority date
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd and China Mobile Group Hebei Co Ltd
Priority to CN202010237633.XA
Publication of CN113468389A
Application granted
Publication of CN113468389B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/903 Querying
    • G06F 16/9035 Filtering based on additional data, e.g. user or group profiles
    • G06F 16/90335 Query processing
    • G06F 16/90348 Query processing by searching ordered data, e.g. alpha-numerically ordered data


Abstract

The invention discloses a user portrait establishment method and device based on feature sequence comparison. The method includes: collecting feature sets corresponding to different users in different time periods, determining a plurality of structural feature sequences and a plurality of time sequence feature sequences, and encoding them respectively to obtain a plurality of structural feature coding sequences and a plurality of time sequence feature coding sequences; comparing any structural feature coding sequence with the other structural feature coding sequences, determining an associated structural feature coding sequence, and merging the two to obtain a structural feature merging sequence; determining the time sequence feature coding sequence corresponding to the structural feature merging sequence; and constructing and/or updating the user portrait according to the structural feature merging sequence and its corresponding time sequence feature coding sequence. The invention distinguishes individuals based on feature sequence comparison, overcomes the defect of the prior art that users are difficult to identify accurately and uniquely when user portraits are constructed, so that complete user portraits cannot be formed, and has wider applicability.

Description

User portrait establishment method and device based on feature sequence comparison
Technical Field
The invention relates to the technical field of communication, in particular to a user portrait establishing method and device based on feature sequence comparison.
Background
A user portrait is a virtual representation of a real user: a target user model built on top of a series of real data. A user portrait clearly reveals the user's goals, so that accurate recommendation and marketing can be carried out for the user and all-round services can be provided.
In the prior art, one method for establishing a user portrait analyzes the user's behavior by collecting information such as the user's internet logs, obtains a set of the user's feature information, classifies the feature information, calculates the user's matching degree, and then updates the existing user portrait; another supplements missing parts of the user portrait with a naive Bayes algorithm, updating the existing user portrait and improving its accuracy; yet another quickly updates the user portrait by defining thresholds and a maturity model that take into account changes in user behavior within an effective time window.
For partially structured data (such as an operator's call detail records), an object can be determined by a user account, such as a user ID or phone number, and the corresponding user portrait can then be depicted from the user's internet logs or call records. In many cases, however, a user cannot be uniquely identified by the same client or the same number, because terminals may be shared and numbers may be changed. For internet behavior on many terminals, the user cannot be accurately and uniquely identified from data stored on the local terminal (such as cookies) or from information such as the IP address, and the user's internet access location changes frequently, so it is difficult to form a complete user portrait for the user.
Disclosure of Invention
The present invention has been made in view of the above problems, and aims to provide a user portrait creation method and apparatus based on feature sequence comparison that overcome, or at least partially solve, the above problems.
According to one aspect of the invention, there is provided a user portrait creation method based on feature sequence comparison, comprising the following steps:
collecting feature sets corresponding to different users in different time periods, and determining a plurality of structural feature sequences and a plurality of time sequence feature sequences;
encoding the plurality of structural feature sequences and the plurality of time sequence feature sequences respectively to obtain a plurality of structural feature coding sequences and a plurality of time sequence feature coding sequences;
for any structural feature coding sequence among the plurality of structural feature coding sequences, comparing that structural feature coding sequence with the other structural feature coding sequences, determining an associated structural feature coding sequence that has an association relationship with it, and merging the structural feature coding sequence with the associated structural feature coding sequence to obtain a structural feature merging sequence;
determining a time sequence feature coding sequence corresponding to the structural feature merging sequence according to the plurality of time sequence feature coding sequences;
and constructing and/or updating the user portrait according to the structural feature merging sequence and the time sequence feature coding sequence corresponding to the structural feature merging sequence.
According to another aspect of the present invention, there is provided a user portrait creation apparatus based on feature sequence alignment, including:
the feature set collection module is used for collecting feature sets corresponding to different users in different time periods and determining a plurality of structural feature sequences and a plurality of time sequence feature sequences;
the coding module is used for respectively coding the plurality of structural feature sequences and the plurality of time sequence feature sequences to obtain a plurality of structural feature coding sequences and a plurality of time sequence feature coding sequences;
the comparison module is used for, for any structural feature coding sequence among the plurality of structural feature coding sequences, comparing that structural feature coding sequence with the other structural feature coding sequences, determining an associated structural feature coding sequence that has an association relationship with it, and merging the structural feature coding sequence with the associated structural feature coding sequence to obtain a structural feature merging sequence;
the matching module is used for determining a time sequence feature coding sequence corresponding to the structural feature merging sequence according to the plurality of time sequence feature coding sequences;
and the user portrait construction and updating module is used for constructing and/or updating the user portrait according to the structural feature merging sequence and the time sequence feature coding sequence corresponding to the structural feature merging sequence.
According to yet another aspect of the present invention, there is provided a computing device, comprising: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with each other through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the operation corresponding to the user portrait establishing method based on the feature sequence comparison.
According to still another aspect of the present invention, there is provided a computer storage medium having stored therein at least one executable instruction for causing a processor to perform operations corresponding to the user portrait creation method based on feature sequence alignment as described above.
According to the user portrait establishment method and device based on feature sequence comparison of the present invention, feature sets corresponding to different users in different time periods are collected, and a plurality of structural feature sequences and a plurality of time sequence feature sequences are determined; the plurality of structural feature sequences and the plurality of time sequence feature sequences are encoded respectively to obtain a plurality of structural feature coding sequences and a plurality of time sequence feature coding sequences; for any structural feature coding sequence among the plurality of structural feature coding sequences, that structural feature coding sequence is compared with the other structural feature coding sequences, an associated structural feature coding sequence that has an association relationship with it is determined, and the two are merged to obtain a structural feature merging sequence; the time sequence feature coding sequence corresponding to the structural feature merging sequence is determined according to the plurality of time sequence feature coding sequences; and the user portrait is constructed and/or updated according to the structural feature merging sequence and its corresponding time sequence feature coding sequence. The invention distinguishes individuals based on feature sequence comparison and divides the feature sequences into structural feature sequences and time sequence feature sequences that are compared separately, which improves the accuracy of comparison and allows users to be identified accurately and uniquely; it establishes an iterative algorithm for constructing and updating user portraits, ensuring that user portraits are updated in time and improving the accuracy of their construction and updating; it overcomes the defect of the prior art that a complete user portrait may fail to be formed when users change numbers or share terminals; and it has wider applicability.
The foregoing is only an overview of the technical solution of the present invention. In order that the technical means of the present invention may be understood more clearly and implemented in accordance with the content of the description, and in order that the above and other objects, features and advantages of the present invention may become more readily apparent, specific embodiments of the invention are set forth below.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 shows a flowchart of a user portrait creation method based on feature sequence alignment according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a user portrait creation apparatus based on feature sequence alignment according to an embodiment of the present invention;
FIG. 3 illustrates a schematic diagram of a computing device provided by an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present invention are shown in the drawings, it should be understood that the present invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
Through a communication network, a user generates a series of behaviors at different times, such as making calls, sending SMS messages, sending WeChat messages, browsing Weibo and shopping on Taobao. Data on different attributes and dimensions of the user can be obtained from operators, internet companies and web crawlers, and these behaviors form feature sets corresponding to different users in different time periods.
In the present invention, drawing on the principles of biological DNA feature sequence comparison, a user feature set is established on the basis of the user's feature attributes and internet behavior, and a feature sequence corresponding to the feature set is formed by means of an encoding technique. Specifically, a DNA feature sequence is a DNA fragment with genetic effect: it supports the basic structure and functions of life and stores all information about processes such as race, blood group, reproduction, growth and apoptosis. By analogy, the feature sequence of each individual is represented by the following formula:
B_i = {(b_ij) | (b'_ij)};
where B_i denotes the feature sequence of the i-th individual; b_i = (b_ij), j ∈ m, denotes the structural feature sequence of B_i; and b'_i = (b'_ij), j ∈ n, denotes the time sequence feature sequence of B_i. The structural feature sequence represents the basic information of the individual and controls the comparison and recombination of individual features; specifically, it may be the user's attribute information, such as account number, age, gender, IP address, interests and hobbies. The time sequence feature sequence is the user's internet behavior information over time, in which timing and behavior patterns form a sequence. A feature library of time sequence feature sequences and structural feature sequences is also defined, where the feature library consists of pairs of time sequence and structural feature sequences that occur together, and an influence factor represents the degree of influence of a time sequence feature sequence b'_i on a structural feature sequence b_i. The influence factors are defined using an association rule algorithm, as follows:
P = (p_jk)_(m×n)
where p_jk is the influence coefficient between each time sequence feature and each structural feature, 0 ≤ p_jk ≤ 1, and the closer the value is to 1, the greater the influence of the time sequence feature sequence on the structural feature sequence. The influence factor matrix P may be a sparse matrix in which most entries are 0; its main use is in the user portrait construction and/or update described in the later embodiments.
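As an illustration only, the data model described above can be sketched as follows. This is a minimal sketch assuming a simple dictionary-based representation in Python; the class name, field names and example values are hypothetical and are not defined by the invention.

    # Minimal, illustrative data model: each individual's feature sequence B_i is split
    # into a structural part b_i and a time sequence part b'_i, and an influence-factor
    # matrix P relates time sequence features to structural features.
    from dataclasses import dataclass
    from typing import Dict, List

    @dataclass
    class FeatureSequence:
        """Feature sequence B_i of one observed individual (hypothetical representation)."""
        structural: Dict[str, str]          # b_i: attribute items such as name, sex, telnumber, hobby
        time_series: List[Dict[str, str]]   # b'_i: time-stamped behaviour records such as url, action

    def empty_influence_matrix(m: int, n: int) -> List[List[float]]:
        """Influence-factor matrix P with entries in [0, 1]; typically sparse (mostly 0)."""
        return [[0.0] * n for _ in range(m)]

    b1 = FeatureSequence(
        structural={"name": "zhangsan", "sex": "male", "telnumber": "12345678", "hobby": "basketball"},
        time_series=[{"url": "www.hupu.com", "action": "20160906 08:00:12 search nba ticket"}],
    )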
Example 1
FIG. 1 shows a flowchart of an embodiment of the user portrait creation method based on feature sequence comparison of the present invention. As shown in FIG. 1, the method includes the following steps:
s101: and collecting feature sets corresponding to different users in different time periods, and determining a plurality of structural feature sequences and a plurality of time sequence feature sequences.
Step S101 further includes: for each feature set, encoding the feature set to obtain a feature sequence corresponding to the feature set, where the feature sequence includes a plurality of feature items and the feature values corresponding to the feature items; and screening the plurality of feature items in the feature sequence according to a preset screening rule to obtain a structural feature sequence and a time sequence feature sequence.
Specifically, feature sets of different users in different time periods are collected, the feature sequences corresponding to the feature sets are formed through encoding rules, and structural feature sequences and time sequence feature sequences are screened out according to the different attributes of the features and the preset screening rule. The preset screening rule includes: attribute words with personal-label properties in the feature sequence, such as name, gender, telephone number, interests and hobbies, are classified into the structural feature sequence; descriptive attribute words whose feature items are not personal labels, such as entity nouns, actions and adjectives, are classified into the time sequence feature sequence.
For example, one feature set is:
B1={name("zhangsan"), sex("male"), telnumber("12345678"), hobby("basketball"), url("www.hupu.com"), action("20160906 08:00:12 search nba ticket")}
where name, sex, telnumber and hobby belong to the structural feature sequence, and url (the web address) and action belong to the time sequence feature sequence.
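A minimal sketch of such a screening rule follows, assuming a simple whitelist of personal-label items; the item list and function name are illustrative assumptions rather than an exhaustive rule set from the invention.

    # Illustrative screening rule: items whose names are personal labels go to the
    # structural feature sequence, the remaining behavioural items go to the time
    # sequence feature sequence.
    from typing import Dict, Tuple

    STRUCTURAL_ITEMS = {"name", "sex", "telnumber", "hobby", "age", "ip"}

    def split_feature_set(feature_set: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
        structural, time_series = {}, {}
        for item, value in feature_set.items():
            if item in STRUCTURAL_ITEMS:
                structural[item] = value
            else:                                   # e.g. url, action
                time_series[item] = value
        return structural, time_series

    b1 = {"name": "zhangsan", "sex": "male", "telnumber": "12345678", "hobby": "basketball",
          "url": "www.hupu.com", "action": "20160906 08:00:12 search nba ticket"}
    structural_seq, time_series_seq = split_feature_set(b1)
    # structural_seq holds name/sex/telnumber/hobby; time_series_seq holds url/action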
S102: and respectively carrying out coding processing on the plurality of structural feature sequences and the plurality of time sequence feature sequences to obtain a plurality of structural feature coding sequences and a plurality of time sequence feature coding sequences.
In this step, the plurality of structural feature sequences and the plurality of time sequence feature sequences classified in step S101 are hash-encoded respectively to form binary value sequences.
It should be noted that, for structural feature sequences with long word-segmentation results or time sequence feature sequences with continuous behavior, merging and dimensionality reduction after encoding are also needed, which can be implemented with the simhash algorithm or similar algorithms. For text, the simhash algorithm produces similar hash values for similar texts, which facilitates comparison by Hamming distance.
For example, the action field of the time sequence feature sequence in step S101 is segmented into the words 20160906, 08:00:12, search, nba and ticket; merging and dimensionality reduction with the simhash algorithm then yield a single code value (for example, 10110010111000) that can be used for similarity comparison (for example, by comparing Hamming distances).
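A rough sketch of this encoding idea follows: a generic 64-bit simhash over the segmented words, compared by Hamming distance. The hash function, bit width and example tokens are assumptions and not the exact encoding prescribed by the invention.

    # Generic simhash: similar token sets tend to yield codes with a small Hamming distance.
    import hashlib

    def simhash(tokens, bits=64):
        weights = [0] * bits
        for token in tokens:
            h = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16)
            for i in range(bits):
                weights[i] += 1 if (h >> i) & 1 else -1
        return sum(1 << i for i, w in enumerate(weights) if w > 0)

    def hamming_distance(a: int, b: int) -> int:
        return bin(a ^ b).count("1")

    code1 = simhash(["20160906", "08:00:12", "search", "nba", "ticket"])
    code2 = simhash(["20160906", "08:05:40", "search", "nba", "tickets"])
    print(hamming_distance(code1, code2))   # distance is small when the behaviour records are similar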
S103: and comparing the structural feature coding sequence with other structural feature coding sequences except the structural feature coding sequence in the structural feature coding sequences aiming at any structural feature coding sequence in the structural feature coding sequences, determining an associated structural feature coding sequence with an association relation with the structural feature coding sequence, and combining the structural feature coding sequence and the associated structural feature coding sequence to obtain a structural feature combined sequence.
In an alternative manner, step S103 further includes:
comparing the feature values corresponding to the feature items in the structural feature coding sequence with the feature values corresponding to the same feature items in each other structural feature coding sequence to obtain a first comparison result between the structural feature coding sequence and each other structural feature coding sequence;
and searching for first comparison results that meet a first comparison condition, and determining the other structural feature coding sequences corresponding to those first comparison results as associated structural feature coding sequences that have an association relationship with the structural feature coding sequence.
Specifically, different weight values can be set in advance for the comparison of structural feature coding sequences and for the comparison of time sequence feature coding sequences. The first weight value and the second weight value each lie in the range 0 to 1 and represent how important the structural feature coding sequences and the time sequence feature coding sequences are for determining a user's identity: different feature items in a structural feature coding sequence carry different amounts of information about a user, and visits to different websites at different times in a time sequence feature coding sequence likewise carry different amounts of information. The first comparison result is then weighted by the first weight value to obtain a first weighted result, and it is judged whether the first weighted result is smaller than a first comparison threshold. If it is, the difference between the structural feature coding sequence and the other structural feature coding sequence is not significant and the two belong to the same user; the first comparison result is determined to meet the first comparison condition, the other structural feature coding sequence corresponding to it is determined to be an associated structural feature coding sequence, and the structural feature coding sequence and the associated structural feature coding sequence are merged, that is, recombined. If it is not, the difference between the structural feature coding sequence and the other structural feature coding sequence is significant and they do not belong to the same user; the first comparison result is determined not to meet the first comparison condition, and the comparison of structural feature coding sequences continues.
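The weighted comparison and merging can be sketched as follows; the per-item weights, the threshold value and the use of a per-item Hamming distance as the first comparison result are illustrative assumptions.

    # Illustrative weighted comparison of two coded structural feature sequences.
    def hamming_distance(a: int, b: int) -> int:
        return bin(a ^ b).count("1")

    def weighted_structural_distance(seq_a, seq_b, weights):
        """First comparison result weighted by the first weight value W."""
        total = 0.0
        for item, w in weights.items():
            if item in seq_a and item in seq_b:
                total += w * hamming_distance(seq_a[item], seq_b[item])
        return total

    def merge_structural(seq_a, seq_b):
        """Merge an associated structural feature coding sequence into seq_a (recombination)."""
        merged = dict(seq_a)
        for item, code in seq_b.items():
            merged.setdefault(item, code)           # keep seq_a's values, add missing items
        return merged

    K = 3.0                                         # first comparison threshold (assumed value)
    W = {"name": 1.0, "sex": 0.3, "telnumber": 0.8, "hobby": 0.5}   # first weight values (assumed)
    seq_a = {"name": 0b1011, "sex": 0b01, "telnumber": 0b1100}      # toy coded sequences
    seq_b = {"name": 0b1011, "sex": 0b01, "hobby": 0b111}
    if weighted_structural_distance(seq_a, seq_b, W) < K:           # not significantly different
        merged_seq = merge_structural(seq_a, seq_b)                 # same user: merge the sequences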
S104: and determining the time sequence feature coding sequence corresponding to the structural feature merging sequence according to the time sequence feature coding sequences.
In an alternative manner, step S104 further includes: searching, among the plurality of time sequence feature coding sequences, for the time sequence feature coding sequences that contain the feature values corresponding to the feature items specified in the structural feature merging sequence; and determining the time sequence feature coding sequence corresponding to the structural feature merging sequence according to the found time sequence feature coding sequences.
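As a minimal sketch, assuming the designated feature item is a key such as the phone number, the lookup could look like the following; the field names are hypothetical.

    # Keep only the time sequence coding records that contain the feature values of the
    # designated items in the structural feature merging sequence.
    def find_matching_time_series(merged_structural, time_series_pool, key_items=("telnumber",)):
        keys = {item: merged_structural[item] for item in key_items if item in merged_structural}
        return [ts for ts in time_series_pool
                if all(ts.get(item) == value for item, value in keys.items())]

    pool = [{"telnumber": "12345678", "time": "20160906 08:00:12", "action_code": 0b10110010111000},
            {"telnumber": "87654321", "time": "20160906 09:12:44", "action_code": 0b00011101000110}]
    print(find_matching_time_series({"telnumber": "12345678"}, pool))   # only the first record matches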
In an optional manner, according to the found time sequence feature coding sequence, determining the time sequence feature coding sequence corresponding to the structural feature merging sequence further includes: the time sequence feature coding sequence with the earliest time in the searched time sequence feature coding sequences is used as an original time sequence feature coding sequence corresponding to the structural feature merging sequence, and the original time sequence feature coding sequence is compared with other searched time sequence feature coding sequences to obtain a second comparison result; and determining a time sequence feature coding sequence corresponding to the structural feature merging sequence according to the second comparison result.
Specifically, according to the second comparison result, determining the time sequence feature coding sequence corresponding to the structural feature merging sequence further includes: according to the second weight value, carrying out weighting operation on the second comparison result to obtain a second weighting operation result; judging whether the second weighted operation result is smaller than a second comparison threshold value or not; if yes, determining the original time sequence feature coding sequence as a time sequence feature coding sequence corresponding to the structural feature merging sequence; if not, updating the original time sequence feature code sequence according to other time sequence feature code sequences corresponding to the second weighting operation result to obtain the time sequence feature code sequence corresponding to the structural feature merging sequence.
Specifically, since different structural feature coding sequences carry different amounts of information about a person, the first weight value represents how important two different structural feature coding sequences are for determining a person's identity; likewise, visits to different websites at different times carry different amounts of information about a person, so the second weight value represents how important two different time sequence feature coding sequences are for determining a person's identity. A first weight value W is set for the comparison of structural feature coding sequences and a second weight value W' for the comparison of time sequence feature coding sequences, where W = (w_j), 0 ≤ w_j ≤ 1, and W' is defined in the same way.
A first comparison threshold K for structural feature coding sequences and a second comparison threshold K' for time sequence feature coding sequences are then set as the criteria for judging whether two sequences belong to the same person; values with higher accuracy can be found through continuous iteration and verification, which is not described further here. The first comparison threshold and the second comparison threshold generally lie in (0, 3); for example, a weighted Hamming distance of less than 3 between the coding sequences obtained in step S102 is regarded as an insignificant difference.
In step S103, the first comparison result is multiplied by the first weight value W to obtain the first weighted result. If the first weighted result is smaller than the first comparison threshold K, the user corresponding to the structural feature coding sequence and the user corresponding to the other structural feature coding sequence in that comparison are the same person; the other structural feature coding sequence is determined to be an associated structural feature coding sequence having an association relationship with the structural feature coding sequence, and the two are merged to obtain the structural feature merging sequence. If the first weighted result is greater than or equal to the first comparison threshold K, the comparison of structural feature coding sequences continues.
After step S103 produces the structural feature merging sequence, the time sequence feature coding sequence with the earliest time among the found time sequence feature coding sequences is taken as the original time sequence feature coding sequence corresponding to the structural feature merging sequence, and it is compared with the other found time sequence feature coding sequences to obtain the second comparison result. The second comparison result is weighted by the second weight value W' to obtain the second weighted result, and it is judged whether the second weighted result is smaller than the second comparison threshold K'. If it is, the original time sequence feature coding sequence does not need to be updated and is determined to be the time sequence feature coding sequence corresponding to the structural feature merging sequence. If the second weighted result is greater than or equal to the second comparison threshold K', the original time sequence feature coding sequence differs significantly from the other time sequence feature coding sequences, and it needs to be updated according to the other time sequence feature coding sequences corresponding to the second weighted result to obtain the time sequence feature coding sequence corresponding to the structural feature merging sequence.
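A hedged sketch of this second comparison follows: the earliest found time sequence coding sequence serves as the original, and it is only replaced when the weighted Hamming distance reaches the second threshold K'. The timestamps, codes and the simple replace-on-difference update rule are assumptions, not the exact update procedure defined by the invention.

    # Illustrative second comparison over the time sequence coding sequences found for
    # one structural feature merging sequence.
    def hamming_distance(a: int, b: int) -> int:
        return bin(a ^ b).count("1")

    def resolve_time_series(found, w2, k2):
        """found: list of (timestamp, code) pairs; w2 = second weight, k2 = second threshold."""
        found = sorted(found)                       # "YYYYMMDD HH:MM:SS" strings sort chronologically
        _, current = found[0]                       # earliest code = original time sequence coding sequence
        for _, other in found[1:]:
            if w2 * hamming_distance(current, other) >= k2:   # significant difference: update needed
                current = other                     # simplistic update rule (assumption)
        return current

    found = [("20160906 08:00:12", 0b10110010111000),
             ("20160907 21:14:03", 0b10110010111010),
             ("20160920 09:30:00", 0b00011101000110)]
    print(bin(resolve_time_series(found, w2=1.0, k2=3.0)))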
S105: and constructing and/or updating the user portrait according to the structural feature merging sequence and the time sequence feature coding sequence corresponding to the structural feature merging sequence.
In this step, for a user without a user portrait, the user portrait is constructed according to the structural feature merging sequence and its corresponding time sequence feature coding sequence; for a user who already has a user portrait, the user portrait is updated according to the structural feature merging sequence and its corresponding time sequence feature coding sequence.
In an optional manner, updating the user portrait according to the structural feature merging sequence and the time sequence feature coding sequence corresponding to the structural feature merging sequence further comprises: calculating an influence factor of a time sequence feature coding sequence corresponding to the structural feature merging sequence; and processing the structural feature merging sequence according to the influence factors, and updating the user portrait.
Specifically, the influence factors of the time sequence feature coding sequence corresponding to the structural feature merging sequence are calculated and accumulated to obtain an influence factor result, and a variance check is used to determine whether this result has changed significantly. If there is no significant change, the user portrait does not need to be updated. If there is a significant change, the corresponding user portrait needs to be updated: the structural feature merging sequence and its corresponding time sequence feature coding sequence are obtained, and the existing structural feature merging sequence is updated according to the itemset of feature items with the highest confidence and support between the time sequence feature coding sequence and the structural feature merging sequence. The user portrait is then updated with the updated structural feature merging sequence.
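The update trigger can be sketched as follows: accumulate the influence factors of the corresponding time sequence feature coding sequence and use a variance check to decide whether the change is significant. The variance threshold and the example values are assumptions.

    # Accumulate influence factors and check significance of the change via variance.
    from statistics import pvariance

    def portrait_needs_update(influence_history, new_factors, var_threshold=0.05):
        new_total = sum(new_factors)                # accumulated influence factors of the new period
        history = influence_history + [new_total]
        if len(history) < 2:
            return False                            # not enough data to judge significance
        return pvariance(history) > var_threshold   # significant change: update the user portrait

    history = [1.8, 1.9, 1.85]                      # earlier accumulated influence results
    print(portrait_needs_update(history, new_factors=[0.9, 0.7, 0.95]))   # noticeable shift -> True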
After the updated user portrait is obtained, steps S101 to S105 can be repeated continuously to perform feature sequence comparison. In this way, a complete user portrait is gradually formed, and at the same time the user portrait can be updated dynamically in real time.
With the method provided in this embodiment, feature sets corresponding to different users in different time periods are collected, and a plurality of structural feature sequences and a plurality of time sequence feature sequences are determined; the plurality of structural feature sequences and the plurality of time sequence feature sequences are encoded respectively to obtain a plurality of structural feature coding sequences and a plurality of time sequence feature coding sequences; for any structural feature coding sequence among the plurality of structural feature coding sequences, that structural feature coding sequence is compared with the other structural feature coding sequences, an associated structural feature coding sequence that has an association relationship with it is determined, and the two are merged to obtain a structural feature merging sequence; the time sequence feature coding sequence corresponding to the structural feature merging sequence is determined according to the plurality of time sequence feature coding sequences; and the user portrait is constructed and/or updated according to the structural feature merging sequence and its corresponding time sequence feature coding sequence. The invention distinguishes individuals based on feature sequence comparison and divides the feature sequences into structural feature sequences and time sequence feature sequences that are compared separately, which improves the accuracy of comparison and allows users to be identified accurately and uniquely; it establishes an iterative algorithm for constructing and updating user portraits, ensuring that user portraits are updated in time and improving the accuracy of their construction and updating; it overcomes the defect of the prior art that a complete user portrait may fail to be formed when users change numbers or share terminals; and it has wider applicability.
Example two
FIG. 2 is a schematic structural diagram of a user portrait creation apparatus based on feature sequence comparison according to an embodiment of the present invention. As shown in FIG. 2, the apparatus includes: a feature set collection module 210, an encoding module 220, a comparison module 230, a matching module 240, and a user portrait construction and updating module 250.
The feature set collection module 210 is configured to collect feature sets corresponding to different users in different time periods, and determine a plurality of structural feature sequences and a plurality of time sequence feature sequences.
In an alternative manner, the feature set collection module 210 is further configured to: for each feature set, carrying out coding processing on the feature set to obtain a feature sequence corresponding to the feature set; the feature sequence comprises a plurality of feature items and feature values corresponding to the feature items; and screening a plurality of characteristic items in the characteristic sequence according to a preset screening rule to obtain a structural characteristic sequence and a time sequence characteristic sequence.
The encoding module 220 is configured to encode the plurality of structural feature sequences and the plurality of time sequence feature sequences to obtain a plurality of structural feature encoding sequences and a plurality of time sequence feature encoding sequences.
The comparison module 230 is configured to compare the structural feature coding sequence with other structural feature coding sequences except for the structural feature coding sequence in the plurality of structural feature coding sequences according to any structural feature coding sequence in the plurality of structural feature coding sequences, determine an associated structural feature coding sequence having an association relationship with the structural feature coding sequence, and combine the structural feature coding sequence and the associated structural feature coding sequence to obtain a structural feature combined sequence.
In an alternative way, the comparison module 230 is further configured to: comparing the characteristic values corresponding to the characteristic items in the structural characteristic coding sequence with the characteristic values corresponding to the same characteristic items in each other structural characteristic coding sequence to obtain a first comparison result of the structural characteristic coding sequence and each other structural characteristic coding sequence; and searching a first comparison result which accords with the first comparison condition, and determining other structural feature coding sequences corresponding to the first comparison result which accords with the first comparison condition as associated structural feature coding sequences which have an association relationship with the structural feature coding sequences.
Specifically, the comparison module 230 is further configured to: according to the first weight value, carrying out weighted operation on the first comparison result to obtain a first weighted operation result; judging whether the first weighted operation result is smaller than a first comparison threshold value or not; if yes, determining that the first comparison result meets a first comparison condition, and determining other structural feature coding sequences corresponding to the first comparison result as associated structural feature coding sequences; if not, determining that the first comparison result does not meet the first comparison condition.
The matching module 240 is configured to determine a time sequence feature code sequence corresponding to the structural feature merging sequence according to the plurality of time sequence feature code sequences.
In an alternative manner, the matching module 240 is further configured to: searching a time sequence feature coding sequence containing feature values corresponding to the feature items specified in the structural feature merging sequence from a plurality of time sequence feature coding sequences; and determining the time sequence feature coding sequence corresponding to the structural feature merging sequence according to the searched time sequence feature coding sequence.
In an alternative manner, the matching module 240 is further configured to: the time sequence feature coding sequence with the earliest time in the searched time sequence feature coding sequences is used as an original time sequence feature coding sequence corresponding to the structural feature merging sequence, and the original time sequence feature coding sequence is compared with other searched time sequence feature coding sequences to obtain a second comparison result; and determining a time sequence feature coding sequence corresponding to the structural feature merging sequence according to the second comparison result.
Specifically, the matching module 240 is further configured to: according to the second weight value, carrying out weighting operation on the second comparison result to obtain a second weighting operation result; judging whether the second weighted operation result is smaller than a second comparison threshold value or not; if yes, determining the original time sequence feature coding sequence as a time sequence feature coding sequence corresponding to the structural feature merging sequence; if not, updating the original time sequence feature code sequence according to other time sequence feature code sequences corresponding to the second weighting operation result to obtain the time sequence feature code sequence corresponding to the structural feature merging sequence.
The user portrait construction and updating module 250 is configured to construct and/or update the user portrait according to the structural feature merging sequence and the time sequence feature coding sequence corresponding to the structural feature merging sequence.
In an alternative approach, the user portrayal construction and update module 250 is further configured to: calculating an influence factor of a time sequence feature coding sequence corresponding to the structural feature merging sequence; and processing the structural feature merging sequence according to the influence factors, and updating the user portrait.
With the apparatus provided in this embodiment, feature sets corresponding to different users in different time periods are collected, and a plurality of structural feature sequences and a plurality of time sequence feature sequences are determined; the plurality of structural feature sequences and the plurality of time sequence feature sequences are encoded respectively to obtain a plurality of structural feature coding sequences and a plurality of time sequence feature coding sequences; for any structural feature coding sequence among the plurality of structural feature coding sequences, that structural feature coding sequence is compared with the other structural feature coding sequences, an associated structural feature coding sequence that has an association relationship with it is determined, and the two are merged to obtain a structural feature merging sequence; the time sequence feature coding sequence corresponding to the structural feature merging sequence is determined according to the plurality of time sequence feature coding sequences; and the user portrait is constructed and/or updated according to the structural feature merging sequence and its corresponding time sequence feature coding sequence. The invention distinguishes individuals based on feature sequence comparison and divides the feature sequences into structural feature sequences and time sequence feature sequences that are compared separately, which improves the accuracy of comparison and allows users to be identified accurately and uniquely; it establishes an iterative algorithm for constructing and updating user portraits, ensuring that user portraits are updated in time and improving the accuracy of their construction and updating; it overcomes the defect of the prior art that a complete user portrait may fail to be formed when users change numbers or share terminals; and it has wider applicability.
Example III
The embodiment of the invention provides a non-volatile computer storage medium, which stores at least one executable instruction, and the computer executable instruction can execute the user portrait establishment method based on feature sequence comparison in any of the method embodiments.
The executable instructions may specifically be used to cause the processor to perform the following operations:
collecting feature sets corresponding to different users in different time periods, and determining a plurality of structural feature sequences and a plurality of time sequence feature sequences;
respectively carrying out coding treatment on the plurality of structural feature sequences and the plurality of time sequence feature sequences to obtain a plurality of structural feature coding sequences and a plurality of time sequence feature coding sequences;
comparing the structural feature coding sequence with other structural feature coding sequences except the structural feature coding sequence in the structural feature coding sequences aiming at any structural feature coding sequence in the structural feature coding sequences, determining an associated structural feature coding sequence with an association relation with the structural feature coding sequence, and combining the structural feature coding sequence and the associated structural feature coding sequence to obtain a structural feature combined sequence;
determining a time sequence feature coding sequence corresponding to the structural feature merging sequence according to the plurality of time sequence feature coding sequences;
and constructing and/or updating the user portrait according to the structural feature merging sequence and the time sequence feature coding sequence corresponding to the structural feature merging sequence.
Example IV
FIG. 3 illustrates a schematic diagram of an embodiment of a computing device of the present invention, and the embodiments of the present invention are not limited to a particular implementation of the computing device.
As shown in fig. 3, the computing device may include:
a processor (processor), a communication interface (Communications Interface), a memory (memory), and a communication bus.
Wherein: the processor, communication interface, and memory communicate with each other via a communication bus. A communication interface for communicating with network elements of other devices, such as clients or other servers, etc. And the processor is used for executing a program, and can specifically execute relevant steps in the user portrait establishment method embodiment based on the feature sequence comparison.
In particular, the program may include program code including computer-operating instructions.
The processor may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The one or more processors included in the server may be processors of the same type, such as one or more CPUs, or processors of different types, such as one or more CPUs and one or more ASICs.
And the memory is used for storing programs. The memory may comprise high-speed RAM memory or may further comprise non-volatile memory, such as at least one disk memory.
The program may be specifically operative to cause the processor to:
collecting feature sets corresponding to different users in different time periods, and determining a plurality of structural feature sequences and a plurality of time sequence feature sequences;
respectively carrying out coding treatment on the plurality of structural feature sequences and the plurality of time sequence feature sequences to obtain a plurality of structural feature coding sequences and a plurality of time sequence feature coding sequences;
comparing the structural feature coding sequence with other structural feature coding sequences except the structural feature coding sequence in the structural feature coding sequences aiming at any structural feature coding sequence in the structural feature coding sequences, determining an associated structural feature coding sequence with an association relation with the structural feature coding sequence, and combining the structural feature coding sequence and the associated structural feature coding sequence to obtain a structural feature combined sequence;
determining a time sequence feature coding sequence corresponding to the structural feature merging sequence according to the plurality of time sequence feature coding sequences;
and constructing and/or updating the user portrait according to the structural feature merging sequence and the time sequence feature coding sequence corresponding to the structural feature merging sequence.
The algorithms or displays presented herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general-purpose systems may also be used with the teachings herein. The required structure for a construction of such a system is apparent from the description above. In addition, embodiments of the present invention are not directed to any particular programming language. It will be appreciated that the teachings of the present invention described herein may be implemented in a variety of programming languages, and the above description of specific languages is provided for disclosure of enablement and best mode of the present invention.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the above description of exemplary embodiments of the invention, various features of the embodiments of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be construed as reflecting the intention that: i.e., the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the apparatus of the embodiments may be adaptively changed and disposed in one or more apparatuses different from the embodiments. The modules or units or components of the embodiments may be combined into one module or unit or component and, furthermore, they may be divided into a plurality of sub-modules or sub-units or sub-components. Any combination of all features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or units of any method or apparatus so disclosed, may be used in combination, except insofar as at least some of such features and/or processes or units are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings), may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments herein include some features but not others included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
Various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that some or all of the functionality of some or all of the components according to embodiments of the present invention may be implemented in practice using a microprocessor or Digital Signal Processor (DSP). The present invention can also be implemented as an apparatus or device program (e.g., a computer program and a computer program product) for performing a portion or all of the methods described herein. Such a program embodying the present invention may be stored on a computer readable medium, or may have the form of one or more signals. Such signals may be downloaded from an internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. do not denote any order. These words may be interpreted as names. The steps in the above embodiments should not be construed as limiting the order of execution unless specifically stated.

Claims (8)

1. A user portrait establishing method based on feature sequence comparison is characterized by comprising the following steps:
collecting feature sets corresponding to different users in different time periods, and determining a plurality of structural feature sequences and a plurality of time sequence feature sequences;
respectively carrying out coding treatment on the plurality of structural feature sequences and the plurality of time sequence feature sequences to obtain a plurality of structural feature coding sequences and a plurality of time sequence feature coding sequences;
comparing the structural feature coding sequence with other structural feature coding sequences except the structural feature coding sequence in the structural feature coding sequences aiming at any structural feature coding sequence in the structural feature coding sequences, determining an associated structural feature coding sequence with an association relation with the structural feature coding sequence, and combining the structural feature coding sequence and the associated structural feature coding sequence to obtain a structural feature combined sequence;
determining a time sequence feature coding sequence corresponding to the structural feature merging sequence according to the plurality of time sequence feature coding sequences;
constructing and/or updating the user portrait according to the structural feature merging sequence and the time sequence feature coding sequence corresponding to the structural feature merging sequence;
wherein the collecting of feature sets corresponding to different users in different time periods and the determining of a plurality of structural feature sequences and a plurality of time sequence feature sequences further comprises: for each feature set, encoding the feature set to obtain a feature sequence corresponding to the feature set, the feature sequence comprising a plurality of feature items and feature values corresponding to the feature items; and screening the plurality of feature items in the feature sequence according to a preset screening rule to obtain a structural feature sequence and a time sequence feature sequence;
wherein determining the time sequence feature coding sequence corresponding to the structural feature merging sequence according to the plurality of time sequence feature coding sequences further comprises: searching, from the plurality of time sequence feature coding sequences, for time sequence feature coding sequences containing the feature values corresponding to the feature items specified in the structural feature merging sequence; and determining the time sequence feature coding sequence corresponding to the structural feature merging sequence according to the found time sequence feature coding sequences.
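For illustration only, the processing flow recited in claim 1 can be sketched in Python as follows; the feature item names, the screening sets and the association test are hypothetical assumptions made for the example, not the claimed implementation.

import zlib

STRUCTURAL_ITEMS = {"device_model", "home_city", "age_band"}    # assumed screening rule
TEMPORAL_ITEMS = {"period", "visited_sites", "active_hours"}    # assumed screening rule

def encode(value):
    """Toy encoding: map any feature value to a deterministic integer code."""
    return zlib.crc32(str(value).encode("utf-8")) % 10_000

def split_and_encode(feature_set):
    """Split one collected feature set into a structural and a time sequence coding sequence."""
    structural = {k: encode(v) for k, v in feature_set.items() if k in STRUCTURAL_ITEMS}
    temporal = {k: encode(v) for k, v in feature_set.items() if k in TEMPORAL_ITEMS}
    return structural, temporal

def associated(seq_a, seq_b, min_shared=2):
    """Assumed association test: enough feature items carry identical codes."""
    shared = [k for k in seq_a if k in seq_b and seq_a[k] == seq_b[k]]
    return len(shared) >= min_shared

def build_portraits(feature_sets):
    """Merge associated structural coding sequences and attach their time sequence coding sequences."""
    portraits = []
    for feature_set in feature_sets:
        structural, temporal = split_and_encode(feature_set)
        for portrait in portraits:
            if associated(portrait["structural"], structural):
                portrait["structural"].update(structural)    # structural feature merging sequence
                portrait["temporal"].append(temporal)        # corresponding time sequence coding sequences
                break
        else:
            portraits.append({"structural": dict(structural), "temporal": [temporal]})
    return portraits

Feeding build_portraits a list of per-period feature dictionaries would return one entry per distinguished individual, each holding a merged structural sequence and its matched time sequence coding sequences.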
2. The method according to claim 1, wherein comparing the structural feature coding sequence with the other structural feature coding sequences in the plurality of structural feature coding sequences and determining the associated structural feature coding sequence having an association relationship with the structural feature coding sequence further comprises:
comparing the feature values corresponding to the feature items in the structural feature coding sequence with the feature values corresponding to the same feature items in each of the other structural feature coding sequences to obtain a first comparison result between the structural feature coding sequence and each of the other structural feature coding sequences;
and searching for a first comparison result that satisfies a first comparison condition, and determining the other structural feature coding sequence corresponding to the first comparison result that satisfies the first comparison condition as the associated structural feature coding sequence having an association relationship with the structural feature coding sequence.
3. The method according to claim 2, wherein searching for the first comparison result that satisfies the first comparison condition and determining the other structural feature coding sequence corresponding to the first comparison result that satisfies the first comparison condition as the associated structural feature coding sequence further comprises:
performing a weighted operation on the first comparison result according to a first weight value to obtain a first weighted operation result;
determining whether the first weighted operation result is smaller than a first comparison threshold;
if so, determining that the first comparison result satisfies the first comparison condition, and determining the other structural feature coding sequence corresponding to the first comparison result as the associated structural feature coding sequence;
if not, determining that the first comparison result does not satisfy the first comparison condition.
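A minimal reading of the weighted comparison in claims 2 and 3 might look like the following sketch; the per-item distance, the weight values and the threshold are assumptions chosen for demonstration, not values taken from the patent.

def first_comparison_result(seq_a, seq_b):
    """Compare codes of the same feature items; unmatched or missing items count as maximal difference."""
    items = set(seq_a) | set(seq_b)
    return {k: 0.0 if seq_a.get(k) == seq_b.get(k) else 1.0 for k in items}

def is_associated(seq_a, seq_b, weights, threshold=0.3):
    """Weighted operation on the comparison result, then the threshold test of claim 3."""
    result = first_comparison_result(seq_a, seq_b)
    weighted = sum(weights.get(k, 1.0) * d for k, d in result.items())
    weighted /= sum(weights.get(k, 1.0) for k in result) or 1.0   # normalise so the threshold is scale-free
    return weighted < threshold

# Example: two coding sequences that agree on the heavily weighted items.
a = {"device_model": 17, "home_city": 4, "age_band": 2}
b = {"device_model": 17, "home_city": 4, "age_band": 9}
print(is_associated(a, b, weights={"device_model": 3.0, "home_city": 2.0, "age_band": 1.0}))  # True

In this toy run the weighted distance is 1/6, which falls below the assumed threshold of 0.3, so the two sequences would be treated as belonging to the same individual.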
4. The method according to claim 1, wherein determining the time sequence feature coding sequence corresponding to the structural feature merging sequence according to the found time sequence feature coding sequences further comprises:
taking the time sequence feature coding sequence with the earliest time among the found time sequence feature coding sequences as an original time sequence feature coding sequence corresponding to the structural feature merging sequence, and comparing the original time sequence feature coding sequence with the other found time sequence feature coding sequences to obtain a second comparison result;
and determining the time sequence feature coding sequence corresponding to the structural feature merging sequence according to the second comparison result.
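One possible interpretation of claim 4, assuming each found time sequence feature coding sequence carries a hypothetical "period" code that orders it in time and using a simple difference count as the second comparison result:

def pick_temporal_sequences(found, max_drift=2):
    """Use the earliest sequence as the original and keep the others whose
    second comparison result (here: number of differing items) stays within a bound."""
    if not found:
        return []
    ordered = sorted(found, key=lambda seq: seq.get("period", 0))
    original, rest = ordered[0], ordered[1:]
    kept = [original]
    for seq in rest:
        second_result = sum(1 for k in original if original[k] != seq.get(k))
        if second_result <= max_drift:
            kept.append(seq)
    return kept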
5. The method according to any one of claims 1-4, wherein updating the user portrait according to the structural feature merging sequence and the time sequence feature coding sequence corresponding to the structural feature merging sequence further comprises:
calculating an influence factor of the time sequence feature coding sequence corresponding to the structural feature merging sequence;
and processing the structural feature merging sequence according to the influence factor, and updating the user portrait.
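Claim 5 does not fix how the influence factor is computed; the sketch below assumes a simple exponential recency decay purely as an example, together with a hypothetical "day" code on each time sequence coding sequence.

import math

def influence_factor(sequence_age_days, half_life_days=30.0):
    """Exponential decay: an influence of 1.0 now, 0.5 after one half-life."""
    return math.exp(-math.log(2.0) * sequence_age_days / half_life_days)

def update_portrait(portrait, merged_structural, temporal_sequences, today):
    """Fold the merging sequence into the portrait, weighting each time sequence coding sequence."""
    portrait["structural"] = {**portrait.get("structural", {}), **merged_structural}
    portrait["temporal_weighted"] = [
        (seq, influence_factor(today - seq.get("day", today)))
        for seq in temporal_sequences
    ]
    return portrait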
6. A user portrait establishment device based on feature sequence comparison, characterized by comprising:
the feature set collection module, used for collecting feature sets corresponding to different users in different time periods and determining a plurality of structural feature sequences and a plurality of time sequence feature sequences;
the coding module is used for respectively coding the plurality of structural feature sequences and the plurality of time sequence feature sequences to obtain a plurality of structural feature coding sequences and a plurality of time sequence feature coding sequences;
the comparison module, used for, for any structural feature coding sequence in the plurality of structural feature coding sequences, comparing the structural feature coding sequence with the other structural feature coding sequences in the plurality of structural feature coding sequences, determining an associated structural feature coding sequence having an association relationship with the structural feature coding sequence, and merging the structural feature coding sequence and the associated structural feature coding sequence to obtain a structural feature merging sequence;
the matching module is used for determining a time sequence feature coding sequence corresponding to the structural feature merging sequence according to the plurality of time sequence feature coding sequences;
the user portrait construction and updating module is used for constructing and/or updating the user portrait according to the structural feature merging sequence and the time sequence feature coding sequence corresponding to the structural feature merging sequence;
wherein the feature set collection module is further configured to: encode each feature set to obtain a feature sequence corresponding to the feature set, the feature sequence comprising a plurality of feature items and feature values corresponding to the feature items; and screen the plurality of feature items in the feature sequence according to a preset screening rule to obtain a structural feature sequence and a time sequence feature sequence;
the matching module is further configured to: search, from the plurality of time sequence feature coding sequences, for time sequence feature coding sequences containing the feature values corresponding to the feature items specified in the structural feature merging sequence; and determine the time sequence feature coding sequence corresponding to the structural feature merging sequence according to the found time sequence feature coding sequences.
7. A computing device, comprising a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with each other through the communication bus;
the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform operations corresponding to the user portrait establishment method based on feature sequence comparison according to any one of claims 1 to 5.
8. A computer storage medium having stored therein at least one executable instruction, the executable instruction causing a processor to perform operations corresponding to the user portrait establishment method based on feature sequence comparison according to any one of claims 1 to 5.
CN202010237633.XA 2020-03-30 2020-03-30 User portrait establishment method and device based on feature sequence comparison Active CN113468389B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010237633.XA CN113468389B (en) 2020-03-30 2020-03-30 User portrait establishment method and device based on feature sequence comparison

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010237633.XA CN113468389B (en) 2020-03-30 2020-03-30 User portrait establishment method and device based on feature sequence comparison

Publications (2)

Publication Number Publication Date
CN113468389A CN113468389A (en) 2021-10-01
CN113468389B true CN113468389B (en) 2023-04-28

Family

ID=77864925

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010237633.XA Active CN113468389B (en) 2020-03-30 2020-03-30 User portrait establishment method and device based on feature sequence comparison

Country Status (1)

Country Link
CN (1) CN113468389B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106504099A (en) * 2015-09-07 2017-03-15 国家计算机网络与信息安全管理中心 A kind of system for building user's portrait
CN109816410A (en) * 2017-11-21 2019-05-28 北京奇虎科技有限公司 The analysis method and device of advertisement major product audience
CN108519993A (en) * 2018-03-02 2018-09-11 华南理工大学 The social networks focus incident detection method calculated based on multiple data stream
CN109684543A (en) * 2018-12-14 2019-04-26 北京百度网讯科技有限公司 User's behavior prediction and information distribution method, device, server and storage medium
CN110060167A (en) * 2019-03-12 2019-07-26 中国平安财产保险股份有限公司 A kind of insurance products recommended method, server and computer-readable medium
CN109992982A (en) * 2019-04-11 2019-07-09 北京信息科技大学 Big data access authorization methods, device and big data platform
CN110727860A (en) * 2019-09-16 2020-01-24 武汉安诠加信息技术有限公司 User portrait method, device, equipment and medium based on internet beauty platform

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Design of a user portrait *** based on mobile Internet behavior analysis; Wang Dongyu; China Master's Theses Full-text Database (Information Science and Technology); 20180315; full text *
Research on usage behavior portraits of elderly WeChat users based on mobile terminal logs; Li Jiaxing, Wang Xiwei, Chang Ying, Zhang Changliang; Library and Information Service (图书情报工作); 20191205; Vol. 63, No. 22; full text *

Also Published As

Publication number Publication date
CN113468389A (en) 2021-10-01

Similar Documents

Publication Publication Date Title
Serafino et al. True scale-free networks hidden by finite size effects
CN107463704B (en) Search method and device based on artificial intelligence
US10546006B2 (en) Method and system for hybrid information query
CN117555928A (en) Data processing system and method based on device use associated internet device
US9064212B2 (en) Automatic event categorization for event ticket network systems
US9256692B2 (en) Clickstreams and website classification
KR20180099812A (en) Identifying entities using the deep learning model
JP2019532445A (en) Similarity search using ambiguous codes
US20130159233A1 (en) Systems and methods for relevance scoring of a digital resource
CN110223186B (en) User similarity determining method and information recommending method
CN112989169B (en) Target object identification method, information recommendation method, device, equipment and medium
CN111008335B (en) Information processing method, device, equipment and storage medium
CN112231592A (en) Network community discovery method, device, equipment and storage medium based on graph
CN111259952A (en) Abnormal user identification method and device, computer equipment and storage medium
CN107368499B (en) Client label modeling and recommending method and device
CN114223012A (en) Push object determination method and device, terminal equipment and storage medium
WO2024041483A1 (en) Recommendation method and related device
CN113343091A (en) Industrial and enterprise oriented science and technology service recommendation calculation method, medium and program
WO2023029350A1 (en) Click behavior prediction-based information pushing method and apparatus
CN113656699B (en) User feature vector determining method, related equipment and medium
CN110929172A (en) Information selection method and device, electronic equipment and readable storage medium
CN112800286B (en) User relationship chain construction method and device and electronic equipment
CN114428910A (en) Resource recommendation method and device, electronic equipment, product and medium
CN118043802A (en) Recommendation model training method and device
CN116823410A (en) Data processing method, object processing method, recommending method and computing device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant