CN111935259B

CN111935259B - Method and device for determining target account set, storage medium and electronic equipment

Info

Publication number: CN111935259B
Application number: CN202010753280.9A
Authority: CN
Inventors: 杨海力; 王伟
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2020-07-30
Filing date: 2020-07-30
Publication date: 2021-11-23
Anticipated expiration: 2040-07-30
Also published as: CN111935259A

Abstract

The invention discloses a method and a device for determining a target account set, a storage medium and electronic equipment. The method comprises the following steps: determining a first group of characterization vectors of a first group of seed accounts and a second group of characterization vectors of a group of candidate accounts according to the portrait feature data and the behavior feature data of the accounts; determining the distance between the group of candidate accounts and the first group of seed accounts according to the first group of characterization vectors and the second group of characterization vectors; determining a second group of seed accounts in the group of candidate accounts according to that distance; and determining the target account set as comprising the first group of seed accounts and the second group of seed accounts. The invention solves the technical problem in the related art that it is difficult to determine an account set with large propagation influence accurately, quickly and effectively.

Description

Method and device for determining target account set, storage medium and electronic equipment

Technical Field

The invention relates to the field of computers, in particular to a method and a device for determining a target account set, a storage medium and electronic equipment.

Background

In the related art, a seed account set with large propagation influence is mostly determined by applying a graph sampling technique to the full account set and then selecting the seed accounts with a greedy algorithm. However, this scheme is only suitable for small data volumes and small seed account sets, and cannot be applied to large-scale account sets.

In addition, the related art must assume that all accounts in the seed set are in an activated state at the initial time. This condition is restrictive, so it is difficult to determine a target account set with large propagation influence accurately, quickly and effectively.

In view of the above problems, no effective solution has been proposed.

Disclosure of Invention

The embodiments of the invention provide a method and a device for determining a target account set, a storage medium and electronic equipment, which at least solve the technical problem in the related art that it is difficult to determine an account set with large propagation influence accurately, quickly and effectively.

According to an aspect of the embodiments of the present invention, there is provided a method for determining a target account set, including:

acquiring a first group of characterization vectors of a first group of seed accounts and a second group of characterization vectors of a group of candidate accounts, wherein the characterization vectors in the first group of characterization vectors correspond to the seed accounts in the first group of seed accounts one by one, the characterization vectors in the second group of characterization vectors correspond to the candidate accounts in the group of candidate accounts one by one, the characterization vectors in the first group of characterization vectors are determined according to a first group of portrait feature data and a first group of behavior feature data of the first group of seed accounts, and the characterization vectors in the second group of characterization vectors are determined according to a second group of portrait feature data and a second group of behavior feature data of the group of candidate accounts; determining a distance between the set of candidate account numbers and the first set of seed account numbers according to the first set of characterization vectors and the second set of characterization vectors; and determining a second group of seed account numbers in the group of candidate account numbers according to the distance between the group of candidate account numbers and the first group of seed account numbers, and determining the target account number set as comprising the first group of seed account numbers and the second group of seed account numbers.

Optionally, the obtaining a first set of characterization vectors of a first set of seed account numbers includes: acquiring a first group of characteristic data corresponding to the first group of seed account numbers, wherein the first group of characteristic data comprises a first group of portrait characteristic data and a first group of behavior characteristic data of the first group of seed account numbers; and inputting the first group of feature data into a target neural network model to obtain the first group of characterization vectors.

Optionally, the inputting the first group of feature data into a target neural network model to obtain the first group of characterization vectors includes: randomly initializing each item of feature data in the first group of portrait feature data and the first group of behavior feature data to generate a first group of vectors, wherein the first group of vectors includes a first-class vector, a second-class vector and a third-class vector, the first-class vector is used for representing first-class feature data of the first group of seed accounts, the first-class feature data is feature data represented by a single identifier, the second-class vector is used for representing second-class feature data of the first group of seed accounts, the second-class feature data is feature data represented by a combination of a plurality of identifiers, the third-class vector is used for representing third-class feature data of the first group of seed accounts, and the third-class feature data is preconfigured feature data; performing a fully connected transformation on the first-class vector and the third-class vector to generate a second group of vectors; performing first target processing and second target processing respectively on the second-class vector based on the feature data corresponding to the plurality of identifiers, and then performing a fully connected transformation to generate a third group of vectors, wherein the first target processing is used for applying weight coefficients to the feature data corresponding to the plurality of identifiers, and the second target processing is used for summing the feature data corresponding to the plurality of identifiers and calculating an average value; and determining the first group of characterization vectors from the second group of vectors and the third group of vectors.
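As an illustration only, the embedding step described above might be sketched as follows. The sizes, the ReLU activation, the treatment of the preconfigured feature as a ready-made numeric vector, and every name (`embedding_table`, `pool_multi_id`, `encode_account`) are assumptions made for this sketch and do not come from the embodiment.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_IDS, EMB_DIM, PRE_DIM, OUT_DIM = 10, 4, 2, 3  # hypothetical sizes

# Randomly initialised embedding table: one row per identifier.
embedding_table = rng.normal(size=(NUM_IDS, EMB_DIM))
# Randomly initialised weights of the two fully connected layers.
w_a, b_a = rng.normal(size=(EMB_DIM + PRE_DIM, OUT_DIM)), np.zeros(OUT_DIM)
w_b, b_b = rng.normal(size=(2 * EMB_DIM, OUT_DIM)), np.zeros(OUT_DIM)


def dense(x, w, b):
    """Fully connected transformation followed by ReLU (activation is assumed)."""
    return np.maximum(x @ w + b, 0.0)


def pool_multi_id(ids, weights):
    """Multi-identifier feature: weighted sum and plain average of the ID embeddings."""
    rows = embedding_table[np.asarray(ids)]
    weighted = (np.asarray(weights, dtype=float)[:, None] * rows).sum(axis=0)
    averaged = rows.mean(axis=0)
    return weighted, averaged


def encode_account(single_id, multi_ids, multi_weights, preconfigured):
    """Return the two intermediate vectors for one account."""
    first_class = embedding_table[single_id]                # single-identifier feature
    third_class = np.asarray(preconfigured, dtype=float)    # preconfigured feature
    second_group = dense(np.concatenate([first_class, third_class]), w_a, b_a)
    weighted, averaged = pool_multi_id(multi_ids, multi_weights)
    third_group = dense(np.concatenate([weighted, averaged]), w_b, b_b)
    return second_group, third_group
```

Calling `encode_account(1, [2, 3, 5], [0.5, 0.3, 0.2], [1.0, 1.0])` yields two vectors of length `OUT_DIM`, standing in for one account's entries in the second and third groups of vectors.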

Optionally, said determining said first set of characterization vectors from said second set of vectors and said third set of vectors comprises: concatenating the second set of vectors and the third set of vectors into a first target vector group; performing third target processing on the first target vector group to obtain a second target vector group, wherein the third target processing is used for processing the first target vector group based on an attention mechanism; inputting the second target vector group into a preset multilayer perceptron to obtain a third group of target vectors, wherein the target neural network model comprises the multilayer perceptron; inputting the second target vector group into a preset feature cross model to obtain first cross feature information, wherein the target neural network model comprises the feature cross model, and the feature cross model is used for obtaining cross feature information; and concatenating the third group of target vectors and the first cross feature information into the first set of characterization vectors.
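A minimal sketch of this second stage is given below. The text does not specify the attention form or the feature cross model, so the softmax attention over field vectors and the factorization-machine-style pairwise product used here are stand-ins chosen for illustration; all names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, HIDDEN, DEEP_OUT = 3, 8, 4  # hypothetical sizes

query = rng.normal(size=DIM)  # assumed learnable attention query
layers = [
    (rng.normal(size=(2 * DIM, HIDDEN)), np.zeros(HIDDEN)),
    (rng.normal(size=(HIDDEN, DEEP_OUT)), np.zeros(DEEP_OUT)),
]


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


def attention(fields, q):
    """Scale each field vector by its softmax attention score."""
    scores = softmax(fields @ q)  # one score per field
    return fields * scores[:, None]


def mlp(x, net):
    """Multilayer perceptron with ReLU activations."""
    for w, b in net:
        x = np.maximum(x @ w + b, 0.0)
    return x


def pairwise_cross(fields):
    """Sum of element-wise products over all field pairs (FM-style identity)."""
    s = fields.sum(axis=0)
    return 0.5 * (s * s - (fields * fields).sum(axis=0))


def characterize(second_group, third_group):
    fields = np.stack([second_group, third_group])  # first target vector group
    attended = attention(fields, query)             # second target vector group
    deep = mlp(attended.reshape(-1), layers)        # third group of target vectors
    cross = pairwise_cross(attended)                # cross feature information
    return np.concatenate([deep, cross])            # characterization vector
```

The output length is `DEEP_OUT + DIM`, reflecting the final concatenation of the perceptron output with the cross feature information.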

Optionally, the obtaining a second set of characterization vectors of a set of candidate accounts includes: acquiring a second set of feature data corresponding to the set of candidate accounts, wherein the second set of feature data comprises a second set of portrait feature data and a second set of behavior feature data of the set of candidate accounts; and inputting the second set of feature data into a target neural network model to obtain the second set of characterization vectors.

Optionally, the inputting the second group of feature data into a target neural network model to obtain the second group of characterization vectors includes: randomly initializing each item of feature data in the second group of portrait feature data and the second group of behavior feature data to generate a fourth group of vectors, wherein the fourth group of vectors includes a fourth-class vector, a fifth-class vector and a sixth-class vector, the fourth-class vector is used for representing fourth-class feature data of the group of candidate accounts, the fourth-class feature data is feature data represented by a single identifier, the fifth-class vector is used for representing fifth-class feature data of the group of candidate accounts, the fifth-class feature data is feature data represented by a combination of a plurality of identifiers, the sixth-class vector is used for representing sixth-class feature data of the group of candidate accounts, and the sixth-class feature data is preconfigured feature data; performing a fully connected transformation on the fourth-class vector and the sixth-class vector to generate a fifth group of vectors; performing fourth target processing and fifth target processing respectively on the fifth-class vector based on the feature data corresponding to the plurality of identifiers, and then performing a fully connected transformation to generate a sixth group of vectors, wherein the fourth target processing is used for applying weight coefficients to the feature data corresponding to the plurality of identifiers, and the fifth target processing is used for summing the feature data corresponding to the plurality of identifiers and calculating an average value; and determining the second group of characterization vectors from the fifth group of vectors and the sixth group of vectors.

Optionally, said determining said second set of characterization vectors from said fifth set of vectors and said sixth set of vectors comprises: concatenating the fifth set of vectors and the sixth set of vectors into a fourth target vector group; performing sixth target processing on the fourth target vector group to obtain a fifth target vector group, wherein the sixth target processing is used for processing the fourth target vector group based on an attention mechanism; inputting the fifth target vector group into a preset multilayer perceptron to obtain a sixth group of target vectors, wherein the target neural network model comprises the multilayer perceptron; inputting the fifth target vector group into a preset feature cross model to obtain second cross feature information, wherein the target neural network model comprises the feature cross model, and the feature cross model is used for obtaining cross feature information; and concatenating the sixth group of target vectors and the second cross feature information into the second set of characterization vectors.

Optionally, the determining a distance between the set of candidate accounts and the first set of seed accounts according to the first set of characterization vectors and the second set of characterization vectors includes: obtaining a cosine distance between each characterization vector in the second set of characterization vectors and each characterization vector in the first set of characterization vectors to obtain a set of cosine distances corresponding to each candidate account, wherein the cosine distance between a second characterization vector in the second set of characterization vectors and a first characterization vector in the first set of characterization vectors is used for representing the distance between a second candidate account and a first seed account, the second candidate account is the candidate account corresponding to the second characterization vector in the set of candidate accounts, and the first seed account is the seed account corresponding to the first characterization vector in the first set of seed accounts.
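The pairwise computation can be sketched as below. The text does not give a formula, so this sketch assumes the conventional definition of cosine distance as one minus cosine similarity; the function names and the `eps` guard against zero-length vectors are illustrative additions.

```python
import numpy as np


def cosine_similarity_matrix(candidates, seeds, eps=1e-12):
    """Entry (i, j) is the cosine similarity of candidate i and seed j."""
    c = candidates / np.maximum(np.linalg.norm(candidates, axis=1, keepdims=True), eps)
    s = seeds / np.maximum(np.linalg.norm(seeds, axis=1, keepdims=True), eps)
    return c @ s.T


def cosine_distance_matrix(candidates, seeds):
    """Assumed definition: cosine distance = 1 - cosine similarity."""
    return 1.0 - cosine_similarity_matrix(candidates, seeds)
```

Row i of the result is the set of cosine distances corresponding to candidate account i, one entry per seed account.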

Optionally, the determining a second set of seed accounts in the set of candidate accounts according to the distance between the set of candidate accounts and the first set of seed accounts includes: acquiring an average value of a group of cosine distances corresponding to each candidate account; determining candidate accounts in the set of candidate accounts for which the average value is greater than a predetermined threshold as the second set of seed accounts; or determining the first N candidate accounts in the group of candidate accounts after being sorted according to the average value as the second group of seed accounts, wherein N is a natural number.
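The two selection rules can be sketched as follows, taking as input a matrix with one row per candidate account and one column per seed account. The comparison direction follows the wording of the text literally (average greater than the threshold); whether larger values mean closer accounts depends on how the pairwise measure is defined, which the text leaves open, so the sort direction is exposed as a hypothetical `descending` parameter.

```python
import numpy as np


def mean_per_candidate(pairwise):
    """Average each candidate's values over all seed accounts."""
    return np.asarray(pairwise, dtype=float).mean(axis=1)


def select_by_threshold(avg, threshold):
    """Indices of candidates whose average exceeds the threshold."""
    return [int(i) for i in np.flatnonzero(avg > threshold)]


def select_top_n(avg, n, descending=True):
    """Indices of the first n candidates after sorting by average."""
    order = np.argsort(-avg if descending else avg, kind="stable")
    return [int(i) for i in order[:n]]
```

For example, with `pairwise = [[0.9, 0.7], [0.2, 0.4], [0.6, 0.6]]`, candidates 0 and 2 pass a threshold of 0.5, and the same two are returned by `select_top_n(avg, 2)`.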

Optionally, after determining the target account number set as including the first set of seed account numbers and the second set of seed account numbers, the method further includes: and sending the target media resources to the accounts in the target account set.

Optionally, before the obtaining the first set of characterization vectors for the first set of seed account numbers and the second set of characterization vectors for the set of candidate account numbers, the method further includes:

obtaining first data associated with a set of nodes in a target network, wherein the first data is used for indicating the frequency and the path of activating other nodes in the set of nodes by each node in the set of nodes;

and generating a target set based on the capability of each node in the first data for activating the other nodes, wherein the target set is a set formed by nodes corresponding to the first group of seed accounts in a target network, and the capability of each node in the first data for activating the other nodes is determined based on the frequency and the path.

Optionally, generating a target set based on the capability of each node in the first data to activate the other nodes includes:

generating a first directed graph based on the first data, wherein the first directed graph records each node in a group of nodes in the target network, a first group of activation paths of each node when other nodes in the group of nodes are activated, and activation probabilities corresponding to each activation path in the first group of activation paths;

sampling the first directed graph n times to generate n second directed graphs, wherein the second directed graphs record a first group of nodes in the group of nodes and a group of activation paths of the first group of nodes when other nodes in the first group of nodes are activated;

calculating n first sets corresponding to a first node based on the n second directed graphs, wherein the first sets comprise all nodes which can reach the first node through the activation path in the first group of nodes and the first node, and the first group of nodes comprises the first node;

merging the n first sets into a second set;

repeatedly performing the following operations until the second set is an empty set:

obtaining a second node with the highest frequency of occurrence in the second set, wherein the first group of nodes comprises the second node;

acquiring all the first sets containing the second nodes from the second set to generate a third set, wherein the third set consists of partial sets in the n first sets;

obtaining a second group of nodes with the highest occurrence frequency in the third set, wherein the second group of nodes is a group of nodes with the highest ranking obtained after ranking according to the occurrence frequency, and the number of the nodes of the second group of nodes is preset by a system;

adding the second set of nodes to a target set and deleting the third set from the second set.
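The sampling and greedy steps above can be sketched roughly as follows. This is one reading of the text, not a definitive implementation: edges are assumed to be `(source, destination, activation_probability)` triples, each edge is assumed to be kept independently with its activation probability when sampling, and the first node of each sample is assumed to be drawn uniformly at random. All function names are hypothetical.

```python
import random
from collections import Counter


def sample_graph(edges, rng):
    """Keep each directed edge (u, v, p) independently with probability p."""
    return [(u, v) for u, v, p in edges if rng.random() < p]


def reverse_reachable(sampled_edges, first_node):
    """All nodes that can reach first_node in the sampled graph, plus itself."""
    preds = {}
    for u, v in sampled_edges:
        preds.setdefault(v, []).append(u)
    seen, stack = {first_node}, [first_node]
    while stack:
        node = stack.pop()
        for p in preds.get(node, []):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return frozenset(seen)


def build_rr_sets(edges, nodes, n, seed=0):
    """Sample the graph n times and collect one reverse reachable set per sample."""
    rng = random.Random(seed)
    rr_sets = []
    for _ in range(n):
        sampled = sample_graph(edges, rng)
        first_node = rng.choice(nodes)  # assumption: uniform random choice
        rr_sets.append(reverse_reachable(sampled, first_node))
    return rr_sets


def greedy_seed_nodes(rr_sets, group_size):
    """Repeat: take the most frequent node, harvest the sets containing it."""
    remaining = list(rr_sets)  # the "second set"
    target = []
    while remaining:
        counts = Counter(node for s in remaining for node in s)
        top_node = counts.most_common(1)[0][0]             # "second node"
        covered = [s for s in remaining if top_node in s]  # "third set"
        local = Counter(node for s in covered for node in s)
        for node, _ in local.most_common(group_size):      # "second group of nodes"
            if node not in target:
                target.append(node)
        remaining = [s for s in remaining if top_node not in s]
    return target
```

Each pass removes at least one reverse reachable set, so the loop terminates once every sampled set has been covered.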

According to another aspect of the embodiments of the present invention, there is also provided an apparatus for determining a target account set, including:

the acquisition module is used for acquiring a first group of characterization vectors of a first group of seed accounts and a second group of characterization vectors of a group of candidate accounts, wherein the characterization vectors in the first group of characterization vectors correspond to the seed accounts in the first group of seed accounts one by one, the characterization vectors in the second group of characterization vectors correspond to the candidate accounts in the group of candidate accounts one by one, the characterization vectors in the first group of characterization vectors are vectors determined according to a first group of portrait feature data and a first group of behavior feature data of the first group of seed accounts, and the characterization vectors in the second group of characterization vectors are vectors determined according to a second group of portrait feature data and a second group of behavior feature data of the group of candidate accounts;

a first determining module, configured to determine a distance between the set of candidate accounts and the first set of seed accounts according to the first set of characterization vectors and the second set of characterization vectors;

a second determining module, configured to determine a second group of seed account numbers in the group of candidate account numbers according to a distance between the group of candidate account numbers and the first group of seed account numbers, and determine a target account number set as including the first group of seed account numbers and the second group of seed account numbers.

Optionally, the obtaining module includes: a first acquiring unit, configured to acquire a first set of feature data corresponding to the first set of seed account numbers, where the first set of feature data includes a first set of portrait feature data and a first set of behavior feature data of the first set of seed account numbers; and the first processing unit is used for inputting the first group of characteristic data into a target neural network model to obtain the first group of characterization vectors.

Optionally, the first processing unit is configured to input the first group of feature data into a target neural network model to obtain the first group of characterization vectors by: randomly initializing each item of feature data in the first group of portrait feature data and the first group of behavior feature data to generate a first group of vectors, wherein the first group of vectors includes a first-class vector, a second-class vector and a third-class vector, the first-class vector is used for representing first-class feature data of the first group of seed accounts, the first-class feature data is feature data represented by a single identifier, the second-class vector is used for representing second-class feature data of the first group of seed accounts, the second-class feature data is feature data represented by a combination of a plurality of identifiers, the third-class vector is used for representing third-class feature data of the first group of seed accounts, and the third-class feature data is preconfigured feature data; performing a fully connected transformation on the first-class vector and the third-class vector to generate a second group of vectors; performing first target processing and second target processing respectively on the second-class vector based on the feature data corresponding to the plurality of identifiers, and then performing a fully connected transformation to generate a third group of vectors, wherein the first target processing is used for applying weight coefficients to the feature data corresponding to the plurality of identifiers, and the second target processing is used for summing the feature data corresponding to the plurality of identifiers and calculating an average value; and determining the first group of characterization vectors from the second group of vectors and the third group of vectors.

Optionally, the first processing unit is configured to determine the first set of characterization vectors according to the second set of vectors and the third set of vectors by: concatenating the second set of vectors and the third set of vectors into a first target vector group; performing third target processing on the first target vector group to obtain a second target vector group, wherein the third target processing is used for processing the first target vector group based on an attention mechanism; inputting the second target vector group into a preset multilayer perceptron to obtain a third group of target vectors, wherein the target neural network model comprises the multilayer perceptron; inputting the second target vector group into a preset feature cross model to obtain first cross feature information, wherein the target neural network model comprises the feature cross model, and the feature cross model is used for obtaining cross feature information; and concatenating the third group of target vectors and the first cross feature information into the first set of characterization vectors.

Optionally, the obtaining module includes: a second obtaining unit, configured to obtain a second set of feature data corresponding to the set of candidate account numbers, where the second set of feature data includes a second set of portrait feature data and a second set of behavior feature data of the set of candidate account numbers; and the second processing unit is used for inputting the second group of characteristic data into a target neural network model to obtain a second group of characterization vectors.

Optionally, the second processing unit is configured to input the second group of feature data into a target neural network model to obtain the second group of characterization vectors by: randomly initializing each item of feature data in the second group of portrait feature data and the second group of behavior feature data to generate a fourth group of vectors, wherein the fourth group of vectors includes a fourth-class vector, a fifth-class vector and a sixth-class vector, the fourth-class vector is used for representing fourth-class feature data of the group of candidate accounts, the fourth-class feature data is feature data represented by a single identifier, the fifth-class vector is used for representing fifth-class feature data of the group of candidate accounts, the fifth-class feature data is feature data represented by a combination of a plurality of identifiers, the sixth-class vector is used for representing sixth-class feature data of the group of candidate accounts, and the sixth-class feature data is preconfigured feature data; performing a fully connected transformation on the fourth-class vector and the sixth-class vector to generate a fifth group of vectors; performing fourth target processing and fifth target processing respectively on the fifth-class vector based on the feature data corresponding to the plurality of identifiers, and then performing a fully connected transformation to generate a sixth group of vectors, wherein the fourth target processing is used for applying weight coefficients to the feature data corresponding to the plurality of identifiers, and the fifth target processing is used for summing the feature data corresponding to the plurality of identifiers and calculating an average value; and determining the second group of characterization vectors from the fifth group of vectors and the sixth group of vectors.

Optionally, the second processing unit is configured to determine the second set of characterization vectors from the fifth set of vectors and the sixth set of vectors by: concatenating the fifth set of vectors and the sixth set of vectors into a fourth target vector group; performing sixth target processing on the fourth target vector group to obtain a fifth target vector group, wherein the sixth target processing is used for processing the fourth target vector group based on an attention mechanism; inputting the fifth target vector group into a preset multilayer perceptron to obtain a sixth group of target vectors, wherein the target neural network model comprises the multilayer perceptron; inputting the fifth target vector group into a preset feature cross model to obtain second cross feature information, wherein the target neural network model comprises the feature cross model, and the feature cross model is used for obtaining cross feature information; and concatenating the sixth group of target vectors and the second cross feature information into the second set of characterization vectors.

Optionally, the first determining module includes: a third processing unit, configured to obtain a cosine distance between each characterization vector in the second set of characterization vectors and each characterization vector in the first set of characterization vectors, and obtain a set of cosine distances corresponding to each candidate account, where a cosine distance between a second characterization vector in the second set of characterization vectors and a first characterization vector in the first set of characterization vectors is used to represent a distance between a second candidate account and a first seed account, the second candidate account is the candidate account in the set of candidate accounts corresponding to the second characterization vector, and the first seed account is the seed account in the first set of seed accounts corresponding to the first characterization vector.

Optionally, the second determining module includes: a third obtaining unit, configured to obtain an average value of a set of cosine distances corresponding to each candidate account; a first determining unit, configured to determine, as the second set of seed accounts, candidate accounts in the set of candidate accounts for which the average value is greater than a predetermined threshold; or, the second determining unit is configured to determine, as the second group of seed accounts, the first N candidate accounts in the group of candidate accounts that are sorted according to the average value, where N is a natural number.

Optionally, after determining the target account number set as including the first set of seed account numbers and the second set of seed account numbers, the apparatus is further configured to: and sending the target media resources to the accounts in the target account set.

Optionally, the apparatus is further configured to:

obtaining first data associated with a set of nodes in a target network before the obtaining of the first set of characterization vectors of the first set of seed account numbers and the second set of characterization vectors of the set of candidate account numbers, wherein the first data is used for indicating the frequency and the path of activating other nodes in the set of nodes by each node in the set of nodes;

Optionally, the apparatus is further configured to generate a target set based on the capability of each node in the first data to activate the other nodes by:

merging the n first sets into a second set;

According to another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium, in which a computer program is stored, where the computer program is configured to execute the method for determining the target account set when running.

According to another aspect of the embodiments of the present invention, there is also provided an electronic device, including a memory and a processor, where the memory stores a computer program, and the processor is configured to execute the method for determining the target account set through the computer program.

In the embodiments of the invention, a first group of characterization vectors of a first group of seed accounts and a second group of characterization vectors of a group of candidate accounts are determined according to the portrait feature data and the behavior feature data of the accounts; the distance between the group of candidate accounts and the first group of seed accounts is then determined according to the first group of characterization vectors and the second group of characterization vectors; a second group of seed accounts is then determined in the group of candidate accounts; and the target account set is determined as comprising the first group of seed accounts and the second group of seed accounts. This solves the technical problem of effectively determining an account set with large propagation influence.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:

FIG. 1 is a diagram illustrating an application environment of an alternative method for determining a set of target accounts according to an embodiment of the present invention;

FIG. 2 is a flowchart illustrating an alternative method for determining a target account set according to an embodiment of the present invention;

FIG. 3 is a diagram illustrating an alternative method for determining a target account set, according to an embodiment of the invention;

FIG. 4 is a flowchart illustrating an alternative method for determining a target account set according to an embodiment of the present invention;

FIG. 5 is a flowchart illustrating an alternative method for determining a target account set according to an embodiment of the present invention;

FIG. 6 is a diagram illustrating an alternative method for determining a target account set according to an embodiment of the present invention;

FIG. 7 is a flowchart illustrating an alternative method for determining a target account set according to an embodiment of the present invention;

FIG. 8 is a schematic structural diagram of an alternative apparatus for determining a target account set according to an embodiment of the present invention;

FIG. 9 is a schematic structural diagram of an alternative target account set determination apparatus according to an embodiment of the present invention;

FIG. 10 is a schematic structural diagram of an alternative target account set determination apparatus according to an embodiment of the present invention;

FIG. 11 is a schematic structural diagram of an alternative electronic device according to an embodiment of the invention.

Detailed Description

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

First, some of the nouns or terms appearing in the description of the embodiments of the present application are explained as follows:

IM: influence maximization.

KNN: k-nearest neighbors (the K-nearest-neighbor algorithm).

V: the set of nodes in the graph.

E: the set of edges in the graph.

RR set: reverse reachable set.

The invention is illustrated below with reference to examples:

According to an aspect of the embodiments of the present invention, a method for determining a target account set is provided. Optionally, in this embodiment, the method may be applied to a hardware environment formed by a server 101 and a user terminal 103 as shown in FIG. 1. As shown in FIG. 1, the server 101 is connected to the user terminal 103 through a network and may be configured to provide a service to the user terminal or to a client installed on the user terminal, where the target client may be a video client, an instant messaging client, a browser client, an education client, or the like. A database 105 may be provided on the server or separately from it to provide data storage services for the server 101. The network may include, but is not limited to, a wired network and a wireless network, where the wired network includes a local area network, a metropolitan area network, and a wide area network, and the wireless network includes Bluetooth, WIFI, and other networks that enable wireless communication. The user terminal 103 may be a terminal on which the target client is configured. The server may be a single server, a server cluster consisting of a plurality of servers, or a cloud server. An application 107 is displayed through the user terminal 103, and the service for determining the target account set may be used through an entry of the application 107 configured on the terminal; alternatively, the application 107 is an application that is logged in to with an account in the target account set. The above is merely an example and is not limited in this embodiment.

Artificial Intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision making.

Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes computer vision, speech processing, natural language processing, and machine learning/deep learning.

Machine Learning (ML) is a multi-domain interdisciplinary subject that involves probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specializes in studying how a computer simulates or realizes human learning behavior in order to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, and inductive learning.

Optionally, as an optional implementation, as shown in FIG. 2, the method for determining the target account set includes:

S202, acquiring a first group of characterization vectors of a first group of seed accounts and a second group of characterization vectors of a group of candidate accounts, wherein the characterization vectors in the first group of characterization vectors correspond one-to-one to the seed accounts in the first group of seed accounts, the characterization vectors in the second group of characterization vectors correspond one-to-one to the candidate accounts in the group of candidate accounts, the characterization vectors in the first group of characterization vectors are vectors determined according to a first group of portrait feature data and a first group of behavior feature data of the first group of seed accounts, and the characterization vectors in the second group of characterization vectors are vectors determined according to a second group of portrait feature data and a second group of behavior feature data of the group of candidate accounts;

S204, determining the distance between the group of candidate accounts and the first group of seed accounts according to the first group of characterization vectors and the second group of characterization vectors;

S206, determining a second group of seed accounts in the group of candidate accounts according to the distance between the group of candidate accounts and the first group of seed accounts, and determining the target account set as including the first group of seed accounts and the second group of seed accounts.

Optionally, in this embodiment, the first group of seed accounts may include, but is not limited to, accounts in a preconfigured seed account set. The accounts in the first group of seed accounts are accounts with larger propagation influence among all accounts; in other words, the probability that they activate other accounts is larger.

Optionally, in this embodiment, the dimension of the characterization vector may be adjusted according to actual needs. For example, the characterization vector may be set to 128 dimensions, so that the feature data of the first group of seed accounts or of the group of candidate accounts can be represented effectively while the amount of computation is kept low.

Optionally, in this embodiment, the portrait feature data may include, but is not limited to, portrait features such as the gender, age, and region of the user using the account, and the behavior feature data may include, but is not limited to, behavior features such as the number of times the account plays videos and the number of times it shares videos, applets, and links.

Optionally, in this embodiment, the method for determining the target account set may be applied to, but not limited to, the following application scenarios:

For example, when promoting a service or media information in an instant messaging application, n users need to be selected from the user group that uses the instant messaging application in a target network. Promotion of the service or media information, or user diversion to an application program, is achieved by means including, but not limited to, pushing a red dot for "see one at a glance", or inserting a video in "see one at a glance" so that a touch operation on the inserted video plays the video or opens the application program pre-configured for the video. FIG. 3 is a schematic diagram of an optional method for determining a target account set according to an embodiment of the present invention. As shown in FIG. 3, the flow includes the following steps:

S302, pushing a red dot 306 at a preset interactive object 304 of a preset display interface 302 in the application programs of the selected n users (corresponding to the seed account set);

S304, after an interactive operation associated with the preset interactive object is obtained in the application program, opening a publishing page 308 of the preset interactive object, wherein the publishing page 308 is used for displaying the corresponding service or media information 310, or other application programs, that need to be promoted or shared;

S306, by performing an interactive operation on the interactive object 312 on the publishing page, opening or playing the corresponding service or media information, or other application program, that needs to be promoted or shared. When the service or media information 312 belongs to another application program, user diversion to the other application program 314 is completed, bringing an earliest batch of seed users (corresponding to the seed account set) to the other application program. The seed users then propagate and share within the instant messaging application, so that the service or media information is promoted or shared, diversion to the other application program is effectively achieved, the scale of users or accounts is enlarged, and the propagation speed among users is accelerated.

It should be noted that, in the current service scenario, selecting n users from the user group that uses the instant messaging application in the target network may be implemented by, but is not limited to, the above method for determining the target account set, and bringing the earliest batch of seed users to other application programs may likewise be implemented by, but is not limited to, that method. The instant messaging application is merely an example; other application programs that can promote or share corresponding service or media information, or divert users to other application programs, may also be included.

Optionally, in this embodiment, the method for determining the target account set may be applied to, but is not limited to, the service scenario of the instant messaging application, and may also be applied to, but is not limited to, applications that need to expand the order of magnitude of their seed accounts, such as media information sharing applications, browser applications, education applications, medical applications, game applications, and transportation applications.

Optionally, in this embodiment, the first group of characterization vectors is used to represent characteristic information of each account in the first group of seed accounts, and may be, but is not limited to, capability information or probability that each account in the first group of seed accounts activates another account, and the second group of characterization vectors is used to represent characteristic information of each account in a group of candidate accounts, and may be, but is not limited to, capability information or probability that each account in the group of candidate accounts activates another account.

The above is merely an example, and the present embodiment is not limited in any way.

Optionally, in this embodiment, determining the distance between the group of candidate accounts and the first group of seed accounts may include, but is not limited to, computing the cosine distance between the first group of characterization vectors and the second group of characterization vectors, or representing the distance between the group of candidate accounts and the first group of seed accounts by cosine similarity.

According to this embodiment, a first group of characterization vectors of a first group of seed accounts and a second group of characterization vectors of a group of candidate accounts are determined according to the portrait feature data and the behavior feature data of the accounts; the distance between the group of candidate accounts and the first group of seed accounts is then determined according to the two groups of characterization vectors; a second group of seed accounts is determined in the group of candidate accounts; and the target account set is determined as including the first group of seed accounts and the second group of seed accounts. Adding the second group of seed accounts determined from the candidate set to the target account set expands the seed account set, so that an account set with larger propagation influence is determined quickly and effectively and with higher scalability, thereby solving the technical problem in the related art that it is difficult to determine an account set with larger propagation influence accurately, quickly, and effectively.

In an alternative embodiment, obtaining a first set of token vectors for a first set of seed account numbers includes: acquiring a first group of characteristic data corresponding to a first group of seed account numbers, wherein the first group of characteristic data comprises a first group of portrait characteristic data and a first group of behavior characteristic data of the first group of seed account numbers; and inputting the first group of feature data into the target neural network model to obtain a first group of characterization vectors.

Optionally, in this embodiment, the first group of feature data may include, but is not limited to, portrait features such as the gender, age, and region of the user, and behavior features such as the number of times the user plays and shares videos. The target neural network model is obtained by training a neural network model to be trained. The training samples may include, but are not limited to, pre-labeled sample accounts or unlabeled sample accounts; the pre-labeled sample accounts may be used for supervised training, and the unlabeled sample accounts for unsupervised training.

For example, the types of the neural network model may include, but are not limited to, Independent Cascade (IC), Linear Threshold (LT), Triggering (TR), and the like, or a model established by methods such as PeopleRank, and may further include, but are not limited to, a deep neural network (DNN) model, a recurrent neural network (RNN), a residual neural network (DRN), and the like.

In an alternative embodiment, inputting the first set of sample feature data into the target neural network model to obtain a first set of characterization vectors includes: randomly initializing each feature data in a first group of portrait feature data and a first group of behavior feature data to generate a first group of vectors, wherein the first group of vectors comprises a first class vector, a second class vector and a third class vector, the first class vector is used for representing the first class feature data of a first group of seed accounts, the first class feature data is feature data represented by using one identifier, the second class vector is used for representing the second class feature data of the first group of seed accounts, the second class feature data is feature data represented by combining a plurality of identifiers, the third class vector is used for representing the third class feature data of the first group of seed accounts, and the third class feature data is pre-configured feature data; carrying out full-connection conversion on the first class vector and the third class vector to generate a second group of vectors; respectively performing first target processing and second target processing on the second class of vectors based on the feature data corresponding to the multiple identifications, and performing full-connection transformation to generate a third group of vectors, wherein the first target processing is used for adding weight coefficients to the feature data corresponding to the multiple identifications, and the second target processing is used for summing the feature data corresponding to the multiple identifications and calculating an average value; a first set of token vectors is determined based on the second set of vectors and the third set of vectors.

Optionally, FIG. 4 is a flowchart illustrating another optional method for determining a target account set according to an embodiment of the present invention. As shown in FIG. 4, the steps of the flow are as follows:

S402, randomly initializing each feature data in a first group of portrait feature data and a first group of behavior feature data to generate a first group of vectors, wherein the first group of vectors comprises a first class vector, a second class vector, and a third class vector, the first class vector is used for representing first class feature data of the first group of seed accounts, the first class feature data is feature data represented by using one identifier, the second class vector is used for representing second class feature data of the first group of seed accounts, the second class feature data is feature data represented by combining a plurality of identifiers, the third class vector is used for representing third class feature data of the first group of seed accounts, and the third class feature data is pre-configured feature data;

S404, performing full-connection transformation on the first class vector and the third class vector to generate a second group of vectors;

S406, respectively performing first target processing and second target processing on the second class vector based on the feature data corresponding to the plurality of identifiers, and performing full-connection transformation to generate a third group of vectors, wherein the first target processing is used for adding weight coefficients to the feature data corresponding to the plurality of identifiers, and the second target processing is used for summing the feature data corresponding to the plurality of identifiers and calculating an average value;

S408, determining a first group of characterization vectors according to the second group of vectors and the third group of vectors.
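The pooling and transformation steps above can be illustrated with a minimal pure-Python sketch. It assumes a hypothetical embedding table `EMB` keyed by identifier strings, an illustrative embedding size of 4, made-up weight coefficients, and identity matrices standing in for learned full-connection weights; the function names are illustrative and do not come from the embodiment.

```python
import random

random.seed(0)
DIM = 4  # illustrative embedding size; the embodiment mentions, e.g., 128

EMB = {}  # hypothetical table: identifier -> randomly initialized vector


def embed(identifier):
    """Return the randomly initialized vector for an identifier."""
    if identifier not in EMB:
        EMB[identifier] = [random.uniform(-0.1, 0.1) for _ in range(DIM)]
    return EMB[identifier]


def weighted_sum(vectors, weights):
    """First target processing: add a weight coefficient to each vector, then sum."""
    return [sum(w * v[d] for v, w in zip(vectors, weights)) for d in range(DIM)]


def mean_pool(vectors):
    """Second target processing: sum the vectors and take the average."""
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(DIM)]


def fully_connected(x, weight, bias):
    """Full-connection transformation y = W x + b (no activation)."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b
            for row, b in zip(weight, bias)]


# Identity weights are placeholders for learned parameters (assumption).
identity = [[1.0 if i == j else 0.0 for j in range(DIM)] for i in range(DIM)]
zero_bias = [0.0] * DIM

age_vec = embed("age_22")                                   # one-identifier feature
field_vecs = [embed("a_20"), embed("a_31"), embed("b_7")]   # multi-identifier feature

second_group = fully_connected(age_vec, identity, zero_bias)
third_group = (
    fully_connected(weighted_sum(field_vecs, [0.5, 0.3, 0.2]), identity, zero_bias)
    + fully_connected(mean_pool(field_vecs), identity, zero_bias)
)
model_input = second_group + third_group  # passed on to the later stages
```

In a trained model the weight coefficients and full-connection parameters would be learned rather than fixed; the fixed values here only keep the sketch deterministic.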

Optionally, in this embodiment, the randomly initializing may include, but is not limited to, randomly initializing a weight coefficient corresponding to each of the feature data to generate the first group of vectors.

Optionally, in this embodiment, the first type of feature data may include, but is not limited to, id-type feature data such as age, gender, and region. Taking age as an example, the first type of feature data may be identified directly by a numerical value: if the age feature data corresponding to one account in the first group of seed accounts is 22, the value 22 is placed directly at the position corresponding to the age feature.

Optionally, in this embodiment, the second type of feature data may include, but is not limited to, id_list-type feature data, for example, the fields preferred by the user, the number of share clicks of the user, and the number of times the user activates other users. Taking the field preferred by the user as an example, a plurality of values may be used jointly to identify the second type of feature data: if the field preferred by the user is identified as a, and the id of a favorite person in field a is 20, the feature may be represented by, but is not limited to, a_20.

Optionally, in this embodiment, the third type of feature data may include, but is not limited to, feature data learned through other models.
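To make the three feature types concrete, the following sketch shows one possible way to hold an account's raw features before they are turned into vectors. The class name `AccountFeatures`, its field names, the helper `combined_id`, and the sample values are all hypothetical; only the split into single-identifier, multi-identifier, and pre-configured feature data comes from the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AccountFeatures:
    # First type: each feature is represented by a single identifier.
    id_features: Dict[str, int] = field(default_factory=dict)
    # Second type: each feature is represented by several identifiers combined.
    id_list_features: Dict[str, List[str]] = field(default_factory=dict)
    # Third type: pre-configured data, e.g. vectors learned by another model.
    preconfigured: Dict[str, List[float]] = field(default_factory=dict)


def combined_id(field_name: str, item_id: int) -> str:
    """Build a combined identifier such as 'a_20' from a field and an id."""
    return f"{field_name}_{item_id}"


account = AccountFeatures(
    id_features={"age": 22},
    id_list_features={"preferred_fields": [combined_id("a", 20)]},
    preconfigured={"external_embedding": [0.1, -0.3]},
)
```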

Optionally, in this embodiment, the first target processing may include, but is not limited to, preprocessing based on an attention mechanism, and the second target processing may include, but is not limited to, summing and averaging, so as to determine the third group of vectors.

Through the embodiment, the characteristic data used for representing the account number propagation influence can be collected in a multidimensional and more detailed manner, so that the generated vector group can represent the account number propagation influence, and the technical effects of improving the robustness and the convergence of the output result are achieved.

In an alternative embodiment, determining the first set of token vectors from the second set of vectors and the third set of vectors comprises: splicing the second group of vectors and the third group of vectors into a first target vector group; performing third target processing on the first target vector group to obtain a second target vector group, wherein the third target processing is used for processing the first target vector group based on an attention mechanism; inputting the second target vector group into a preset multilayer perceptron to obtain a third group of target vectors, wherein the target neural network model comprises the multilayer perceptron; inputting the second target vector group into a preset feature intersection model to obtain first intersection feature information, wherein the target neural network model comprises the feature intersection model, and the feature intersection model is used for acquiring intersection feature information; and splicing the third group of target vectors and the first cross feature information into a first group of characterization vectors.

Optionally, in this embodiment, as shown in FIG. 5, the steps of the flow are as follows:

S502, splicing the second group of vectors and the third group of vectors into a first target vector group, and performing third target processing on the first target vector group to obtain a second target vector group, wherein the third target processing is used for processing the first target vector group based on an attention mechanism;

S504, inputting the second target vector group into a preset multilayer perceptron to obtain a third group of target vectors, wherein the target neural network model comprises the multilayer perceptron;

S506, inputting the second target vector group into a preset feature intersection model to obtain first intersection feature information, wherein the target neural network model comprises the feature intersection model, and the feature intersection model is used for obtaining intersection feature information;

S508, splicing the third group of target vectors and the first intersection feature information into the first group of characterization vectors.
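The last three steps can be sketched as follows, assuming the attention-processed input for one account has been flattened into a single vector `x0`. A single dense ReLU layer stands in for the multilayer perceptron, and the cross layer uses the commonly published Deep & Cross Network form x_{l+1} = x0 * (x_l . w) + b + x_l as one possible feature intersection model. All weights and function names are made up for illustration and are not taken from the embodiment.

```python
def relu_layer(x, weight, bias):
    """One dense layer with ReLU, standing in for the multilayer perceptron."""
    return [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weight, bias)]


def cross_layer(x0, xl, w, b):
    """One cross layer: x_{l+1} = x0 * (xl . w) + b + xl."""
    scalar = sum(xi * wi for xi, wi in zip(xl, w))
    return [x0_i * scalar + b_i + xl_i for x0_i, b_i, xl_i in zip(x0, b, xl)]


x0 = [1.0, 2.0, 3.0]  # attention-processed input (illustrative values)

# Multilayer perceptron branch (illustrative weights).
mlp_out = relu_layer(x0, [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]], [0.0, 0.0])

# Feature intersection branch (illustrative weights).
cross_out = cross_layer(x0, x0, [0.5, 0.0, 0.0], [0.0, 0.0, 0.0])

# Splice the two branches into the characterization vector.
characterization_vector = mlp_out + cross_out
```

The two branches consume the same input and are only joined at the end, which is why the final vector's length is the sum of the two branch widths.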

Optionally, in this embodiment, the splicing may include, but is not limited to, directly or indirectly splicing the second group of vectors and the third group of vectors.

Optionally, in this embodiment, the third target processing may include, but is not limited to, processing based on a self-attention mechanism.

For example, the coefficient $a_i$ of the attention mechanism is determined based on the first target vector group using the following formula:

$$u = \tanh(W_1 H)$$

where $H \in \mathbb{R}^{n \times m}$ is a matrix of n rows and m columns, n represents the n kinds of second-type feature data, and m represents the dimension of each vector; $W_1 \in \mathbb{R}^{k \times n}$ is a matrix of k rows and n columns and is a coefficient matrix in the target neural network model; and tanh represents an activation function, which may include, but is not limited to, $\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$.

$u$ denotes the attention unit and may be expressed as a matrix of k rows and m columns; $W_2$ represents a row vector of length k that is learned in the target neural network model; and i and j correspond to the i-th column and the j-th column of the attention unit, respectively.
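The expression for the coefficient itself does not survive in the text above. One reading that is consistent with the stated symbols, and is an assumption here rather than something the description confirms, is a softmax over the columns of the attention unit: $a_i = \exp(W_2 u_i) / \sum_j \exp(W_2 u_j)$, where $u_i$ is the i-th column of $u$. The sketch below implements that assumed form in plain Python with made-up values for H, W1, and W2.

```python
import math


def matmul(a, b):
    """Multiply matrix a (r x s) by matrix b (s x c), both as nested lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def attention_coefficients(H, W1, W2):
    """Return one coefficient per column of u = tanh(W1 H) (assumed softmax form)."""
    u = [[math.tanh(v) for v in row] for row in matmul(W1, H)]  # k x m
    scores = [sum(W2[r] * u[r][c] for r in range(len(u)))       # W2 . u_c
              for c in range(len(u[0]))]
    peak = max(scores)                    # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


H = [[0.2, -0.1, 0.4],   # n = 2 rows, m = 3 columns (illustrative values)
     [0.0, 0.3, -0.2]]
W1 = [[1.0, 0.5],        # k = 2 rows, n = 2 columns
      [-0.5, 1.0]]
W2 = [0.7, -0.3]         # row vector of length k

a = attention_coefficients(H, W1, W2)
```

Under this assumed form the coefficients are positive and sum to 1, so they can be used directly as weights.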

Optionally, in this embodiment, the multi-layer perceptron may include, but is not limited to, an MLP model, and the above-mentioned feature intersection model may include, but is not limited to, a DCN model.

Through the embodiment, the target neural network model can be effectively and iteratively optimized according to the operation, and the technical effect that the output result of the target neural network model can more effectively represent the propagation influence of the account is further achieved.

In an alternative embodiment, obtaining a second set of token vectors for a set of candidate accounts comprises: acquiring a second group of feature data corresponding to a group of candidate accounts, wherein the second group of feature data comprises a second group of portrait feature data and a second group of behavior feature data of the group of candidate accounts; and inputting the second group of characteristic data into the target neural network model to obtain a second group of characterization vectors.

Optionally, in this embodiment, the second group of feature data may include, but is not limited to, portrait features such as the gender, age, and region of the user, and behavior features such as the number of times the user plays and shares videos. The target neural network model is obtained by training a neural network model to be trained. The training samples may include, but are not limited to, pre-labeled sample accounts or unlabeled sample accounts; the pre-labeled sample accounts may be used for supervised training, and the unlabeled sample accounts for unsupervised training.

In an alternative embodiment, inputting the second set of sample feature data into the target neural network model to obtain a second set of characterization vectors, comprises: randomly initializing each feature data in a second group of portrait feature data and a second group of behavior feature data to generate a fourth group of vectors, wherein the fourth group of vectors comprises a fourth class vector, a fifth class vector and a sixth class vector, the fourth class vector is used for representing the fourth class feature data of a group of candidate accounts, the fourth class feature data is feature data represented by using one identifier, the fifth class vector is used for representing the fifth class feature data of a group of candidate accounts, the fifth class feature data is feature data represented by combining a plurality of identifiers, the sixth class vector is used for representing the sixth class feature data of a group of candidate accounts, and the sixth class feature data is pre-configured feature data; carrying out full-connection conversion on the fourth-class vector and the sixth-class vector to generate a fifth group of vectors; respectively performing fourth target processing and fifth target processing on the fifth-class vector based on the feature data corresponding to the multiple identifiers, and performing full-connection transformation to generate a sixth group of vectors, wherein the fourth target processing is used for adding weight coefficients to the feature data corresponding to the multiple identifiers, and the fifth target processing is used for summing the feature data corresponding to the multiple identifiers and calculating an average value; a second set of token vectors is determined based on the fifth and sixth sets of vectors.

Optionally, in this embodiment, the randomly initializing may include, but is not limited to, randomly initializing a weight coefficient corresponding to each of the feature data to generate the fourth set of vectors.

Optionally, in this embodiment, the fourth type of feature data may include, but is not limited to, id-type feature data such as age, gender, and region. Taking age as an example, the fourth type of feature data may be identified directly by a numerical value: if the age feature data corresponding to one account in the group of candidate accounts is 22, the value 22 is placed directly at the position corresponding to the age feature.

Optionally, in this embodiment, the fifth type of feature data may include, but is not limited to, id_list-type feature data, for example, the fields preferred by the user, the number of share clicks of the user, and the number of times the user activates other users. Taking the field preferred by the user as an example, a plurality of values may be used jointly to identify the fifth type of feature data: if the field preferred by the user is identified as a, and the id of a favorite person in field a is 20, the feature may be represented by, but is not limited to, a_20.

Optionally, in this embodiment, the sixth type of feature data may include, but is not limited to, feature data learned through other models.

Optionally, in this embodiment, the fourth target processing may include, but is not limited to, preprocessing based on an attention mechanism, and the fifth target processing may include, but is not limited to, summing and averaging, so as to determine the sixth group of vectors.

In an alternative embodiment, determining the second set of token vectors from the fifth set of vectors and the sixth set of vectors comprises: splicing the fifth group of vectors and the sixth group of vectors into a fourth target vector group; performing sixth target processing on the fourth target vector group to obtain a fifth target vector group, wherein the sixth target processing is used for processing the fourth target vector group based on an attention mechanism; inputting the fifth target vector group into a preset multilayer perceptron to obtain a sixth group of target vectors, wherein the target neural network model comprises the multilayer perceptron; inputting the fifth target vector group into a preset feature intersection model to obtain second intersection feature information, wherein the target neural network model comprises the feature intersection model, and the feature intersection model is used for acquiring the intersection feature information; and splicing the sixth group of target vectors and the second cross feature information into a second group of characterization vectors.

For example, the attention unit used in the sixth target processing is determined based on the fourth target vector group using the following formula:

$$u = \tanh(W_1 H)$$

where $u$ denotes the attention unit and may be expressed as a matrix of k rows and m columns; $W_2$ represents a row vector of length k that is learned in the target neural network model; and i and j correspond to the i-th column and the j-th column of the attention unit, respectively.

In an alternative embodiment, determining the distance between the group of candidate accounts and the first group of seed accounts according to the first group of characterization vectors and the second group of characterization vectors comprises: obtaining a cosine distance between each characterization vector in the second group of characterization vectors and each characterization vector in the first group of characterization vectors to obtain a group of cosine distances corresponding to each candidate account, wherein the cosine distance between a second characterization vector in the second group of characterization vectors and a first characterization vector in the first group of characterization vectors is used for representing the distance between a second candidate account and a first seed account, the second candidate account is the candidate account corresponding to the second characterization vector in the group of candidate accounts, and the first seed account is the seed account corresponding to the first characterization vector in the first group of seed accounts.

Optionally, in this embodiment, obtaining the cosine distance between each characterization vector in the second group and each characterization vector in the first group, so as to obtain a group of cosine distances corresponding to each candidate account, may include, but is not limited to, fixing one characterization vector in the second group of characterization vectors at a time and calculating the cosine distance between it and each characterization vector in the first group of characterization vectors. In other words, a group of cosine distances is determined for each candidate account in the group of candidate accounts. The cosine distances may be determined by using, but not limited to, the KNN algorithm.
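A minimal brute-force sketch of this step is given below. It assumes the common convention that cosine distance equals 1 minus cosine similarity, which the text does not spell out, and it compares every candidate with every seed directly instead of using a nearest-neighbor index. The account identifiers, vectors, and function names are made up for illustration.

```python
import math


def cosine_distance(u, v):
    """Cosine distance 1 - cos(u, v); assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def distances_to_seeds(candidate_vectors, seed_vectors):
    """For each candidate, return its list of cosine distances to every seed."""
    return {
        cand_id: [cosine_distance(cand_vec, seed_vec)
                  for seed_vec in seed_vectors.values()]
        for cand_id, cand_vec in candidate_vectors.items()
    }


seeds = {"s1": [1.0, 0.0], "s2": [0.0, 1.0]}
candidates = {"c1": [2.0, 0.0], "c2": [-1.0, 0.0]}
groups = distances_to_seeds(candidates, seeds)
```

For large account sets the all-pairs loop becomes the bottleneck, which is where an approximate nearest-neighbor search would typically replace it.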

In an alternative embodiment, determining the second set of seed account numbers in the set of candidate account numbers according to the distance between the set of candidate account numbers and the first set of seed account numbers includes: acquiring an average value of a group of cosine distances corresponding to each candidate account; determining candidate accounts with an average value larger than a preset threshold value in a group of candidate accounts as a second group of seed accounts; or, determining the first N candidate accounts in the group of candidate accounts sorted according to the average value as a second group of seed accounts, where N is a natural number.

Optionally, in this embodiment, the average value is used to represent a difference between the propagation influence of the candidate account and the propagation influence of the seed account, and the second group of seed accounts may be determined in different manners based on different actual requirements.

For example, the preset threshold or N may be determined according to the scale to which the seed account set needs to be expanded; when that scale is large, the preset threshold or N is configured to be larger, so that the seed account set is expanded according to different actual requirements.
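The two selection manners described above can be sketched as follows. The helper names are hypothetical. The threshold comparison follows the wording of this embodiment ("larger than a preset threshold"); the sort direction for the first N accounts is not stated here, so ascending order is an assumption of this sketch and is exposed as a parameter.

```python
import numpy as np

def average_distance(distances: np.ndarray) -> np.ndarray:
    """Average the group of cosine distances of each candidate account (one row each)."""
    return distances.mean(axis=1)

def select_by_threshold(avg: np.ndarray, threshold: float) -> list:
    """Indices of candidates whose average value is larger than the preset threshold."""
    return [int(i) for i in np.flatnonzero(avg > threshold)]

def select_top_n(avg: np.ndarray, n: int, ascending: bool = True) -> list:
    """Indices of the first N candidates after sorting by average value.

    Assumption: ascending order (smallest average distance first) by default.
    """
    order = np.argsort(avg, kind="stable")
    if not ascending:
        order = order[::-1]
    return [int(i) for i in order[:n]]

# Three candidate accounts, each with distances to two seed accounts.
distances = np.array([[0.1, 0.3], [0.8, 0.6], [0.4, 0.2]])
avg = average_distance(distances)

by_threshold = select_by_threshold(avg, threshold=0.5)
top_two = select_top_n(avg, n=2)
```

Raising `threshold` or `n` corresponds to configuring a larger expansion scale for the seed account set.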

In an optional embodiment, after determining the target account set as including the first group of seed accounts and the second group of seed accounts, the method further comprises: sending a target media resource to the accounts in the target account set.

Optionally, the target media resource may include, but is not limited to, H5 pages/links, static/dynamic advertisements, videos, articles, official accounts, mini programs, and the like. These are only examples, and this embodiment does not specifically limit the type of media resource sent to the accounts.

In an optional embodiment, before the obtaining the first set of characterization vectors for the first set of seed account numbers and the second set of characterization vectors for the set of candidate account numbers, the method further includes: obtaining first data associated with a set of nodes in a target network, wherein the first data is used for indicating the frequency and the path of activating other nodes in the set of nodes by each node in the set of nodes; and generating a target set based on the capability of each node in the first data for activating the other nodes, wherein the target set is a set formed by nodes corresponding to the first group of seed accounts in a target network, and the capability of each node in the first data for activating the other nodes is determined based on the frequency and the path.

Optionally, in this embodiment, the target network may include, but is not limited to, a network formed by the clients where the seed accounts are located, and the group of nodes may include, but is not limited to, multiple accounts of the target network or the clients corresponding to those accounts. The first data may include, but is not limited to, data from which the frequency and the path with which each node in the group of nodes activates other nodes in the group can be determined, for example, multi-degree sharing click data of the nodes; the capability of each node to activate other nodes is obtained from the sharing click frequency.
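As a non-limiting illustration, the first data can be pictured as a list of sharing records from which a per-edge activation probability is derived. The record layout `(sharer, receiver, clicked)` and the use of the ratio of clicks to shares as the activation probability are assumptions made only for this sketch; the embodiment does not specify how the probability is computed.

```python
from collections import defaultdict

def build_activation_graph(records):
    """Map each directed edge (sharer, receiver) to an estimated activation probability.

    Assumption: probability = number of clicked shares / number of shares on that edge.
    """
    shares = defaultdict(int)
    clicks = defaultdict(int)
    for sharer, receiver, clicked in records:
        shares[(sharer, receiver)] += 1
        if clicked:
            clicks[(sharer, receiver)] += 1
    return {edge: clicks[edge] / count for edge, count in shares.items()}

# Hypothetical sharing click records: A shares to B twice, B to C once, A to C once.
records = [
    ("A", "B", True),
    ("A", "B", False),
    ("B", "C", True),
    ("A", "C", False),
]
graph = build_activation_graph(records)
```

The keys of `graph` encode the paths (who activates whom), and the values summarize the frequency with which each path led to a click.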

Through this embodiment, the number of accounts in the first group of seed accounts can be effectively enlarged, thereby achieving the technical effect of higher propagation efficiency and faster propagation speed when propagating through the first group of seed accounts.

In an optional embodiment, generating the target set based on the capability of each node in the first data to activate the other nodes includes:

merging the n first sets into a second set;

Optionally, in this embodiment, Fig. 6 is a schematic diagram of another optional method for determining a target account set according to an embodiment of the present invention. As shown in Fig. 6, the process includes the following steps:

S1, as shown in Fig. 6, obtaining multi-degree sharing click data of users, where the multi-degree sharing click data may be counted within a preset time period starting from the original user's sharing click, and obtaining the frequency and the path of the sharing clicks from the multi-degree sharing click data to generate a first directed graph 602, where the first directed graph includes nodes 604 and activation probabilities 606;

S2, sampling the first directed graph, deciding whether to retain each edge according to a certain random distribution, and finally obtaining n second directed graphs 608, where the sampling may include, but is not limited to, sampling according to the activation probability 606: each time, a simulated random number (corresponding to a simulated random activation probability) is generated for each directed edge and compared with the activation probability; the path is retained if the simulated random number is greater than the activation probability, and the path and its activation probability are deleted from the first directed graph 602 if the simulated random number is not greater than the activation probability;

S3, randomly selecting a node in each second directed graph and calculating the RR set (corresponding to the first set) of that node; taking the second directed graph 610 and the node D as an example, the RR set is the set containing the nodes A, B, C, D and E;

S4, repeating steps S2 and S3 θ times, where θ may be estimated by a preset algorithm according to the number of nodes in the network and the number of first sets so as to obtain an approximately optimal solution, and then merging all RR sets into a second set R;

S5, for the second set R, finding the node v0 with the highest frequency of occurrence, taking out all RR sets containing v0, taking the nodes (corresponding to the third set) whose frequency of occurrence ranks in the top K in those RR sets (corresponding to the second group of nodes; K may be configured in advance according to service conditions) and placing them in a seed set S (corresponding to the target set), then deleting those RR sets from the set R, and continuing to find the node with the highest frequency of occurrence in the remaining RR sets in the same way; this process is repeated until the set R is empty or a stop condition is reached, yielding the seed set S.
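Steps S2 to S5 can be sketched as follows, purely as an illustration under stated assumptions. The sketch retains an edge when the drawn random number is below its activation probability, which is the usual independent-cascade convention and is an assumption here; the value 200 used for θ, the cap `max_seeds` used as the stop condition, and all function names are hypothetical.

```python
import random
from collections import Counter, deque

def sample_subgraph(graph, rng):
    """S2: keep each directed edge at random (assumed rule: random number < probability)."""
    return [edge for edge, p in graph.items() if rng.random() < p]

def reverse_reachable_set(edges, root):
    """S3: the RR set of root, i.e. every node that can reach root in the sampled graph."""
    incoming = {}
    for src, dst in edges:
        incoming.setdefault(dst, []).append(src)
    seen, queue = {root}, deque([root])
    while queue:
        node = queue.popleft()
        for src in incoming.get(node, []):
            if src not in seen:
                seen.add(src)
                queue.append(src)
    return frozenset(seen)

def select_seeds(rr_sets, top_k, max_seeds):
    """S5: repeatedly pick the most frequent node and the top-K nodes of its RR sets."""
    remaining = list(rr_sets)
    seeds = []
    while remaining and len(seeds) < max_seeds:
        freq = Counter(node for rr in remaining for node in rr)
        v0, _ = freq.most_common(1)[0]
        covered = [rr for rr in remaining if v0 in rr]
        local = Counter(node for rr in covered for node in rr)
        for node, _ in local.most_common(top_k):
            if node not in seeds:
                seeds.append(node)
        # Delete the RR sets containing v0 from R before the next round.
        remaining = [rr for rr in remaining if v0 not in rr]
    return seeds[:max_seeds]

# Toy first directed graph: edge -> activation probability (hypothetical values).
graph = {("A", "B"): 0.5, ("B", "C"): 1.0, ("A", "C"): 0.0, ("C", "D"): 0.8}
nodes = sorted({n for edge in graph for n in edge})

rng = random.Random(0)
rr_sets = []
for _ in range(200):  # S4: theta repetitions (200 is a placeholder, not an estimate)
    edges = sample_subgraph(graph, rng)
    rr_sets.append(reverse_reachable_set(edges, rng.choice(nodes)))

seed_set = select_seeds(rr_sets, top_k=1, max_seeds=2)
```

Nodes that appear in many RR sets can reach many randomly chosen nodes, which is why frequency of occurrence in R serves as a proxy for propagation influence.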

The invention will be further illustrated with reference to specific examples:

Fig. 7 is a schematic diagram of an alternative method for determining a target account set according to an embodiment of the present invention. As shown in Fig. 7, a DNN model is trained with the magnitude of propagation influence as the learning target (which may include, but is not limited to, defining high and low influence in advance and dividing positive and negative samples accordingly) to obtain a characterization vector of each user (corresponding to the aforementioned account).

The process comprises the following steps:

S702, inputting the original portrait features of a user, such as gender, age and region, and the behavior features of the user, such as the numbers of video plays and shares, and randomly initializing each feature as a 128-dimensional vector;

S704, for id-class features (corresponding to the second class of feature data), directly converting them into a new vector (corresponding to the first class of vector) through a full connection; for id_list-class features, applying attention internally to one part to obtain a new vector (corresponding to the second class of vector) and averaging another part to obtain a new vector, each of which is then converted into a new vector through a fully connected layer; and for vector-class features, directly converting them into a new vector (corresponding to the third class of vector) through a full connection;

S706, splicing all the 128-dimensional vectors obtained through the full connections of the previous layer into a vector group, as shown in the matrix part 702 in Fig. 7, and obtaining a 128-dimensional vector 704 after one layer of self-attention;

S708, performing two operations on the 128-dimensional vector: the first obtains a 128-dimensional vector directly through one MLP layer (corresponding to the multilayer perceptron), and the second obtains cross feature information through a DCN module (corresponding to the feature cross module); the results of the two parts are then spliced and passed through two MLP layers to obtain the final output, and the cross entropy between the final output and the target is calculated so that the overall model architecture can be iteratively optimized; the 128-dimensional vector 706 output by the second-to-last MLP layer is used as the characterization vector of each user;

S710, for each user in the non-seed set (corresponding to the group of candidate accounts), using KNN to find the 3000 users in the seed set (corresponding to the first group of seed accounts) with the smallest cosine distance to that user, taking the average of these distances to represent the distance between the user and the seed set, and expanding the seed set with users in ascending order of that distance until the required number of seed accounts is reached.
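Step S710 can be illustrated with the following sketch, which uses a brute-force nearest-neighbor search rather than any particular KNN library. It assumes that the cosine distance is one minus the cosine similarity; the function names and the toy vectors are hypothetical, and `k` would be 3000 in the example of S710.

```python
import numpy as np

def mean_knn_cosine_distance(candidates, seeds, k):
    """Average cosine distance from each candidate to its k nearest seed vectors."""
    eps = 1e-12  # avoids division by zero for all-zero vectors
    c = candidates / np.maximum(np.linalg.norm(candidates, axis=1, keepdims=True), eps)
    s = seeds / np.maximum(np.linalg.norm(seeds, axis=1, keepdims=True), eps)
    dist = 1.0 - c @ s.T
    k = min(k, dist.shape[1])
    # After partitioning, the first k columns of each row hold the k smallest distances.
    nearest = np.partition(dist, k - 1, axis=1)[:, :k]
    return nearest.mean(axis=1)

def expand_seed_set(candidates, seeds, k, target_size):
    """Indices of candidates to add, taken in ascending order of distance to the seed set."""
    scores = mean_knn_cosine_distance(candidates, seeds, k)
    return np.argsort(scores, kind="stable")[:target_size]

seed_vecs = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
candidate_vecs = np.array([[1.0, 0.1], [-1.0, -1.0], [0.2, 1.0]])

scores = mean_knn_cosine_distance(candidate_vecs, seed_vecs, k=2)
expanded = expand_seed_set(candidate_vecs, seed_vecs, k=2, target_size=2)
```

For account sets too large for a dense distance matrix, an approximate nearest-neighbor index would be a natural substitute; that choice is outside what the embodiment specifies.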

It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.

According to another aspect of the embodiments of the present invention, there is also provided a target account set determination apparatus for implementing the above target account set determination method. As shown in fig. 8, the apparatus includes:

an obtaining module 802, configured to obtain a first group of characterization vectors of a first group of seed accounts and a second group of characterization vectors of a group of candidate accounts, where the characterization vectors in the first group of characterization vectors correspond one-to-one to the seed accounts in the first group of seed accounts, the characterization vectors in the second group of characterization vectors correspond one-to-one to the candidate accounts in the group of candidate accounts, the characterization vectors in the first group of characterization vectors are determined according to a first group of portrait feature data and a first group of behavior feature data of the first group of seed accounts, and the characterization vectors in the second group of characterization vectors are determined according to a second group of portrait feature data and a second group of behavior feature data of the group of candidate accounts;

a first determining module 804, configured to determine a distance between the group of candidate accounts and the first group of seed accounts according to the first group of characterization vectors and the second group of characterization vectors;

a second determining module 806, configured to determine a second group of seed account numbers in the group of candidate account numbers according to a distance between the group of candidate account numbers and the first group of seed account numbers, and determine the target account number set as including the first group of seed account numbers and the second group of seed account numbers.

In an alternative embodiment, the obtaining module 802, as shown in fig. 9, includes: a first obtaining unit 902, configured to obtain a first set of feature data corresponding to a first set of seed account numbers, where the first set of feature data includes a first set of portrait feature data and a first set of behavior feature data of the first set of seed account numbers;

the first processing unit 904 is configured to input the first set of feature data into the target neural network model to obtain a first set of characterization vectors.

In an alternative embodiment, the first processing unit 904 is configured to input the first set of sample feature data into the target neural network model to obtain a first set of characterization vectors by: randomly initializing each feature data in a first group of portrait feature data and a first group of behavior feature data to generate a first group of vectors, wherein the first group of vectors comprises a first class vector, a second class vector and a third class vector, the first class vector is used for representing the first class feature data of a first group of seed accounts, the first class feature data is feature data represented by using one identifier, the second class vector is used for representing the second class feature data of the first group of seed accounts, the second class feature data is feature data represented by combining a plurality of identifiers, the third class vector is used for representing the third class feature data of the first group of seed accounts, and the third class feature data is pre-configured feature data; carrying out full-connection conversion on the first class vector and the third class vector to generate a second group of vectors; respectively performing first target processing and second target processing on the second class of vectors based on the feature data corresponding to the multiple identifications, and performing full-connection transformation to generate a third group of vectors, wherein the first target processing is used for adding weight coefficients to the feature data corresponding to the multiple identifications, and the second target processing is used for summing the feature data corresponding to the multiple identifications and calculating an average value; a first set of token vectors is determined based on the second set of vectors and the third set of vectors.
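The two kinds of processing applied to feature data represented by multiple identifiers can be sketched as follows. This is a minimal illustration, not the model of the embodiment: the softmax scoring with a vector `score_vector`, the random initial values, and all names are assumptions; only the 128-dimensional size and the sequence "weighting or averaging, then full-connection transformation" are taken from the text.

```python
import numpy as np

DIM = 128
rng = np.random.default_rng(0)

def softmax(scores):
    shifted = np.exp(scores - scores.max())
    return shifted / shifted.sum()

def attention_pool(embeddings, score_vector):
    """First target processing: add weight coefficients to the per-identifier vectors.

    Assumption: weights come from a softmax over a dot-product score.
    """
    weights = softmax(embeddings @ score_vector)
    return weights @ embeddings

def mean_pool(embeddings):
    """Second target processing: sum the per-identifier vectors and take the average."""
    return embeddings.sum(axis=0) / embeddings.shape[0]

def fully_connected(x, weight, bias):
    """Full-connection transformation to a new DIM-dimensional vector."""
    return weight @ x + bias

# One feature made of five identifiers, each randomly initialized as a 128-d vector.
id_list_embeddings = rng.normal(size=(5, DIM))
score_vector = rng.normal(size=DIM)
fc_weight = rng.normal(scale=0.01, size=(DIM, DIM))
fc_bias = np.zeros(DIM)

weighted_vector = fully_connected(
    attention_pool(id_list_embeddings, score_vector), fc_weight, fc_bias
)
averaged_vector = fully_connected(mean_pool(id_list_embeddings), fc_weight, fc_bias)
```

When all scores are equal, the weighted pooling reduces to the plain average, which is one way to see the second target processing as a special case of the first.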

In an alternative embodiment, the first processing unit 904 is configured to determine the first set of token vectors from the second set of vectors and the third set of vectors by: splicing the second group of vectors and the third group of vectors into a first target vector group; performing third target processing on the first target vector group to obtain a second target vector group, wherein the third target processing is used for processing the first target vector group based on an attention mechanism; inputting the second target vector group into a preset multilayer perceptron to obtain a third group of target vectors, wherein the target neural network model comprises the multilayer perceptron; inputting the second target vector group into a preset feature intersection model to obtain first intersection feature information, wherein the target neural network model comprises the feature intersection model, and the feature intersection model is used for acquiring intersection feature information; and splicing the third group of target vectors and the first cross feature information into a first group of characterization vectors.
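The final splicing step can be illustrated with the sketch below. It assumes that the feature intersection model uses cross layers of the form x_{l+1} = x_0 (x_l · w) + b + x_l, as in a Deep & Cross Network, and that the multilayer perceptron is a single ReLU layer; the attention-based processing is omitted, so the input `x` stands in for its output. Dimensions are reduced to 4 for readability, and all names and values are hypothetical.

```python
import numpy as np

def mlp_layer(x, weight, bias):
    """One fully connected layer with ReLU (the activation is an assumption)."""
    return np.maximum(weight @ x + bias, 0.0)

def cross_layer(x0, xl, w, b):
    """DCN-style cross layer: x_{l+1} = x0 * (xl . w) + b + xl."""
    return x0 * (xl @ w) + b + xl

def characterize(x, mlp_weight, mlp_bias, cross_params):
    """Splice the perceptron output and the cross feature information into one vector."""
    deep = mlp_layer(x, mlp_weight, mlp_bias)
    cross = x
    for w, b in cross_params:
        cross = cross_layer(x, cross, w, b)
    return np.concatenate([deep, cross])

x = np.array([1.0, 2.0, 3.0, 4.0])  # stands in for the attention output
mlp_weight = np.eye(4)
mlp_bias = np.zeros(4)
cross_params = [(np.array([1.0, 0.0, 0.0, 0.0]), np.zeros(4))]

vector = characterize(x, mlp_weight, mlp_bias, cross_params)
```

The perceptron branch captures implicit feature interactions while the cross branch adds explicit products of input features, so concatenating them keeps both kinds of information in the characterization vector.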

In an alternative embodiment, the obtaining module 802, as shown in fig. 10, includes: a second obtaining unit 1002, configured to obtain a second set of feature data corresponding to a set of candidate accounts, where the second set of feature data includes a second set of portrait feature data and a second set of behavior feature data of the set of candidate accounts; and the second processing unit 1004 is configured to input the second set of feature data into the target neural network model to obtain a second set of characterization vectors.

In an alternative embodiment, the second processing unit 1004 is configured to input the second set of sample feature data into the target neural network model to obtain a second set of characterization vectors by: randomly initializing each feature data in a second group of portrait feature data and a second group of behavior feature data to generate a fourth group of vectors, wherein the fourth group of vectors comprises a fourth class vector, a fifth class vector and a sixth class vector, the fourth class vector is used for representing the fourth class feature data of a group of candidate accounts, the fourth class feature data is feature data represented by using one identifier, the fifth class vector is used for representing the fifth class feature data of a group of candidate accounts, the fifth class feature data is feature data represented by combining a plurality of identifiers, the sixth class vector is used for representing the sixth class feature data of a group of candidate accounts, and the sixth class feature data is pre-configured feature data; carrying out full-connection conversion on the fourth-class vector and the sixth-class vector to generate a fifth group of vectors; respectively performing fourth target processing and fifth target processing on the fifth-class vector based on the feature data corresponding to the multiple identifiers, and performing full-connection transformation to generate a sixth group of vectors, wherein the fourth target processing is used for adding weight coefficients to the feature data corresponding to the multiple identifiers, and the fifth target processing is used for summing the feature data corresponding to the multiple identifiers and calculating an average value; a second set of token vectors is determined based on the fifth and sixth sets of vectors.

In an alternative embodiment, the second processing unit 1004 is configured to determine the second set of characterization vectors from the fifth set of vectors and the sixth set of vectors by: splicing the fifth group of vectors and the sixth group of vectors into a fourth target vector group; performing sixth target processing on the fourth target vector group to obtain a fifth target vector group, wherein the sixth target processing is used for processing the fourth target vector group based on an attention mechanism; inputting the fifth target vector group into a preset multilayer perceptron to obtain a sixth group of target vectors, wherein the target neural network model comprises the multilayer perceptron; inputting the fifth target vector group into a preset feature intersection model to obtain second intersection feature information, wherein the target neural network model comprises the feature intersection model, and the feature intersection model is used for acquiring the intersection feature information; and splicing the sixth group of target vectors and the second cross feature information into a second group of characterization vectors.

In an alternative embodiment, the first determining module 804 includes: a third processing unit, configured to obtain a cosine distance between each characterization vector in the second group of characterization vectors and each characterization vector in the first group of characterization vectors, so as to obtain a group of cosine distances corresponding to each candidate account, where the cosine distance between a second characterization vector in the second group of characterization vectors and a first characterization vector in the first group of characterization vectors is used to represent the distance between a second candidate account and a first seed account, the second candidate account is the candidate account in the group of candidate accounts corresponding to the second characterization vector, and the first seed account is the seed account in the first group of seed accounts corresponding to the first characterization vector.

In an alternative embodiment, the second determining module 806 includes: a third obtaining unit, configured to obtain the average value of the group of cosine distances corresponding to each candidate account; and a first determining unit, configured to determine the candidate accounts in the group of candidate accounts whose average value is larger than a preset threshold as the second group of seed accounts; or a second determining unit, configured to determine the first N candidate accounts in the group of candidate accounts sorted according to the average value as the second group of seed accounts, where N is a natural number.

In an optional embodiment, after determining the target account set as including the first group of seed accounts and the second group of seed accounts, the apparatus is further configured to send a target media resource to the accounts in the target account set.

In an optional embodiment, the apparatus is further configured to:

In an optional embodiment, the apparatus is further configured to generate a target set based on the ability of each node in the first data to activate the other nodes by:

merging the n first sets into a second set;

According to another aspect of the embodiment of the present invention, there is further provided an electronic device for implementing the method for determining a target account set, where the electronic device may be a terminal device or a server shown in fig. 1. The present embodiment takes the electronic device as a server as an example for explanation. As shown in fig. 11, the electronic device comprises a memory 1102 and a processor 1104, wherein the memory 1102 stores a computer program and the processor 1104 is arranged to execute the steps of any of the above method embodiments by means of the computer program.

Optionally, in this embodiment, the electronic device may be located in at least one network device of a plurality of network devices of a computer network.

Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:

s1, acquiring a first group of characterization vectors of a first group of seed accounts and a second group of characterization vectors of a group of candidate accounts, wherein the characterization vectors in the first group of characterization vectors correspond to the seed accounts in the first group of seed accounts one by one, the characterization vectors in the second group of characterization vectors correspond to the candidate accounts in the group of candidate accounts one by one, the characterization vectors in the first group of characterization vectors are determined vectors according to first group of portrait feature data and first group of behavior feature data of the first group of seed accounts, and the characterization vectors in the second group of characterization vectors are determined vectors according to second group of portrait feature data and second group of behavior feature data of the group of candidate accounts;

s2, determining the distance between a group of candidate accounts and the first group of seed accounts according to the first group of characterization vectors and the second group of characterization vectors;

s3, determining a second group of seed accounts among the set of candidate accounts according to the distance between the set of candidate accounts and the first group of seed accounts, and determining the target account set as including the first group of seed accounts and the second group of seed accounts.

Optionally, it can be understood by those skilled in the art that the structure shown in Fig. 11 is only illustrative, and the electronic device may also be a terminal device such as a smart phone (e.g., an Android phone, an iOS phone, etc.), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), a PAD, and the like. Fig. 11 is a diagram illustrating a structure of the electronic device. For example, the electronic device may also include more or fewer components (e.g., network interfaces, etc.) than shown in Fig. 11, or have a different configuration than shown in Fig. 11.

The memory 1102 may be configured to store software programs and modules, such as the program instructions/modules corresponding to the method and apparatus for determining a target account set in the embodiments of the present invention. The processor 1104 executes various functional applications and data processing by running the software programs and modules stored in the memory 1102, that is, implements the above method for determining a target account set. The memory 1102 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 1102 may further include memory located remotely from the processor 1104, and such remote memory may be connected to the terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The memory 1102 may be used to store information such as the first group of seed accounts and the group of candidate accounts. As an example, as shown in Fig. 11, the memory 1102 may include, but is not limited to, the obtaining module 802, the first determining module 804, and the second determining module 806 of the apparatus for determining a target account set. Other module units of the apparatus may also be included, but are not described in detail in this example.

Optionally, the transmission device 1106 is used for receiving or transmitting data via a network. Examples of the network may include wired networks and wireless networks. In one example, the transmission device 1106 includes a network adapter (Network Interface Controller, NIC) that can be connected to a router via a network cable so as to communicate with the internet or a local area network. In one example, the transmission device 1106 is a Radio Frequency (RF) module, which is used for communicating with the internet wirelessly.

In other embodiments, the terminal device or the server may be a node in a distributed system, where the distributed system may be a blockchain system, and the blockchain system may be a distributed system formed by a plurality of nodes connected through network communication. The nodes can form a peer-to-peer (P2P) network, and any type of computing device, such as a server, a terminal, or another electronic device, can become a node in the blockchain system by joining the peer-to-peer network.

According to a further aspect of an embodiment of the invention, there is provided a computer program product or a computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method provided in the various alternative implementations of the process of determining the target account set.

Alternatively, in the present embodiment, the above-mentioned computer-readable storage medium may be configured to store a computer program for executing the steps of:

Optionally, in this embodiment, a person skilled in the art may understand that all or part of the steps in the methods of the foregoing embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: a flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and the like.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

The integrated unit in the above embodiments, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in the above computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing one or more computer devices (which may be personal computers, servers, network devices, etc.) to execute all or part of the steps of the method according to the embodiments of the present invention.

In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements should also be regarded as falling within the protection scope of the present invention.

Claims

1. A method for determining a target account set, comprising:

acquiring a first group of characterization vectors of a first group of seed accounts and a second group of characterization vectors of a group of candidate accounts, wherein the characterization vectors in the first group of characterization vectors correspond to the seed accounts in the first group of seed accounts one by one, the characterization vectors in the second group of characterization vectors correspond to the candidate accounts in the group of candidate accounts one by one, the characterization vectors in the first group of characterization vectors are determined according to a first group of portrait feature data and a first group of behavior feature data of the first group of seed accounts, and the characterization vectors in the second group of characterization vectors are determined according to a second group of portrait feature data and a second group of behavior feature data of the group of candidate accounts;

determining a distance between the group of candidate accounts and the first group of seed accounts according to the first group of characterization vectors and the second group of characterization vectors;

determining a second group of seed accounts in the group of candidate accounts according to the distance between the group of candidate accounts and the first group of seed accounts, and determining a target account set as comprising the first group of seed accounts and the second group of seed accounts;

wherein the acquiring a first group of characterization vectors of a first group of seed accounts comprises:

acquiring a first group of feature data corresponding to the first group of seed accounts, wherein the first group of feature data comprises the first group of portrait feature data and the first group of behavior feature data of the first group of seed accounts;

inputting the first group of feature data into a target neural network model to obtain the first group of characterization vectors;

wherein the inputting the first group of feature data into a target neural network model to obtain the first group of characterization vectors comprises: randomly initializing each feature data in the first group of portrait feature data and the first group of behavior feature data to generate a first group of vectors, wherein the first group of vectors includes a first class vector, a second class vector and a third class vector, the first class vector is used for representing first class feature data of the first group of seed accounts, the first class feature data is feature data represented by one identifier, the second class vector is used for representing second class feature data of the first group of seed accounts, the second class feature data is feature data represented by a combination of a plurality of identifiers, the third class vector is used for representing third class feature data of the first group of seed accounts, and the third class feature data is preconfigured feature data;

performing a fully connected transformation on the first class vector and the third class vector to generate a second group of vectors;

performing first target processing and second target processing respectively on the second class vector based on the feature data corresponding to the plurality of identifiers, and performing a fully connected transformation to generate a third group of vectors, wherein the first target processing is used for adding weight coefficients to the feature data corresponding to the plurality of identifiers, and the second target processing is used for summing the feature data corresponding to the plurality of identifiers and calculating an average value;

determining the first set of characterization vectors from the second and third sets of vectors.
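The embedding steps recited in claim 1 can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the claimed model: the vector width `DIM`, the helper names (`init_vector`, `dense`, `weighted_pool`, `mean_pool`), the example weight coefficients and the shared weight matrix are all hypothetical, and a trained network would learn the vectors and weights rather than keep the random initial values.

```python
import random

DIM = 4  # assumed embedding width; the claim does not fix a size
rng = random.Random(0)

def init_vector(dim=DIM):
    """Randomly initialise one embedding vector."""
    return [rng.uniform(-0.1, 0.1) for _ in range(dim)]

def dense(vec, weights, bias):
    """Fully connected transformation: weights @ vec + bias."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def weighted_pool(vectors, coeffs):
    """First target processing: scale each identifier's vector by a weight, then sum."""
    return [sum(c * v[i] for c, v in zip(coeffs, vectors))
            for i in range(len(vectors[0]))]

def mean_pool(vectors):
    """Second target processing: sum the identifiers' vectors and take the average."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

# Hypothetical features of one seed account.
single_id = init_vector()                     # first class: one identifier
multi_id = [init_vector() for _ in range(3)]  # second class: several identifiers
preconfigured = init_vector()                 # third class: preconfigured feature
coeffs = [0.5, 0.3, 0.2]                      # assumed weight coefficients

# One shared weight matrix is used here only to keep the sketch short.
W = [[rng.uniform(-0.1, 0.1) for _ in range(DIM)] for _ in range(DIM)]
b = [0.0] * DIM

second_group = [dense(single_id, W, b), dense(preconfigured, W, b)]
third_group = [dense(weighted_pool(multi_id, coeffs), W, b),
               dense(mean_pool(multi_id), W, b)]
```

Here `second_group` and `third_group` play the role of the second and third groups of vectors from which the characterization vector is then determined.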

2. The method of claim 1, wherein determining the first set of characterization vectors from the second set of vectors and the third set of vectors comprises:

splicing the second group of vectors and the third group of vectors into a first target vector group;

performing third target processing on the first target vector group to obtain a second target vector group, wherein the third target processing is used for processing the first target vector group based on an attention mechanism;

inputting the second target vector group into a preset multilayer perceptron to obtain a third group of target vectors, wherein the target neural network model comprises the multilayer perceptron;

inputting the second target vector group into a preset feature crossing model to obtain first cross feature information, wherein the target neural network model comprises the feature crossing model, and the feature crossing model is used for obtaining cross feature information;

and splicing the third group of target vectors and the first cross feature information into the first group of characterization vectors.
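The pipeline of claim 2 (splice, attention, multilayer perceptron, feature crossing, splice) can be sketched as follows. The claim does not specify the form of the attention mechanism or of the feature crossing model, so this sketch assumes a softmax over dot-product scores against a hypothetical `query` vector and pairwise inner products between field vectors; all function names and sizes are illustrative only.

```python
import math
import random

rng = random.Random(0)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(fields, query):
    """Third target processing (assumed form): reweight each field vector
    by a softmax attention score."""
    weights = softmax([dot(f, query) for f in fields])
    return [[w * x for x in f] for w, f in zip(weights, fields)]

def mlp(vec, layers):
    """Multilayer perceptron with ReLU activations."""
    for weights, bias in layers:
        vec = [max(0.0, dot(row, vec) + b) for row, b in zip(weights, bias)]
    return vec

def cross(fields):
    """Feature crossing (assumed form): inner product of every pair of fields."""
    return [dot(fields[i], fields[j])
            for i in range(len(fields)) for j in range(i + 1, len(fields))]

def characterize(second_group, third_group, query, layers):
    fields = second_group + third_group         # first target vector group
    attended = attend(fields, query)            # second target vector group
    flat = [x for f in attended for x in f]
    return mlp(flat, layers) + cross(attended)  # splice MLP output and cross features

# Hypothetical inputs: three field vectors of width 2, one hidden layer of width 3.
second_group = [[0.1, 0.2], [0.3, 0.1]]
third_group = [[0.2, 0.4]]
query = [1.0, 1.0]
layers = [([[rng.uniform(-1, 1) for _ in range(6)] for _ in range(3)], [0.0] * 3)]
vector = characterize(second_group, third_group, query, layers)
```

With three fields, `vector` holds the three perceptron outputs followed by the three pairwise cross terms.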

3. The method of claim 1, wherein acquiring a second group of characterization vectors of a group of candidate accounts comprises:

acquiring a second group of feature data corresponding to the group of candidate accounts, wherein the second group of feature data comprises the second group of portrait feature data and the second group of behavior feature data of the group of candidate accounts;

and inputting the second group of feature data into a target neural network model to obtain the second group of characterization vectors.

4. The method of claim 3, wherein said inputting the second group of feature data into a target neural network model to obtain the second group of characterization vectors comprises:

randomly initializing each feature data in the second group of portrait feature data and the second group of behavior feature data to generate a fourth group of vectors, where the fourth group of vectors includes a fourth class vector, a fifth class vector and a sixth class vector, the fourth class vector is used to represent the fourth class feature data of the group of candidate accounts, the fourth class feature data is feature data represented by using one identifier, the fifth class vector is used to represent the fifth class feature data of the group of candidate accounts, the fifth class feature data is feature data represented by combining a plurality of identifiers, the sixth class vector is used to represent the sixth class feature data of the group of candidate accounts, and the sixth class feature data is preconfigured feature data;

performing a fully connected transformation on the fourth class vector and the sixth class vector to generate a fifth group of vectors;

performing fourth target processing and fifth target processing respectively on the fifth class vector based on the feature data corresponding to the plurality of identifiers, and performing a fully connected transformation to generate a sixth group of vectors, wherein the fourth target processing is used for adding weight coefficients to the feature data corresponding to the plurality of identifiers, and the fifth target processing is used for summing the feature data corresponding to the plurality of identifiers and calculating an average value;

determining the second set of characterization vectors from the fifth and sixth sets of vectors.

5. The method of claim 4, wherein said determining the second set of characterization vectors from the fifth set of vectors and the sixth set of vectors comprises:

splicing the fifth group of vectors and the sixth group of vectors into a fourth target vector group;

performing sixth target processing on the fourth target vector group to obtain a fifth target vector group, wherein the sixth target processing is used for processing the fourth target vector group based on an attention mechanism;

inputting the fifth target vector group into a preset multilayer perceptron to obtain a sixth group of target vectors, wherein the target neural network model comprises the multilayer perceptron;

inputting the fifth target vector group into a preset feature crossing model to obtain second cross feature information, wherein the target neural network model comprises the feature crossing model, and the feature crossing model is used for obtaining cross feature information;

and splicing the sixth group of target vectors and the second cross feature information into the second group of characterization vectors.

6. The method of any one of claims 1 to 5, wherein determining the distance between the group of candidate accounts and the first group of seed accounts according to the first group of characterization vectors and the second group of characterization vectors comprises:

obtaining a cosine distance between each characterization vector in the second group of characterization vectors and each characterization vector in the first group of characterization vectors to obtain a group of cosine distances corresponding to each candidate account, wherein the cosine distance between a second characterization vector in the second group of characterization vectors and a first characterization vector in the first group of characterization vectors is used for representing the distance between a second candidate account and a first seed account, the second candidate account is the candidate account corresponding to the second characterization vector in the group of candidate accounts, and the first seed account is the seed account corresponding to the first characterization vector in the first group of seed accounts.
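The pairwise comparison in claim 6 can be sketched in a few lines of Python. The sketch computes cosine similarity, on the assumption that the claim's "cosine distance" is a score in which a larger value means a closer match (claim 7 selects candidates whose average exceeds a threshold); if the intended quantity is instead 1 minus the similarity, the later comparison would flip. The account identifiers and vectors are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def pairwise_scores(candidate_vectors, seed_vectors):
    """Map each candidate account to its list of scores against every seed account."""
    return {cid: [cosine_similarity(vec, seed) for seed in seed_vectors.values()]
            for cid, vec in candidate_vectors.items()}

# Hypothetical characterization vectors keyed by account identifier.
seeds = {"seed_a": [1.0, 0.0], "seed_b": [0.0, 1.0]}
candidates = {"cand_x": [1.0, 0.0], "cand_y": [-1.0, 0.0]}
scores = pairwise_scores(candidates, seeds)
```

Each candidate ends up with one score per seed account, which is the "group of cosine distances" that the next step averages.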

7. The method of claim 6, wherein determining a second group of seed accounts in the group of candidate accounts according to the distance between the group of candidate accounts and the first group of seed accounts comprises:

acquiring an average value of a group of cosine distances corresponding to each candidate account;

determining candidate accounts in the set of candidate accounts for which the average value is greater than a predetermined threshold as the second set of seed accounts; or determining the first N candidate accounts in the group of candidate accounts after being sorted according to the average value as the second group of seed accounts, wherein N is a natural number.
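The two selection alternatives in claim 7 can be sketched as follows. The score table, the threshold value and the function names are hypothetical, and the sketch assumes that candidates are sorted in descending order of the average, which the claim does not state explicitly.

```python
from statistics import mean

def select_by_threshold(scores, threshold):
    """Keep candidates whose average score exceeds the threshold."""
    return [cid for cid, values in scores.items() if mean(values) > threshold]

def select_top_n(scores, n):
    """Keep the first n candidates after sorting by average score (descending, assumed)."""
    ranked = sorted(scores, key=lambda cid: mean(scores[cid]), reverse=True)
    return ranked[:n]

# Hypothetical per-candidate scores against two seed accounts.
scores = {"cand_x": [0.9, 0.7], "cand_y": [0.2, 0.4], "cand_z": [0.6, 0.6]}
by_threshold = select_by_threshold(scores, 0.5)
top_two = select_top_n(scores, 2)
```

Either result would serve as the second group of seed accounts, which is then combined with the first group to form the target account set.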

8. The method of any one of claims 1 to 4, wherein after determining the target account set as comprising the first group of seed accounts and the second group of seed accounts, the method further comprises:

and sending the target media resources to the accounts in the target account set.

9. The method of any one of claims 1 to 4, wherein prior to said acquiring a first group of characterization vectors of a first group of seed accounts and a second group of characterization vectors of a group of candidate accounts, the method further comprises:

10. The method of claim 9, wherein generating a target set based on the ability of each node in the first data to activate the other nodes comprises:

sampling the first directed graph n times to generate n second directed graphs, wherein the second directed graphs record a first group of nodes in the group of nodes and a group of activation paths of each node in the first group of nodes when other nodes in the first group of nodes are activated, and the first group of nodes are obtained by sampling the first directed graph according to activation probability;

calculating n first sets corresponding to first nodes based on the n second directed graphs, wherein the first sets comprise all nodes which can reach the first nodes through the activation paths in the first group of nodes and the first nodes, and the first group of nodes comprises the randomly selected first nodes;

merging the n first sets into a second set;
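The sampling steps listed so far in claim 10 resemble reverse-reachable-set sampling, and can be sketched as below. This is an illustration under assumptions: edges are given as hypothetical `(source, target, probability)` triples, each edge is kept independently with its activation probability, the first node is drawn uniformly from a supplied node list, and the "second set" is modelled as a list of the n first sets.

```python
import random

def sample_graph(edges, rng):
    """Keep each directed edge (u, v, p) with its activation probability p."""
    return [(u, v) for u, v, p in edges if rng.random() < p]

def reverse_reachable(sampled_edges, target):
    """All nodes that can reach `target` along kept edges, plus `target` itself."""
    incoming = {}
    for u, v in sampled_edges:
        incoming.setdefault(v, []).append(u)
    seen, stack = {target}, [target]
    while stack:
        node = stack.pop()
        for pred in incoming.get(node, []):
            if pred not in seen:
                seen.add(pred)
                stack.append(pred)
    return seen

def merged_sets(edges, nodes, n, seed=0):
    """Sample the graph n times and collect one reverse-reachable set per sample."""
    rng = random.Random(seed)
    second_set = []
    for _ in range(n):
        sampled = sample_graph(edges, rng)
        first_node = rng.choice(nodes)  # randomly selected first node
        second_set.append(reverse_reachable(sampled, first_node))
    return second_set

# Hypothetical graph: a activates b with certainty, b activates c half the time.
edges = [("a", "b", 1.0), ("b", "c", 0.5)]
result = merged_sets(edges, ["a", "b", "c"], n=5)
```

Nodes that appear in many of the collected sets are the ones most able to activate others, which is the property a greedy seed selection would then exploit.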

11. An apparatus for determining a target account set, comprising:

a second determining module, configured to determine a second group of seed accounts in the group of candidate accounts according to a distance between the group of candidate accounts and the first group of seed accounts, and determine a target account set as comprising the first group of seed accounts and the second group of seed accounts;

12. A computer-readable storage medium comprising a stored program, wherein the program when executed performs the method of any of claims 1 to 10.

13. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method of any of claims 1 to 10 by means of the computer program.