CN114385903B

CN114385903B - Application account identification method and device, electronic equipment and readable storage medium

Info

Publication number: CN114385903B
Application number: CN202011139997.0A
Authority: CN
Inventors: 康战辉
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2020-10-22
Filing date: 2020-10-22
Publication date: 2024-02-06
Anticipated expiration: 2040-10-22
Also published as: CN114385903A

Abstract

The application relates to the technical field of the Internet, and discloses an application account identification method and device, an electronic device, and a readable storage medium. The method comprises the following steps: acquiring at least one target hot search word of an application program, and acquiring a title of at least one piece of first content information published by at least one application account through the application program; determining, based on the target hot search word and the title of the first content information, a first probability that the application account belongs to a specific category; acquiring at least one piece of second content information published by the application account through the application program, and determining, based on the second content information, a second probability that the application account belongs to the specific category; and determining the category of the application account based on the first probability and the second probability. By means of artificial intelligence technology, the method can effectively improve the accuracy of identifying the category of an application account.

Description

Application account identification method and device, electronic equipment and readable storage medium

Technical Field

The application relates to the technical field of internet, in particular to an identification method and device of an application account, electronic equipment and a readable storage medium.

Background

As mobile internet content shifts from image-text content alone to image-text content combined with short video, more and more application accounts on various applications publish large amounts of short video content; for example, more and more official accounts (public numbers) publish videos on the WeChat platform.

Hot search words appear in various application programs, and more and more application accounts publish information targeting those hot search words, for example publishing videos or image-text information to attract traffic, that is, SEO (Search Engine Optimization). It is therefore necessary to identify the type of such application accounts.

At present, the type of an application account is usually identified by keyword matching on the information published by the account, and the accuracy of this approach is not high enough.

Disclosure of Invention

The present application aims to solve at least one of the above technical drawbacks, and specifically proposes the following technical solutions:

in a first aspect, an identification method of an application account is provided, including:

acquiring at least one target hot search word of an application program, and acquiring a title of at least one first content information issued by at least one application account through the application program;

determining a first probability that the application account belongs to a specific category based on the target hot search word and the title of the first content information;

Acquiring at least one piece of second content information issued by the application account through the application program, and determining a second probability that the application account belongs to the specific category based on the second content information;

and determining the category of the application account based on the first probability and the second probability.

In an alternative embodiment of the first aspect, the first content information comprises at least one of first image-text information and video; the second content information includes second image-text information.

In an optional embodiment of the first aspect, determining, based on the target hot-search term and the title of the first content information, a first probability that the application account belongs to a specific category includes:

determining semantic similarity between the title of the first content information and the target hot search word;

the first probability is determined based on the semantic similarity.

In an optional embodiment of the first aspect, determining a semantic similarity between a title of the first content information and the target hotsearch term comprises:

converting the title of the first content information into a title vector, and converting the target hot search word into a corresponding hot search word vector;

and determining semantic similarity between the title vector and the hot search word vector.

In an optional embodiment of the first aspect, converting the title of the first content information into a title vector comprises:

splitting the title into at least one word;

if the number of words obtained by splitting is greater than or equal to a preset number, converting the first preset number of words in the title, in order, into the title vector;

if the number of words obtained by splitting is smaller than the preset number, repeating the last word in the title until the number of words equals the preset number, and converting the title with the repeated words into the title vector.

In an optional embodiment of the first aspect, determining the first probability based on the semantic similarity comprises:

determining a first number of the at least one first content information, and determining a second number of the at least one target hot search word;

and normalizing the determined semantic similarity based on the larger of the first number and the second number to obtain the first probability.

In an optional embodiment of the first aspect, determining a second probability that the application account belongs to the specific category based on the second content information comprises:

Converting the second content information into text information in a preset format;

word segmentation is carried out on the text information to obtain at least one word, and a word vector corresponding to the at least one word is obtained;

converting the word vector into a vector to be classified, classifying the vector to be classified, and determining the type of the second content information;

and determining the second probability based on the type respectively corresponding to at least one piece of second content information issued by the application account.

In an alternative embodiment of the first aspect, converting the term vector into a vector to be classified includes:

acquiring an average value of numerical values of each adjacent preset dimension in the word vector;

and constructing the vector to be classified based on the obtained average value.
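One possible reading of this conversion can be sketched in Python. It assumes that "each adjacent preset dimension" means consecutive, non-overlapping groups of word vectors, and that the vector to be classified is the sequence of group averages. The function name `pool_adjacent` and the `window` parameter are hypothetical and do not come from the application.

```python
def pool_adjacent(word_vectors, window=2):
    """Average each group of `window` adjacent word vectors.

    Assumption: groups are consecutive and non-overlapping; a shorter
    final group is averaged over the vectors it actually contains.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    pooled = []
    for start in range(0, len(word_vectors), window):
        group = word_vectors[start:start + window]
        dim = len(group[0])
        # Element-wise mean over the vectors in this group.
        pooled.append([sum(vec[d] for vec in group) / len(group) for d in range(dim)])
    return pooled


vectors = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
to_classify = pool_adjacent(vectors, window=2)
```

Under this reading, each pooled vector blends the semantics of adjacent words, and the classifier receives fewer vectors than there are words.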

In an optional embodiment of the first aspect, determining the category of the application account based on the first probability and the second probability comprises:

fusing the first probability and the second probability based on a preset weight to obtain a fused value;

and determining the category of the application account corresponding to the fusion value.

In an optional embodiment of the first aspect, fusing the first probability and the second probability based on a preset weight to obtain a fused value includes:

Acquiring the registration time of the application account, and determining a third probability corresponding to the registration time;

and fusing the first probability, the second probability and the third probability based on a preset weight to obtain a fused value.

In an optional embodiment of the first aspect, the target hot-search word is a hot-search word of the application program within a preset time period, and the first content information is issued by the application account through the application program within the preset time period.

In a second aspect, an identification device for an application account is provided, including:

the acquisition module is used for acquiring at least one target hot search word of an application program and acquiring a title of at least one piece of first content information issued by at least one application account through the application program;

the first determining module is used for determining a first probability that the application account belongs to a specific category based on the target hot search word and the title of the first content information;

the second determining module is used for acquiring at least one piece of second content information issued by the application account through the application program and determining a second probability that the application account belongs to the specific category based on the second content information;

And the identification module is used for determining the category of the application account based on the first probability and the second probability.

In an alternative embodiment of the second aspect, the first content information comprises at least one of first image-text information and video; the second content information includes second image-text information.

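In an optional embodiment of the second aspect, the first determining module is specifically configured to, when determining, based on the target hot search word and the title of the first content information, a first probability that the application account belongs to a specific category:

determine semantic similarity between the title of the first content information and the target hot search word;

determine the first probability based on the semantic similarity.

In an optional embodiment of the second aspect, the first determining module is specifically configured to, when determining a semantic similarity between the title of the first content information and the target hot-search term:

convert the title of the first content information into a title vector, and convert the target hot search word into a corresponding hot search word vector;

determine semantic similarity between the title vector and the hot search word vector.

In an alternative embodiment of the second aspect, the first determining module is specifically configured to, when converting the title of the first content information into a title vector:

split the title into at least one word;

if the number of words obtained by splitting is greater than or equal to a preset number, convert the first preset number of words in the title, in order, into the title vector;

if the number of words obtained by splitting is smaller than the preset number, repeat the last word in the title until the number of words equals the preset number, and convert the title with the repeated words into the title vector.

In an optional embodiment of the second aspect, the first determining module is specifically configured to, when determining the first probability based on the semantic similarity:

determine a first number of the at least one first content information, and determine a second number of the at least one target hot search word;

normalize the determined semantic similarity based on the larger of the first number and the second number to obtain the first probability.

In an optional embodiment of the second aspect, the second determining module is specifically configured to, when determining, based on the second content information, a second probability that the application account belongs to the specific category:

convert the second content information into text information in a preset format;

perform word segmentation on the text information to obtain at least one word, and obtain a word vector corresponding to the at least one word;

convert the word vector into a vector to be classified, classify the vector to be classified, and determine the type of the second content information;

determine the second probability based on the type respectively corresponding to at least one piece of second content information issued by the application account.

In an alternative embodiment of the second aspect, the second determining module is specifically configured to, when converting the word vector into the vector to be classified:

acquire an average value of numerical values of each adjacent preset dimension in the word vector;

construct the vector to be classified based on the obtained average value.

In an alternative embodiment of the second aspect, the identification module is specifically configured to, when determining the category of the application account based on the first probability and the second probability:

fuse the first probability and the second probability based on preset weights to obtain a fused value;

determine the category of the application account corresponding to the fused value.

In an optional embodiment of the second aspect, the identification module is specifically configured to, when fusing the first probability and the second probability based on a preset weight to obtain a fused value:

acquire the registration time of the application account, and determine a third probability corresponding to the registration time;

fuse the first probability, the second probability and the third probability based on preset weights to obtain a fused value.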

In an optional embodiment of the second aspect, the target hot-search word is a hot-search word of the application program within a preset time period, and the first content information is issued by the application account through the application program within the preset time period.

In a third aspect, an electronic device is provided, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor implements the method for identifying an application account shown in the first aspect of the present application when executing the program.

In a fourth aspect, a computer readable storage medium is provided, where a computer program is stored on the computer readable storage medium, and the program is executed by a processor to implement the method for identifying an application account shown in the first aspect of the present application.

The beneficial effects that this application provided technical scheme brought are:

according to the title of the first content information published by the target hot search word and the application account of the application program, the first probability that the application account belongs to a specific category is determined, the second probability that the application account belongs to the specific category is determined according to the second content information published by the application account, and the category of the application account is identified by combining the first probability and the second probability, so that the relation between the title published by the application account and the target hot search word can be considered, the category corresponding to the second content information published by the application account can be considered, and the accuracy of the category identification of the application account can be improved.

Furthermore, converting the word vectors into vectors to be classified combines the semantics of at least two adjacent words; because the combined words are likely to be related, the resulting semantics are more complete, which can improve the accuracy of the classification result and reduce the amount of computation in the classification process.

Further, the third probability that the application account belongs to the specific category is determined through the registration time of the application account, and the category of the application account is identified according to the first probability, the second probability and the third probability, so that the accuracy of identification of the category of the application account can be further improved.

Additional aspects and advantages of the application will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application.

Drawings

The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:

Fig. 1 is an application environment diagram of an identification method of an application account provided in an embodiment of the present application;

Fig. 2 is a flowchart of an identification method of an application account according to an embodiment of the present application;

Fig. 3 is a flowchart of an identification method of an application account according to an embodiment of the present application;

Fig. 4 is a schematic diagram of a scheme for setting the number of words in a title of a video to a preset number in an example provided in an embodiment of the present application;

Fig. 5 is a schematic diagram of a scheme for classifying words in an example provided in an embodiment of the present application;

Fig. 6 is a schematic diagram of a scheme for classifying vectors to be classified in an example provided in an embodiment of the present application;

Fig. 7 is a schematic diagram of a scheme for identifying a category of an application account according to an embodiment of the present application;

Fig. 8 is a schematic diagram of a scheme for identifying a category of an application account according to an embodiment of the present application;

Fig. 9 is a flowchart of an identification method of an application account in an example provided in an embodiment of the present application;

Fig. 10 is a schematic structural diagram of an identification device for an application account according to an embodiment of the present application;

Fig. 11 is a schematic structural diagram of an electronic device for identifying an application account according to an embodiment of the present application.

Detailed Description

Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are exemplary only for the purpose of illustrating the present application and are not to be construed as limiting the present application.

As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.

Artificial intelligence (Artificial Intelligence, AI) is the theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend, and expand human intelligence, sense the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of sensing, reasoning, and decision-making.

Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.

Natural language processing (Natural Language Processing, NLP) is an important direction in the fields of computer science and artificial intelligence. It studies various theories and methods that enable effective communication between people and computers in natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics. Research in this field therefore involves natural language, that is, the language people use every day, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, robot question answering, knowledge graph techniques, and the like.

The category of the application account can be determined based on the target hot search word of the application program, the title of the first content information published by the application account and the second content information published by the application account through a natural language processing technology.

For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail below with reference to the accompanying drawings.

The application account identification method, device, electronic equipment and computer readable storage medium aim to solve the technical problems in the prior art.

The following describes the technical solutions of the present application and how the technical solutions of the present application solve the above technical problems in detail with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.

SEO uses the rules of search engines to improve a website's natural ranking in the relevant search engines, so that the website takes a leading position in its industry and gains brand benefits. To a large extent, it is a commercial practice by which website operators promote the ranking of themselves or their companies.

The identification method of the application account number can be applied to an application environment shown in fig. 1. Specifically, the application program generates a target hot search word according to search words of a plurality of users, acquires a title of first content information published by an application account, acquires second content information published by the application account, and identifies the type of the application account according to the second content information and the title of the first content information, for example, judges whether the application account is an SEO application account.

The identification method of the application account can be performed in the terminal or applied to the server.

As will be appreciated by those skilled in the art, a "terminal" as used herein may be a cell phone, tablet computer, PDA (Personal Digital Assistant), MID (Mobile Internet Device), etc.; the "server" may be implemented as a stand-alone server or as a server cluster composed of a plurality of servers.

In an embodiment of the present application, as shown in fig. 2, a method for identifying an application account is provided, where the method may be applied to a terminal or a server, and may include the following steps:

step S201, acquiring at least one target hot search word of an application program, and acquiring a title of at least one first content information published by at least one application account through the application program.

The hot search word is a word of a hot search list of the application program, and can be determined according to search words or search sentences input by a plurality of users when searching through the application program, and the hot search word is not necessarily in a word form and can be in a phrase or sentence form.
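As a minimal sketch of how hot search words might be derived from user queries, the Python snippet below ranks queries by frequency. The ranking rule, the light normalization, and the name `top_hot_search_terms` are illustrative assumptions; the application does not specify how the hot search list is computed.

```python
from collections import Counter


def top_hot_search_terms(queries, top_n=10):
    """Return the `top_n` most frequent queries (hypothetical ranking rule)."""
    # Strip whitespace and lowercase so trivially different spellings count together.
    counts = Counter(q.strip().lower() for q in queries if q.strip())
    return [term for term, _ in counts.most_common(top_n)]


queries = [
    "world cup final",
    "World Cup final ",
    "new phone launch",
    "world cup final",
]
hot_terms = top_hot_search_terms(queries, top_n=2)
```

Because whole queries are counted rather than single words, the resulting hot search "words" may be phrases or sentences, as described above.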

Specifically, the target hot search word may be a hot search word of the application program in a preset time period, and the first content information may be published by the application account through the application program in the preset time period.

The first content information may include at least one of first image-text information and video: it may be a video published by the application account within the preset time period, or first image-text information published by the application account within the preset time period. The first image-text information may include an article published by the application account; the article includes text information and may also include an image or a video. Specifically, the application account may be an account registered by a developer, a user, or a merchant on the platform of the application program; for example, if the application program is WeChat, the application account may be an official account (public number).

The preset time period may be a time period between when the current hot-search word is updated and when the next hot-search word is updated, for example, when the hot-search word is updated once a day, the preset time period is a time period of the day corresponding to the current hot-search word.

Step S202, determining a first probability that an application account belongs to a specific category based on the target hot search word and the title of the first content information.

The specific category may be a category that represents the purpose for which the application account publishes videos, and may include, for example, an SEO category and a non-SEO category.

Specifically, the first probability belonging to a specific category may be determined by determining the degree of semantic similarity between the target hot search word and the title of the first content information, and a detailed process of determining the first probability will be described below.

Step S203, at least one piece of second content information issued by the application account through the application program is obtained, and a second probability that the application account belongs to a specific category is determined based on the second content information.

The second content information may be second graphic information issued by the application account, where the second graphic information may also include an article issued by the application account, and the article includes text information, and may also include an image or video.

Specifically, the first graphic information and the second graphic information may be the same or different.

Specifically, text information in the second content information may be acquired, and the text information may be classified to obtain a second probability, and a process for specifically determining the second probability will be described in detail below.
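A minimal Python sketch of this step follows. It assumes the second probability is the fraction of the account's articles that a text classifier assigns to the specific category. Both this aggregation rule and the placeholder `toy_classifier` are assumptions for illustration, not the classifier of the application.

```python
def second_probability(articles, classify, target="SEO"):
    """Fraction of articles that `classify` labels as `target` (assumed rule)."""
    if not articles:
        return 0.0
    hits = sum(1 for text in articles if classify(text) == target)
    return hits / len(articles)


def toy_classifier(text):
    # Placeholder rule for illustration only; a trained text classifier would go here.
    return "SEO" if "click here" in text.lower() else "non-SEO"


articles = ["Click here for the full video", "Weekly recipe notes"]
p2 = second_probability(articles, toy_classifier)
```

Passing the classifier as a parameter keeps the aggregation separate from whichever classification model is actually used.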

Step S204, determining the category of the application account based on the first probability and the second probability.

Specifically, the first probability and the second probability can be fused to obtain a fusion result, and the category of the application account is identified according to the fusion result; the category of the application account can be identified by combining the registration time and the fusion result of the application account, and the specific process of identifying the category of the application account will be described in detail below.
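The fusion can be sketched in Python as a weighted sum followed by a threshold. The weighted-sum form, the example weights, the 0.5 threshold, and the names `fuse` and `categorize` are all assumptions; the application only states that the probabilities are fused based on preset weights.

```python
def fuse(probabilities, weights):
    """Weighted sum of per-signal probabilities (weights assumed to sum to 1)."""
    if len(probabilities) != len(weights):
        raise ValueError("one weight per probability is required")
    return sum(p * w for p, w in zip(probabilities, weights))


def categorize(fused_value, threshold=0.5):
    """Map the fused value to a category using a hypothetical threshold."""
    return "SEO" if fused_value >= threshold else "non-SEO"


# Two signals: title/hot-word similarity and article classification.
fused = fuse([0.8, 0.5], [0.5, 0.5])
category = categorize(fused)

# Three signals: a registration-time probability is added as a third term.
fused_with_registration = fuse([0.8, 0.5, 0.2], [0.4, 0.4, 0.2])
```

The same function covers both the two-probability case and the variant that also uses a probability derived from the registration time.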

According to the identification method of the application account, according to the target hot search word of the application program and the title of the first content information issued by the application account, the first probability that the application account belongs to a specific category is determined, according to the second content information issued by the application account, the second probability that the application account belongs to the specific category is determined, and the category of the application account is identified by combining the first probability and the second probability, so that the relation between the title of the first content information issued by the application account and the hot search word can be considered, the category corresponding to the second content information issued by the application account can be considered, and the accuracy of the category identification of the application account can be improved.

A specific procedure for determining the first probability will be described below in connection with specific embodiments.

In an embodiment of the present application, as shown in fig. 3, the determining, based on the target hot search word and the title of the first content information, the first probability that the application account belongs to the specific category in step S202 may include:

step S210, determining semantic similarity between the title of the first content information and the target hot search word.

Specifically, the title of the first content information and the target hot search word can be respectively converted into corresponding vectors, and the similarity between the vectors is calculated to obtain the semantic similarity between the title and the target hot search word.

In a specific implementation process, determining the semantic similarity between the title of the first content information and the target hot search word in step S210 may include:

(1) The title of the first content information is converted into a title vector, and the target hot search word is converted into a corresponding hot search word vector.

The process of converting a title into a title vector will be described in detail below.

Specifically, converting the title of the first content information into a title vector may include:

a. the title is split into at least one word.

Specifically, a word segmentation tool, such as the open-source jieba word segmentation tool, may be used to split a title into multiple words, that is, a word sequence.

b. If the number of words obtained by splitting is greater than or equal to the preset number, the first preset number of words in the title, in order, are converted into the title vector.

Specifically, the title and the target hot search word differ in length, so the vectors obtained from them would differ in dimension; the title is therefore converted into a vector of fixed, preset dimensions.

c. If the number of words obtained by splitting is smaller than the preset number, the last word in the title is repeated until the number of words equals the preset number, and the title with the repeated words is converted into the title vector.

For example, a single word may be mapped to a 100-dimensional vector, and the title may be fixed at 20 words in a "truncate if longer, pad if shorter" manner: if there are more than 20 words, the first 20 are kept; if there are fewer than 20, the last word is repeated to fill the remainder.

Taking a preset number of 5 as an example, as shown in fig. 4, if there are only three words, word 3 (the last word) is repeated until 5 words are obtained; if there are 6 words, the first 5 words are taken.
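The truncation and padding rule just described can be sketched in Python as follows. The function name `fix_length` is hypothetical, and the words are assumed to have already been produced by a word segmentation tool.

```python
def fix_length(words, preset_number):
    """Truncate to the first `preset_number` words, or pad by repeating the last word."""
    if not words:
        raise ValueError("the title must contain at least one word")
    if len(words) >= preset_number:
        return words[:preset_number]
    # Repeat the last word until the preset number of words is reached.
    return words + [words[-1]] * (preset_number - len(words))


short_title = fix_length(["word1", "word2", "word3"], 5)
long_title = fix_length(["w1", "w2", "w3", "w4", "w5", "w6"], 5)
```

With a preset number of 5, the three-word title becomes `word1, word2, word3, word3, word3`, and the six-word title keeps only its first five words.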

Specifically, taking a 100-dimensional vector, 20 words as an example, the title may be mapped to a 100×20 two-dimensional vector, and the formalized representation may be as follows:

$$v_{text} = f_{text}(x_{text}) \in \mathbb{R}^{100 \times 20} \quad (1)$$

where $v_{text}$ is the converted title vector.
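As a rough illustration of this mapping, the Python sketch below builds a matrix with one row per embedding dimension and one column per word. The embedding table `embeddings`, the zero vector used for unknown words, and the function name `title_matrix` are assumptions; small dimensions stand in for the 100 × 20 case.

```python
def title_matrix(words, embeddings, dim, preset_number):
    """Map a word sequence to a dim x preset_number matrix (list of rows)."""
    if not words:
        raise ValueError("the title must contain at least one word")
    # Fix the word count: keep the first words, or repeat the last word.
    if len(words) >= preset_number:
        fixed = words[:preset_number]
    else:
        fixed = words + [words[-1]] * (preset_number - len(words))
    zero = [0.0] * dim  # assumption: unknown words map to a zero vector
    columns = [embeddings.get(word, zero) for word in fixed]
    # Transpose so that rows are embedding dimensions and columns are words.
    return [[column[row] for column in columns] for row in range(dim)]


embeddings = {"hot": [1.0, 2.0], "topic": [3.0, 4.0]}  # toy 2-dimensional table
matrix = title_matrix(["hot", "topic"], embeddings, dim=2, preset_number=3)
```

With `dim=100` and `preset_number=20`, the same function would yield the 100 × 20 shape given in formula (1).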

The above describes the process of converting the title into a vector. The target hot search word is converted into a hot search word vector in the same way: the target hot search word is split into words and converted into a vector of the preset dimensions, following the same "truncate if longer, pad if shorter" approach.

Specifically, with this approach, when there are many pieces of first content information and the titles are long, the amount of computation can be effectively reduced; in addition, the title vector and the hot search word vector have the same dimensions, that is, the same number of elements, which improves the accuracy of the similarity calculation.

(2) Determining the semantic similarity between the title vector and the hot search word vector.

Specifically, a cosine similarity algorithm may be used to calculate the semantic similarity between each title vector and each hot search word vector, with the following formula:

$$\cos\left(V_{title(x)}, V_{query(y)}\right) = \frac{\sum_{i} V_{title(x)_i}\, V_{query(y)_i}}{\sqrt{\sum_{i} V_{title(x)_i}^{2}}\; \sqrt{\sum_{i} V_{query(y)_i}^{2}}}$$

where $V_{title(x)}$ represents a title vector; $V_{query(y)}$ represents a hot search word vector; $V_{title(x)_i}$ represents the i-th element of the title vector; $V_{query(y)_i}$ represents the i-th element of the hot search word vector; and i is a natural number.
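The cosine similarity calculation described here can be sketched as follows. The sketch assumes that each title vector and hot search word vector has already been flattened into a one-dimensional sequence of equal length; the function name and the handling of all-zero vectors are assumptions made for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors with the same number of elements."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of elements")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: an all-zero vector is treated as unrelated to anything.
        return 0.0
    return dot / (norm_a * norm_b)
```

Because the truncate-and-pad step gives both vectors the same number of elements, the length check should never fail in the flow described above.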

Step S220, determining a first probability based on the semantic similarity.

In a specific implementation process, the first probability may be determined according to the first number of the first content information, the second number of the target hot search words, and the determined semantic similarity.

Specifically, determining the first probability in step S220 based on the determined semantic similarity may include:

(1) Determining a first number of at least one first content information, and determining a second number of at least one target hot search word;

(2) Normalizing the determined semantic similarity based on the larger of the first number and the second number to obtain the first probability.

Specifically, the following formula may be adopted:

where $V_{title(x)}$ represents a title vector; $V_{query(y)}$ represents a hot search word vector; m represents the number of hot search words; k represents the number of videos published by the application account; and m and k are natural numbers.

The numerator of the formula means that the more matches there are between the target hot search words in the current period's hot word list and the titles of all first content information of the application account, the higher the relevance between the two is judged to be. The denominator normalizes by the larger of the number of hot words in the list and the number of new videos published by the application account in the current period, so that the overall value lies between 0 and 1.
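A minimal sketch of the normalization step follows. The way the similarities are aggregated in the numerator is an assumption made purely for illustration: each title contributes its best similarity against any hot search word, with negative values clamped to zero. Only the division by the larger of m and k follows the description; the function name is hypothetical.

```python
from typing import List


def first_probability(sim: List[List[float]]) -> float:
    """sim[x][y] is the similarity between title x and hot search word y."""
    k = len(sim)                   # first number: titles of first content information
    m = len(sim[0]) if k else 0    # second number: target hot search words
    if k == 0 or m == 0:
        return 0.0
    # Assumed aggregation: best-matching hot search word per title, clamped at 0.
    total = sum(max(0.0, max(row)) for row in sim)
    # Normalization described in the text: divide by the larger of m and k.
    return total / max(m, k)
```

Under this assumption the numerator is at most k, so the result always lies between 0 and 1.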

The above embodiments illustrate the process of determining the first probability, and a specific process of determining the second probability will be described below in connection with specific embodiments.

In an embodiment of the present application, a possible implementation manner is provided, and determining, based on the second content information, the second probability that the application account belongs to the specific category in step S203 may include:

(1) Converting the second content information into text information in a preset format;

(2) The text information is segmented to obtain at least one word, and a word vector corresponding to the at least one word is obtained.

Specifically, the following steps may be adopted:

a. Detecting the script of the text information and converting it into a preset script; for example, converting traditional Chinese characters into simplified Chinese characters;

b. Segmenting the text information into words; for example, performing Chinese word segmentation with ansj (a word segmentation tool);

c. Removing predetermined characters after segmentation to obtain text information in the preset format; for example, filtering out whitespace characters and punctuation marks.
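Step c above can be sketched as follows. The sketch assumes that script conversion and word segmentation (steps a and b) have already been performed by external tools, so it receives a list of tokens; the function names and sample tokens are hypothetical.

```python
import unicodedata
from typing import List


def is_filtered(token: str) -> bool:
    """True if the token is empty or holds only whitespace and punctuation."""
    return all(
        ch.isspace() or unicodedata.category(ch).startswith("P")
        for ch in token
    )


def clean_tokens(tokens: List[str]) -> List[str]:
    """Drop whitespace-only and punctuation-only tokens, keeping word order."""
    return [t for t in tokens if not is_filtered(t)]


cleaned = clean_tokens(["限时", "，", "优惠", " ", "！", "sale"])
```

Unicode general categories beginning with "P" cover both ASCII and full-width punctuation, which is why full-width commas and exclamation marks are removed as well.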

A word vector may be preset for each word, and the word vector corresponding to each word obtained by segmentation may then be looked up.

(3) And converting the word vector into a vector to be classified, classifying the vector to be classified, and determining the type of the second content information.

Specifically, converting the term vector into the vector to be classified may include:

a. acquiring an average value of numerical values of each adjacent preset dimension in the word vector;

b. and constructing a vector to be classified based on the obtained average value.

For example, let $(x_1, x_2, \ldots, x_{N-1}, x_N)$ represent the N-gram (N-dimensional) vector corresponding to the text information; the vector to be classified may then be formed from the average of every two adjacent elements. The preset dimension may also be another number, such as 3, in which case the average of every three adjacent elements is taken.

As shown in fig. 5, words 1 to N are respectively converted into $(x_1, x_2, \ldots, x_{N-1}, x_N)$, the average of every two adjacent elements is taken to obtain the vector to be classified, and the vector to be classified is input into the classification model for classification.

Specifically, converting the word vectors into the vector to be classified combines the semantics of at least two adjacent words. Adjacent words are likely to be related, so the combined semantics are more complete, which can improve the accuracy of the classification result and reduce the amount of computation during classification.
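The averaging of adjacent elements can be sketched as follows. The text does not state whether adjacent groups overlap, so the sketch exposes this as a `stride` parameter and assumes a sliding window with stride 1 by default; treating each $x_i$ as a word vector averaged element-wise is likewise an assumption, and all names are hypothetical.

```python
from typing import List

Vector = List[float]


def average_adjacent(vectors: List[Vector], window: int = 2,
                     stride: int = 1) -> List[Vector]:
    """Average every `window` adjacent word vectors, element by element."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    averaged = []
    for start in range(0, len(vectors) - window + 1, stride):
        group = vectors[start:start + window]
        # zip(*group) walks the group one dimension at a time.
        averaged.append([sum(column) / window for column in zip(*group)])
    return averaged
```

With `stride` equal to `window` the groups do not overlap and the sequence shrinks by a factor of `window`, which is one way the amount of computation in classification could be reduced.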

As shown in fig. 6, the vector to be classified may be input into the classification model, which outputs a corresponding classification result. The classification result may be the probability that the second content information belongs to the specific category, or the probabilities corresponding to each of a plurality of categories.

Specifically, if the probability is greater than or equal to a preset threshold, the second content information may be judged to belong to the specific category; if the probability is smaller than the preset threshold, it may be judged that the second content information does not belong to the specific category.

For example, suppose the application account is an official account, the second content information is an article, and the specific category is the advertisement category. If the probability that an article $X_i$ is identified as the advertisement category by the above classification model is greater than a threshold K (e.g., K = 0.8), that is, if FastText($X_i$ = advertisement) > K, the official account article $X_i$ is considered an advertisement article.

(4) And determining a second probability based on the type respectively corresponding to the at least one piece of second content information issued by the application account.

Specifically, the number of pieces of second content information of the specific category among all second content information published by the application account, and the total number of pieces of second content information, may be obtained, and the second probability may be determined from these two numbers.

Specifically, the second probability may be calculated using the following formula:

$$BrandProb = \frac{M}{T}$$

where BrandProb represents the second probability; M represents the number of pieces of second content information published by the application account that are identified as the specific category; and T represents the total number of pieces of second content information published by the application account.
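The thresholding and counting steps can be sketched together as follows, assuming the second probability is the ratio of M to T. The input is taken to be the per-article probability of the specific category already produced by a classification model; the function name and the default threshold of 0.8 (the example value of K) are illustrative assumptions.

```python
from typing import List


def brand_prob(category_probs: List[float], threshold: float = 0.8) -> float:
    """Share of published items judged to belong to the specific category."""
    total = len(category_probs)  # T: all second content information
    if total == 0:
        return 0.0               # assumption: no content means no evidence
    # M: items whose probability reaches the preset threshold.
    matched = sum(1 for p in category_probs if p >= threshold)
    return matched / total
```

For an account with four articles scored 0.9, 0.85, 0.3 and 0.1, two articles reach the threshold, so the second probability would be 0.5.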

The above-described embodiments illustrate a process of determining the second probability, and a process of identifying a category of the application account based on the first probability and the second probability will be described below in connection with specific embodiments.

In one embodiment, determining the category of the application account based on the first probability and the second probability in step S204 may include:

(1) Fusing the first probability and the second probability based on preset weights to obtain a fused value;

(2) And determining the category of the application account corresponding to the fusion value.

Specifically, a weighted sum manner may be adopted to determine the category of the application account, and a specific fusion calculation manner is as follows:

$$S = \alpha\, Rel(X, Y) + (1 - \alpha)\, BrandProb \quad (5)$$

where S is the fusion value; Rel(X, Y) is the first probability; BrandProb is the second probability; and α is the preset weight.
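Formula (5) is a convex combination of the two probabilities and can be sketched as follows. The function name and the default weight of 0.5 are hypothetical; the application only states that the weight is preset.

```python
def fuse_two(rel: float, brand_prob: float, alpha: float = 0.5) -> float:
    """Weighted sum of the first and second probabilities, as in formula (5)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * rel + (1.0 - alpha) * brand_prob
```

Because the weights sum to 1 and both inputs lie between 0 and 1, the fusion value also lies between 0 and 1, so it can be compared directly with a preset value.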

As shown in fig. 7, in this embodiment, a first probability is determined according to a title and a target hot-search word of first content information issued by an application account; determining a second probability according to second content information published by the application account; and determining a fusion value according to the first probability and the second probability, and determining the category of the application account.

In another embodiment, the fusing the first probability and the second probability based on the preset weight in step S204 to obtain the fused value may include:

(1) And acquiring the registration time of the application account and determining a third probability corresponding to the registration time.

In a specific implementation process, determining the third probability corresponding to the registration time may include:

a. Determining the generation time of a target hot search word, and determining the registration time of an application account;

b. if the registration time is after the generation time, determining a time difference between the registration time and the generation time;

c. a third probability is determined based on the time difference.

Specifically, for registration times after the generation time of the target hot search word, the third probability is inversely related to the time difference: the shorter the time difference between the registration time of the application account and the generation time of the target hot search word, the greater the third probability.
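One way to obtain a third probability that decreases as the time difference grows is sketched below. The exponential decay, the `half_life_days` parameter, and the value 0.0 for accounts registered before the hot search word appeared are all assumptions for illustration; the application only requires an inverse relation and mentions that the value may be looked up.

```python
from datetime import datetime, timedelta


def third_probability(registered: datetime, generated: datetime,
                      half_life_days: float = 7.0) -> float:
    """Decreases from 1.0 as registration falls further after generation."""
    if registered < generated:
        # Assumption: the text does not define this case, so return 0.0.
        return 0.0
    delta_days = (registered - generated).total_seconds() / 86400.0
    # Hypothetical decay: the value halves every `half_life_days` days.
    return 0.5 ** (delta_days / half_life_days)
```

A lookup table keyed by time-difference ranges would serve equally well, provided the values decrease as the difference increases.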

(2) And fusing the first probability, the second probability and the third probability based on preset weights to obtain a fused value.

Specifically, the registration time may be incorporated into the fusion: the third probability corresponding to the registration time is determined, for example by looking up the third probability corresponding to the time difference between the registration time and the generation time of the current hot search word, and a weighted sum is then used to determine the category of the application account. The fusion is calculated as follows:

$$S = \alpha\, Rel(X, Y) + \beta\, BrandProb + \gamma\, t \quad (6)$$

where S is the fusion value; Rel(X, Y) is the first probability; BrandProb is the second probability; t is the third probability; and α, β and γ are preset weights.

As shown in fig. 8, in this embodiment, a first probability is determined according to a title and a target hot-search word of first content information issued by an application account; determining a second probability according to second content information published by the application account; determining a third probability according to the registration time of the application account; and determining a fusion value according to the first probability, the second probability and the third probability, and determining the category of the application account.

Specifically, if the fusion value is greater than or equal to a preset value, the application account may be judged to belong to the specific category; if the fusion value is smaller than the preset value, the application account may be judged not to belong to the specific category.
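The three-way fusion of formula (6) and the final comparison can be sketched as follows. The default weights and the preset value of 0.5 are hypothetical placeholders; following step S909, a fusion value equal to the preset value is treated here as belonging to the specific category.

```python
def fuse_three(rel: float, brand_prob: float, t: float,
               alpha: float = 0.4, beta: float = 0.4,
               gamma: float = 0.2) -> float:
    """Weighted sum of the three probabilities, as in formula (6)."""
    return alpha * rel + beta * brand_prob + gamma * t


def is_specific_category(fusion_value: float, preset: float = 0.5) -> bool:
    """Compare the fusion value with the preset value."""
    return fusion_value >= preset


score = fuse_three(rel=0.9, brand_prob=0.8, t=0.5)
decision = is_specific_category(score)
```

Choosing weights that sum to 1 keeps the fusion value between 0 and 1, which makes the preset value easier to interpret; the application itself does not state this constraint.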

In the above embodiment, the third probability that the application account belongs to the specific category is determined by the registration time of the application account, and the category of the application account is identified according to the first probability, the second probability and the third probability, so that the accuracy of identifying the category of the application account can be further improved.

In order to better understand the above method for identifying an application account, as shown in fig. 9, an example of the method for identifying an application account of the present invention is described in detail below:

In one example, the identification method of the application account provided by the application comprises the following steps:

Step S900, obtaining at least one target hot search word of an application program in a preset time period;

Step S901, acquiring a title of at least one first content information issued by an application account to be identified through an application program;

Step S902, determining semantic similarity between a title of the first content information and a target hot search word;

Step S903, determining a first probability that the application account belongs to a specific category based on the determined semantic similarity;

Step S904, converting the second content information issued by the application account into text information in a preset format;

Step S905, segmenting the text information to obtain a plurality of words, and obtaining word vectors corresponding to the words;

Step S906, converting the word vector into a vector to be classified, classifying the vector to be classified, and determining the type of the second content information;

Step S907, determining a second probability based on the type respectively corresponding to at least one piece of second content information issued by the application account;

Step S908, fusing the first probability and the second probability based on preset weights to obtain a fused value;

Step S909, judging whether the fusion value is greater than or equal to a preset value; if yes, the application account belongs to the specific category; if not, the application account does not belong to the specific category.

According to the identification method of the application account, the first probability that the application account belongs to the specific category is determined according to the target hot search word of the application program and the title of the first content information published by the application account, the second probability that the application account belongs to the specific category is determined according to the second content information published by the application account, and the category of the application account is identified by combining the first probability and the second probability, so that the relation between the title published by the application account and the target hot search word can be considered, the category corresponding to the second content information published by the application account can be considered, and the accuracy of the category identification of the application account can be improved.

In an embodiment of the present application, as shown in fig. 10, a device 100 for identifying an application account is provided, where the device 100 for identifying an application account may include: an acquisition module 1001, a first determination module 1002, a second determination module 1003, and an identification module 1004, wherein,

an acquisition module 1001, configured to obtain at least one target hot search word of an application program, and obtain a title of at least one first content information published by at least one application account through the application program;

a first determining module 1002, configured to determine, based on the target hot search word and the title of the first content information, a first probability that the application account belongs to a specific category;

A second determining module 1003, configured to obtain at least one piece of second content information published by the application account through the application program, and determine, based on the second content information, a second probability that the application account belongs to a specific category;

the identification module 1004 is configured to determine a category of the application account based on the first probability and the second probability.

In an embodiment of the present application, a possible implementation manner is provided, where the first content information includes at least one of first image-text information and a video, and the second content information includes second image-text information.

In this embodiment of the present application, a possible implementation manner is provided, where the first determining module 1002 is specifically configured to, when determining, based on the target hot search word and the title of the first content information, a first probability that the application account belongs to a specific category:

determining semantic similarity between a title of the first content information and a target hot search word;

based on the semantic similarity, a first probability is determined.

In this embodiment of the present application, a possible implementation manner is provided, where the first determining module 1002 is specifically configured to, when determining a semantic similarity between a title of the first content information and the target hot search word:

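converting the title of the first content information into a title vector, and converting the target hot search word into a hot search word vector;

determining the semantic similarity between the title vector and the hot search word vector.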

In this embodiment, a possible implementation manner is provided, where the first determining module 1002 is specifically configured to, when converting a title of the first content information into a title vector:

splitting the title into at least one word;

if the number of words obtained by splitting is greater than or equal to the preset number, converting the first preset number of words, in the order in which they appear in the title, into the title vector;

if the number of words obtained by splitting is smaller than the preset number, repeating the last word of the title until the number of words equals the preset number, and converting the padded title into the title vector.

In this embodiment, a possible implementation manner is provided, where the first determining module 1002 is specifically configured to, when determining the first probability based on the semantic similarity:

determining a first number of at least one first content information, and determining a second number of at least one target hot search word;

normalizing the determined semantic similarity based on the larger of the first number and the second number to obtain the first probability.

In this embodiment of the present application, a possible implementation manner is provided, where the second determining module 1003 is specifically configured to, when determining, based on the second content information, a second probability that the application account belongs to a specific category:

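converting the second content information into text information in a preset format;

segmenting the text information to obtain at least one word, and obtaining a word vector corresponding to the at least one word;

converting the word vector into a vector to be classified, classifying the vector to be classified, and determining the type of the second content information;

determining the second probability based on the type respectively corresponding to the at least one piece of second content information issued by the application account.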

In this embodiment of the present application, a possible implementation manner is provided, where the second determining module 1003 is specifically configured to, when converting a word vector into a vector to be classified:

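acquiring the average value of the values of every preset number of adjacent dimensions in the word vector;

constructing the vector to be classified based on the obtained average value.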

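In this embodiment of the present application, a possible implementation manner is provided, where the identification module 1004 is specifically configured to, when determining the category of the application account based on the first probability and the second probability:

fusing the first probability and the second probability based on preset weights to obtain a fusion value;

determining the category of the application account corresponding to the fusion value.

In this embodiment of the present application, a possible implementation manner is provided, where the identification module 1004 is specifically configured to, when fusing the first probability and the second probability based on preset weights to obtain the fusion value:

acquiring the registration time of the application account, and determining a third probability corresponding to the registration time;

fusing the first probability, the second probability and the third probability based on the preset weights to obtain the fusion value.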

In the embodiment of the present application, a possible implementation manner is provided, where the target hot search word is a hot search word of an application program in a preset time period, and the first content information is issued by the application account through the application program in the preset time period.

According to the identification device of the application account, the first probability that the application account belongs to the specific category is determined according to the target hot search word of the application program and the title of the first content information published by the application account, the second probability that the application account belongs to the specific category is determined according to the second content information published by the application account, and the category of the application account is identified by combining the first probability and the second probability, so that the relation between the title published by the application account and the target hot search word can be considered, the category corresponding to the second content information published by the application account can be considered, and the accuracy of the category identification of the application account can be improved.

The identification device for an application account in the embodiments of the present disclosure may perform the identification method for an application account provided in the embodiments of the present disclosure, and the implementation principles are similar. The actions performed by each module of the identification device correspond to the steps of the identification method in the respective embodiments; for a detailed functional description of each module, refer to the description of the corresponding identification method above, which is not repeated here.

Based on the same principles as the methods shown in the embodiments of the present disclosure, there is also provided in the embodiments of the present disclosure an electronic device that may include, but is not limited to: a processor and a memory; a memory for storing computer operating instructions; and the processor is used for executing the identification method of the application account shown in the embodiment by calling the computer operation instruction. Compared with the prior art, the identification method of the application account can improve the accuracy of the category identification of the application account.

In an alternative embodiment, an electronic device is provided. As shown in fig. 11, the electronic device 4000 includes a processor 4001 and a memory 4003, where the processor 4001 is coupled to the memory 4003, for example via a bus 4002. Optionally, the electronic device 4000 may also include a transceiver 4004. It should be noted that, in practical applications, the transceiver 4004 is not limited to one, and the structure of the electronic device 4000 does not constitute a limitation on the embodiments of the present application.

The processor 4001 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or perform the various exemplary logic blocks, modules, and circuits described in connection with this disclosure. The processor 4001 may also be a combination that implements computing functionality, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.

Bus 4002 may include a path that transfers information between the aforementioned components. Bus 4002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 4002 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in fig. 11, but this does not mean that there is only one bus or one type of bus.

Memory 4003 may be, but is not limited to, a ROM (Read Only Memory) or other type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or other type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.

The memory 4003 is used to store the application program code for executing the solution of the present application, and execution of that code is controlled by the processor 4001. The processor 4001 is configured to execute the application program code stored in the memory 4003 to implement what is shown in the foregoing method embodiments.

Among them, electronic devices include, but are not limited to: mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), and the like, and stationary terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 11 is merely an example, and should not impose any limitations on the functionality and scope of use of embodiments of the present disclosure.

The present application provides a computer readable storage medium having a computer program stored thereon, which when run on a computer, causes the computer to perform the corresponding method embodiments described above. Compared with the prior art, the identification method of the application account can improve the accuracy of the category identification of the application account.

It should be understood that, although the steps in the flowcharts of the figures are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited in order and may be performed in other orders, unless explicitly stated herein. Moreover, at least some of the steps in the flowcharts of the figures may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, the order of their execution not necessarily being sequential, but may be performed in turn or alternately with other steps or at least a portion of the other steps or stages.

It should be noted that the computer readable medium described in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.

The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device.

The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the methods shown in the above-described embodiments.

Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, including object-oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The modules described in the embodiments of the present disclosure may be implemented in software or hardware. The name of the module does not in any way constitute a limitation of the module itself, for example, the identification module may also be described as "module that identifies the category of the application account".

The foregoing description is only of the preferred embodiments of the present disclosure and of the principles of the technology employed. It will be appreciated by persons skilled in the art that the scope of the disclosure is not limited to the specific combinations of features described above, but also covers other embodiments formed by any combination of the features described above or their equivalents without departing from the spirit of the disclosure, for example, embodiments formed by replacing the features described above with (but not limited to) technical features having similar functions disclosed in the present disclosure.

Claims

1. An application account identification method, characterized by comprising the following steps:

acquiring at least one target hot search word of an application program, and acquiring a title of at least one first content information issued by at least one application account through the application program; the target hot search word is a hot search word of the application program in a preset time period, and the first content information is issued by the application account through the application program in the preset time period;

2. The method for identifying an application account according to claim 1, wherein the first content information includes at least one of first image-text information and video; the second content information includes second image-text information.

3. The method for identifying an application account according to claim 1, wherein the determining a first probability that the application account belongs to a specific category based on the target hot search word and the title of the first content information includes:

the first probability is determined based on the semantic similarity.

4. The method for identifying an application account according to claim 3, wherein the determining the semantic similarity between the title of the first content information and the target hot search word includes:

5. The method for identifying an application account according to claim 4, wherein the converting the title of the first content information into a title vector includes:

splitting the title into at least one word;

6. The method for identifying an application account according to claim 3, wherein the determining the first probability based on the semantic similarity includes:

7. The method for identifying an application account according to claim 1, wherein the determining, based on the second content information, a second probability that the application account belongs to the specific category includes:

8. The method for identifying an application account according to claim 7, wherein the converting the term vector into a vector to be classified includes:

9. The method for identifying an application account according to any one of claims 1-8, wherein the determining a category of the application account based on the first probability and the second probability includes:

10. The method for identifying an application account according to claim 9, wherein the fusing the first probability and the second probability based on a preset weight to obtain a fused value includes:

11. An application account identification device, comprising:

an acquisition module, configured to acquire at least one target hot search word of an application program and to acquire a title of at least one first content information issued by at least one application account through the application program; the target hot search word is a hot search word of the application program in a preset time period, and the first content information is issued by the application account through the application program in the preset time period;

12. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of identifying an application account as claimed in any one of claims 1 to 10 when executing the program.

13. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the method for identifying an application account according to any of claims 1-10.
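
The following is an illustrative sketch only and forms no part of the claims. It shows one possible way the steps named in claims 3 to 5 and 9 to 10 could fit together: splitting a title into words, converting it into a vector, measuring its similarity to a target hot search word, and fusing the first probability and the second probability with a preset weight. Everything specific here is an assumption made for illustration: the function names, the whitespace word splitting, the bag-of-words vectors, the cosine similarity, the averaging used to obtain the first probability, the weight of 0.5, and the threshold of 0.5 are all hypothetical, and the second probability is simply passed in as a number rather than produced by a classifier.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Split text on whitespace and count words (assumed stand-in for a title vector)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def first_probability(titles, hot_words):
    """Assumed scoring rule: mean over titles of the best similarity to any hot search word."""
    if not titles or not hot_words:
        return 0.0
    hot_vectors = [bag_of_words(h) for h in hot_words]
    best = [
        max(cosine_similarity(bag_of_words(t), h) for h in hot_vectors)
        for t in titles
    ]
    return sum(best) / len(best)


def fuse(p1, p2, weight=0.5):
    """Weighted fusion of the two probabilities; the weight value is hypothetical."""
    return weight * p1 + (1 - weight) * p2


def belongs_to_category(p1, p2, weight=0.5, threshold=0.5):
    """Assumed decision rule: compare the fused value against a preset threshold."""
    return fuse(p1, p2, weight) >= threshold
```

For titles in languages written without spaces, the whitespace split would have to be replaced by a proper word segmenter, and a learned embedding would normally replace the word counts; the sketch keeps both deliberately simple so that the data flow from titles and hot search words to a fused decision stays visible.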