CN111198992A - Identification method and identification device for mother and infant crowd, electronic equipment and storage medium - Google Patents

Identification method and identification device for mother and infant crowd, electronic equipment and storage medium

Info

Publication number
CN111198992A
Authority
CN
China
Prior art keywords
users
user
network
mother
infant
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010012718.8A
Other languages
Chinese (zh)
Inventor
陈晓薇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing second hand Artificial Intelligence Technology Co.,Ltd.
Original Assignee
Admaster Technology Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Admaster Technology Beijing Co ltd
Priority to CN202010012718.8A
Publication of CN111198992A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/953: Querying, e.g. by the use of web search engines
    • G06F16/9536: Search customisation based on social or collaborative filtering

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides a method and a device for identifying a mother-and-infant population, an electronic device, and a storage medium. The identification method acquires, from a social network, a plurality of network posts related to the keyword features of the mother-and-infant population at a stage to be identified, and determines the network user corresponding to each post; determines a plurality of preliminary users among the network users based on the follower count and post count of each network user; determines, among the preliminary users, at least one candidate user who is not an abnormal user, where abnormal users include users matching the characteristics of the water army (paid posters) and/or users on a blacklist; and, if the number of times the keyword features are mentioned in a candidate user's historical posts is greater than a preset number threshold, determines that the candidate user is a mother-and-infant user at the stage to be identified. The method reduces the likelihood that non-mother-and-infant users such as water army accounts, marketers, and key opinion leaders (KOLs) are included in the identified mother-and-infant population.

Description

Identification method and identification device for mother and infant crowd, electronic equipment and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular to a method and a device for identifying a mother-and-infant population, an electronic device, and a storage medium.
Background
Because of the social attributes of the mother-and-infant population, more and more brand owners screen this population from the users who post on internet social platforms and study its interests, living habits, discussion topics, and the like, so as to understand its lifestyle and state of mind from the results and address its actual needs.
At present, posts mentioning a given keyword are extracted from the massive social content published by users of internet social platforms, the mother-and-infant population at each stage is determined from those posts, and manual reading, classification, content analysis, and the like are then performed. However, because this approach judges membership from a single keyword alone, the resulting population is easily contaminated by non-mother-and-infant users such as the water army (paid posters), marketers, and industry key opinion leaders (KOLs), for example microblog bloggers and brand spokespersons. As a result, research on the interests, living habits, and discussion topics of the mother-and-infant population is inaccurate, which hinders brand owners from understanding the lifestyle and state of mind of the population at each stage and from addressing its actual needs.
Disclosure of Invention
In view of this, an object of the present application is to provide a method and a device for identifying a mother-and-infant population, an electronic device, and a storage medium, which can reduce the likelihood that non-mother-and-infant users such as water army accounts, marketers, and KOLs are included in the identified population, and which help improve the accuracy of research on the interests, living habits, discussion topics, and the like of the mother-and-infant population.
In a first aspect, the present application provides a method for identifying a mother-and-infant population, the method comprising:
acquiring, from a social network, a plurality of network posts related to the keyword features of the mother-and-infant population at a stage to be identified, and determining the network user corresponding to each network post;
determining a plurality of preliminary users among the plurality of network users based on the follower count and post count of each network user;
determining, among the plurality of preliminary users, at least one candidate user who is not an abnormal user, wherein the abnormal users comprise users matching the characteristics of the water army and/or users on a blacklist;
and, if the number of times the keyword features are mentioned in the historical posts of the candidate user is greater than a preset number threshold, determining that the candidate user is a mother-and-infant user at the stage to be identified.
Preferably, the keyword features of the mother-and-infant population at the stage to be identified are determined by the following steps:
determining the target time period of the mother-and-infant population at the stage to be identified, wherein the target time period is any one of six predefined mother-and-infant stages;
and extracting the time features of the target time period, and taking the time features as the keyword features of the mother-and-infant population at the stage to be identified.
Preferably, the plurality of network posts related to the keyword features are acquired from the social network by the following steps:
detecting whether social content in the social network contains the keyword feature or a synonym of the keyword feature;
and determining the social content containing the keyword feature and/or a synonym of the keyword feature as the plurality of network posts related to the keyword features.
Preferably, the determining a plurality of preliminary users among the plurality of network users based on the follower count and post count of each network user comprises:
acquiring the follower count and post count of each network user, together with a preset follower count threshold and a preset post count threshold for the mother-and-infant population at the stage to be identified;
determining users whose follower count is greater than the preset follower count threshold and/or whose post count is greater than the preset post count threshold as users to be deleted;
and determining the users remaining after the users to be deleted are removed from the plurality of network users as the preliminary users.
Preferably, the abnormal users matching the water army characteristics are determined by the following steps:
acquiring each item of index information of each preliminary user;
and determining a score for each preliminary user based on that user's index information and a scoring rule, and determining the abnormal users matching the water army characteristics among the plurality of preliminary users based on the obtained scores and a water army decision rule.
Preferably, the determining abnormal users matching the water army characteristics based on the scores and the water army decision rule comprises:
acquiring the water army score threshold in the water army decision rule and the score of each preliminary user;
and determining, among the plurality of preliminary users, the users whose score is less than the water army score threshold as abnormal users matching the water army characteristics.
Preferably, the items of index information include: account nickname, personal introduction, follower count, number of followed accounts, registration time, number of microblog posts, and reposted content.
In a second aspect, the present application provides a device for identifying a mother-and-infant population, the identification device comprising:
a network user determination module, configured to acquire, from a social network, a plurality of network posts related to the keyword features of the mother-and-infant population at a stage to be identified, and to determine the network user corresponding to each network post;
a preliminary user determination module, configured to determine a plurality of preliminary users among the plurality of network users based on the follower count and post count of each network user;
a candidate user determination module, configured to determine, among the plurality of preliminary users, at least one candidate user who is not an abnormal user, wherein the abnormal users comprise users matching the characteristics of the water army and/or users on a blacklist;
and a stage-to-be-identified mother-and-infant user determination module, configured to determine that the candidate user is a mother-and-infant user at the stage to be identified if the number of times the keyword features are mentioned in the historical posts of the candidate user is greater than a preset number threshold.
Preferably, the identification device further comprises a keyword feature determination module, which is configured to determine the keyword features of the mother-and-infant population at the stage to be identified by:
determining the target time period of the mother-and-infant population at the stage to be identified, wherein the target time period is any one of six predefined mother-and-infant stages;
and extracting the time features of the target time period, and taking the time features as the keyword features of the mother-and-infant population at the stage to be identified.
Preferably, the network user determination module is configured to acquire the plurality of network posts related to the keyword features from the social network by:
detecting whether social content in the social network contains the keyword feature or a synonym of the keyword feature;
and determining the social content containing the keyword feature and/or a synonym of the keyword feature as the plurality of network posts related to the keyword features.
Preferably, when the preliminary user determination module is configured to determine a plurality of preliminary users among the plurality of network users based on the follower count and post count of each network user, the preliminary user determination module is specifically configured to perform:
acquiring the follower count and post count of each network user, together with a preset follower count threshold and a preset post count threshold for the mother-and-infant population at the stage to be identified;
determining users whose follower count is greater than the preset follower count threshold and/or whose post count is greater than the preset post count threshold as users to be deleted;
and determining the users remaining after the users to be deleted are removed from the plurality of network users as the preliminary users.
Preferably, the candidate user determination module is configured to determine the abnormal users matching the water army characteristics by:
acquiring each item of index information of each preliminary user;
and determining a score for each preliminary user based on that user's index information and a scoring rule, and determining the abnormal users matching the water army characteristics among the plurality of preliminary users based on the obtained scores and a water army decision rule.
Preferably, when the candidate user determination module is configured to determine, based on the obtained scores and the water army decision rule, the abnormal users matching the water army characteristics among the plurality of preliminary users, the candidate user determination module is specifically configured to perform:
acquiring the water army score threshold in the water army decision rule and the score of each preliminary user;
and determining, among the plurality of preliminary users, the users whose score is less than the water army score threshold as abnormal users matching the water army characteristics.
Preferably, the items of index information include: account nickname, personal introduction, follower count, number of followed accounts, registration time, number of microblog posts, and reposted content.
In a third aspect, an embodiment of the present application provides an electronic device, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the electronic device is running, the machine-readable instructions when executed by the processor performing the steps of the method for identifying a mother-infant group as described above.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the steps of the identification method for mother-infant group as described above.
The embodiments of the application provide a method and a device for identifying a mother-and-infant population, an electronic device, and a storage medium. First, a plurality of network posts related to the keyword features of the mother-and-infant population at a stage to be identified are acquired from a social network, and the network user corresponding to each post is determined. Next, a plurality of preliminary users are determined among the network users based on the follower count and post count of each network user. Then, at least one candidate user who is not an abnormal user is determined among the preliminary users, where abnormal users include users matching the characteristics of the water army and/or users on a blacklist. Finally, if the number of times the keyword features are mentioned in a candidate user's historical posts is greater than a preset number threshold, the candidate user is determined to be a mother-and-infant user at the stage to be identified. The method reduces the likelihood that non-mother-and-infant users such as water army accounts, marketers, and KOLs are included in the identified population and helps improve the accuracy of research on the interests, living habits, discussion topics, and the like of the mother-and-infant population, so that the influence of non-mother-and-infant users on the analysis results is reduced when those interests, living habits, and discussion topics are studied.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings required by the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and should therefore not be regarded as limiting its scope; those skilled in the art can obtain other related drawings from these drawings without inventive effort.
Fig. 1 is a flowchart of a method for identifying a mother-and-infant population according to an embodiment of the present application;
Fig. 2 is a flowchart of another method for identifying a mother-and-infant population according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a device for identifying a mother-and-infant population according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Reference numerals: 300-a recognition device; 310-network user determination module; 320-primary user determination module; 330-candidate user determination module; 340-a mother and infant user determination module in a stage to be identified; 400-an electronic device; 410-a processor; 420-a memory; 430-bus.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. Every other embodiment that can be obtained by a person skilled in the art without making creative efforts based on the embodiments of the present application falls within the protection scope of the present application.
To enable those skilled in the art to use the present disclosure, the following embodiments are described in conjunction with a specific application scenario: identifying mother-and-infant users who post on a microblog platform. It will be apparent to those skilled in the art that the general principles defined herein may be applied to other embodiments and application scenarios, such as identifying mother-and-infant users who post on Xiaohongshu (Little Red Book), a forum, or a question-and-answer platform, without departing from the spirit and scope of the application. Although the present application is described primarily in terms of identifying mother-and-infant users who post on microblogs, it should be understood that this is only one exemplary embodiment.
In the prior art, posts mentioning a given keyword are extracted from the massive social content published by users of internet social platforms, the mother-and-infant population at each stage is determined from those posts, and manual reading, classification, content analysis, and the like are then performed. On this basis, the embodiments of the application provide a method for identifying a mother-and-infant population that reduces the likelihood that non-mother-and-infant users such as water army accounts, marketers, and KOLs are included in the identified population, and that helps improve the accuracy of research on the interests, living habits, discussion topics, and the like of the mother-and-infant population.
Referring to Fig. 1, which is a flowchart of a method for identifying a mother-and-infant population according to an embodiment of the present application, the identification method includes:
S110, acquiring, from the social network, a plurality of network posts related to the keyword features of the mother-and-infant population at the stage to be identified, and determining the network user corresponding to each network post.
In the embodiment of the application, the keyword features of the mother-and-infant population at the stage to be identified are used to find the matching network posts in the social network, and each matched post is then traced back to the network user who published it.
In the embodiment of the present application, as an optional embodiment, the plurality of network posts related to the keyword features are acquired from the social network by the following steps:
detecting whether social content in the social network contains the keyword feature or a synonym of the keyword feature.
Specifically, social content in the social network is analyzed, where the social content may come from a microblog platform, Xiaohongshu (Little Red Book), a forum, or a question-and-answer platform; it is then determined whether the social content contains the keyword feature or a synonym or near-synonym of the keyword feature.
And determining the social content containing the keyword feature and/or a synonym of the keyword feature as the plurality of network posts related to the keyword features.
The embodiment of the application thus provides a way of obtaining the network posts related to the keyword features: the acquired social content is read, the keyword features or related words such as synonyms and near-synonyms are searched for in it, and the posts containing them are taken as the network posts selected by this preliminary screening.
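The matching step described above can be sketched as follows. This is a minimal illustration and not the patent's implementation: the `Post` record, the function names, the sample posts, and the use of case-insensitive substring matching are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    """Hypothetical record for one piece of social content."""
    user_id: str
    text: str


def find_related_posts(posts, keyword, synonyms=()):
    """Keep posts whose text contains the keyword feature or any of its synonyms.

    Matching is a case-insensitive substring test, which is an assumption;
    the source does not say how containment is checked.
    """
    terms = [t.lower() for t in (keyword, *synonyms)]
    return [p for p in posts if any(t in p.text.lower() for t in terms)]


def users_of(posts):
    """Return the distinct authors of the matched posts, sorted for stable output."""
    return sorted({p.user_id for p in posts})


# Illustrative data only.
posts = [
    Post("u1", "Week 12 of pregnancy and finally less nausea"),
    Post("u2", "New phone review"),
    Post("u3", "Expecting our first baby in May"),
]
related = find_related_posts(posts, "pregnancy", synonyms=("expecting",))
network_users = users_of(related)
```

With this sample data, `u1` matches on the keyword itself and `u3` matches on the synonym, while `u2` is dropped.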
S120, determining a plurality of preliminary users among the plurality of network users based on the follower count and post count of each network user.
In the embodiment of the application, the follower count and post count of each network user over a period of time are checked, and non-mother-and-infant users can be further screened out of the network users on the basis of abnormal follower counts and post counts. Because KOLs and similar users, as industry leaders, have an established follower base, brands commonly use them to promote products. If such users are mixed into the mother-and-infant population, they are very likely to interfere with later analysis of the population's real needs, so they must first be filtered out of the network users.
In this embodiment, as an optional embodiment, the determining a plurality of preliminary users among the plurality of network users based on the follower count and post count of each network user includes:
acquiring the follower count and post count of each network user, together with the preset follower count threshold and the preset post count threshold for the mother-and-infant population at the stage to be identified.
In the embodiment of the application, the preset follower count threshold and the preset post count threshold are determined by analyzing a large amount of data, and they are the key criteria for judging whether a network user is a KOL or a microblog blogger.
And determining users whose follower count is greater than the preset follower count threshold and/or whose post count is greater than the preset post count threshold as users to be deleted.
Specifically, the embodiment of the present application covers three cases: users whose follower count is greater than the preset follower count threshold; users whose post count is greater than the preset post count threshold; and users for whom both the follower count and the post count exceed their respective thresholds. Users in all three cases are users to be deleted.
And determining the users remaining after the users to be deleted are removed from the plurality of network users as the preliminary users.
In the embodiment of the application, the users to be deleted are determined by comparing each network user's follower count with the preset follower count threshold and post count with the preset post count threshold, and they are then removed from the network users to obtain the preliminary users. KOLs and microblog bloggers with large follower counts are thereby filtered out of the preliminary users.
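As a rough sketch of this threshold filter: a user is marked for deletion when either count is strictly greater than its threshold, which covers all three cases listed above. The `NetworkUser` record, the function name, and the threshold values are hypothetical; the source says only that the thresholds are derived from data analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NetworkUser:
    """Hypothetical record holding the two counts used by step S120."""
    user_id: str
    follower_count: int
    post_count: int


def select_preliminary_users(users, max_followers, max_posts):
    """Remove users whose follower count or post count exceeds its threshold."""
    to_delete = {
        u.user_id
        for u in users
        if u.follower_count > max_followers or u.post_count > max_posts
    }
    return [u for u in users if u.user_id not in to_delete]


# Illustrative thresholds and users; none of these numbers come from the source.
MAX_FOLLOWERS = 10_000
MAX_POSTS = 5_000

users = [
    NetworkUser("mom", follower_count=320, post_count=850),
    NetworkUser("kol", follower_count=2_000_000, post_count=900),
    NetworkUser("spam", follower_count=150, post_count=40_000),
    NetworkUser("edge", follower_count=10_000, post_count=5_000),
]
preliminary = select_preliminary_users(users, MAX_FOLLOWERS, MAX_POSTS)
preliminary_ids = [u.user_id for u in preliminary]
```

Because the comparison is "greater than", a user sitting exactly on both thresholds (`edge`) is kept.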
S130, determining, among the plurality of preliminary users, at least one candidate user who is not an abnormal user, wherein the abnormal users comprise users matching the characteristics of the water army and/or users on a blacklist.
In the embodiment of the application, the preliminary users are screened further, and users matching the water army characteristics and/or users on the blacklist are removed. The network water army is a group of hired online writers who post specific information about specific content; they are typically active on e-commerce websites, forums, microblogs, and other social network platforms, and they influence ordinary users by posing as ordinary netizens or consumers and posting, replying, and spreading content. If water army accounts are mixed into the mother-and-infant population, the analysis of that population is affected. Separately, a search engine indexes a large number of pages, but pages with poor content may be placed on a blacklist and no longer be included or indexed. Accordingly, network users whose posted content belongs to the blacklist are excluded from the mother-and-infant population required by the embodiment, and any genuine mother-and-infant users on the blacklist are not considered.
In the embodiment of the present application, as an optional embodiment, the abnormal users matching the water army characteristics are determined by the following steps:
acquiring each item of index information of each preliminary user.
Specifically, the items of index information include: account nickname, personal introduction, follower count, number of followed accounts, registration time, number of microblog posts, and reposted content. It should be noted that the index information is not limited to these items; the embodiment of the present application uses them only as examples for identifying water army users.
And determining a score for each preliminary user based on that user's index information and a scoring rule, and determining the abnormal users matching the water army characteristics among the plurality of preliminary users based on the obtained scores and a water army decision rule.
In the embodiment of the application, each preliminary user is scored based on the scoring rule and that user's index information. For example, the maximum score is 0.4 for the account nickname, 0.6 for the personal introduction, 0.6 for the follower count, 0.4 for the number of followed accounts, 0.4 for the registration time, 0.8 for the number of microblog posts, and 0.8 for the reposted content, giving a full score of 4. Each preliminary user is scored against the scoring rule according to that user's actual information.
In this embodiment, as an optional embodiment, the determining abnormal users matching the water army characteristics based on the scores and the water army decision rule includes:
acquiring the water army score threshold in the water army decision rule and the score of each preliminary user;
and determining, among the plurality of preliminary users, the users whose score is less than the water army score threshold as abnormal users matching the water army characteristics.
In the embodiment of the application, the full score is 4 and the water army score threshold is 3; if a preliminary user's overall score is less than 3, that user is determined to be an abnormal user matching the water army characteristics.
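The scoring and threshold comparison can be sketched as below. The maximum scores per index item and the threshold of 3 are taken from the example in the text. How each item earns its points is not specified in this section, so the sketch assumes each item is first graded as a fraction between 0 and 1 by some external check and then weighted by its maximum score. The dictionary keys, function names, and sample profiles are hypothetical.

```python
# Maximum score per index item, as given in the example (sums to a full score of 4).
MAX_SCORES = {
    "nickname": 0.4,
    "personal_introduction": 0.6,
    "follower_count": 0.6,
    "followed_accounts": 0.4,
    "registration_time": 0.4,
    "post_count": 0.8,
    "reposted_content": 0.8,
}
WATER_ARMY_THRESHOLD = 3.0  # from the example in the text


def score_user(fractions):
    """Weight each item's grade (assumed to be in [0, 1]) by its maximum score.

    Items missing from `fractions` earn nothing; grades are clamped to [0, 1].
    """
    total = 0.0
    for name, max_score in MAX_SCORES.items():
        grade = min(max(fractions.get(name, 0.0), 0.0), 1.0)
        total += max_score * grade
    return total


def is_water_army(fractions, threshold=WATER_ARMY_THRESHOLD):
    """A user scoring strictly below the threshold is treated as water army."""
    return score_user(fractions) < threshold


# Illustrative profiles: a fully plausible account and a suspicious one.
genuine = {name: 1.0 for name in MAX_SCORES}
suspicious = dict(genuine, nickname=0.0, personal_introduction=0.0, reposted_content=0.25)

genuine_score = score_user(genuine)
suspicious_score = score_user(suspicious)
```

Here the genuine profile earns the full score of about 4, while the suspicious profile earns about 2.4 and falls below the threshold. Scores are floating-point sums, so comparisons in tests should allow a small tolerance.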
S140, if the number of times the keyword features are mentioned in the historical posts of the candidate user is greater than a preset number threshold, determining that the candidate user is a mother-and-infant user at the stage to be identified.
In the embodiment of the application, the preset number threshold is 2. The number of times the keyword features are mentioned in the candidate user's historical posts is counted, and if it is greater than 2, the candidate user is confirmed as a mother-and-infant user at the stage to be identified. The candidate user's historical posts therefore serve as a further check on whether a screened candidate really belongs to the stage to be identified.
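A small sketch of this final check follows. It assumes that "number of times" means the total number of occurrences of the keyword feature (or a synonym) across all historical posts, counted case-insensitively; the source does not say whether occurrences or posts are counted. The function names and sample histories are hypothetical.

```python
def count_mentions(history, terms):
    """Total occurrences of any term across a user's historical posts.

    Overlapping terms (one contained in another) would be counted twice,
    so the term list is assumed to be free of such overlaps.
    """
    lowered = [t.lower() for t in terms]
    return sum(text.lower().count(t) for text in history for t in lowered)


def is_stage_user(history, terms, threshold=2):
    """Confirm the candidate only if the mention count is strictly greater than the threshold."""
    return count_mentions(history, terms) > threshold


# Illustrative histories only.
occasional = ["Pregnancy week 12 update", "Pregnancy cravings again", "Lunch with friends"]
regular = occasional + ["Signed up for pregnancy yoga"]

occasional_count = count_mentions(occasional, ["pregnancy"])
regular_count = count_mentions(regular, ["pregnancy"])
```

With the threshold of 2 from the text, two mentions are not enough; the third mention is what confirms the user.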
Existing schemes cannot capture the posts that an expectant mother publishes during a given stage but that are unrelated to pregnancy, so expectant mothers at each stage cannot be studied from broader perspectives such as clothing, food, travel, entertainment, and daily life. In addition, because water army accounts and non-mothers are not filtered out well, data cleaning is not thorough enough, and content that mentions pregnancy only in passing is mixed in. The scheme provided by this application can screen out the mother-and-infant population accurately, which makes it possible to analyze the lifestyle and state of mind of mothers at different stages from multiple dimensions, helps brand owners better understand mothers and track their complete experience, and helps address consumer pain points.
The embodiment of the application provides a method for identifying a mother-and-infant population. First, according to the keyword features of the mother-and-infant population at the stage to be identified, a plurality of network posts related to the keyword features are obtained from a social network, and the network user corresponding to each network post is determined. Next, a plurality of primary users are determined among the plurality of network users based on the follower count and post count of each network user. Then at least one candidate user other than abnormal users is determined among the plurality of primary users, where the abnormal users include users conforming to the water army characteristics and/or users on a blacklist. Finally, if the number of times the keyword features are mentioned in the historical posting information of a candidate user is greater than a preset number threshold, the candidate user is determined to be a mother-and-infant user at the stage to be identified. The method reduces the likelihood that non-mother-and-infant users such as water army accounts, marketers and KOLs remain in the identified mother-and-infant population. This improves the accuracy of research on the interest preferences, living habits, discussion topics and the like of the mother-and-infant population, and reduces the influence of non-mother-and-infant users on the results when those aspects are analyzed.
Referring to fig. 2, fig. 2 is a flowchart of another method for identifying a mother-and-infant population according to an embodiment of the present application. As shown in fig. 2, the identification method includes:
S210, determining a target time period in which the mother-and-infant population at the stage to be identified is located, wherein the target time period is any one of 6 pre-divided mother-and-infant stages.
The mother-and-infant population consists of mothers at different stages, from pregnancy until the baby reaches 3 years of age. Because the products used by mothers and babies vary from stage to stage, brand owners need to analyze and mine each segment in detail. The embodiment of the application therefore divides the mother-and-infant population into 6 stages: first stage T1, 1 to 12 weeks of gestation; second stage T2, 13 to 28 weeks of gestation; third stage T3, 29 to 40 weeks of gestation; fourth stage S1, baby aged 0 to 6 months; fifth stage S2, baby aged 7 to 12 months; sixth stage S3, baby aged 13 to 36 months.
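The six-stage division can be written down as a small lookup table. The sketch below is illustrative only: the tuple layout, the unit labels `gestation_week` and `baby_month`, and the function `stage_of` are assumptions, while the stage names and boundaries are taken from the text.

```python
from typing import Optional

# (stage name, unit, inclusive lower bound, inclusive upper bound)
STAGES = (
    ("T1", "gestation_week", 1, 12),
    ("T2", "gestation_week", 13, 28),
    ("T3", "gestation_week", 29, 40),
    ("S1", "baby_month", 0, 6),
    ("S2", "baby_month", 7, 12),
    ("S3", "baby_month", 13, 36),
)


def stage_of(unit: str, value: int) -> Optional[str]:
    """Return the stage containing the given week or month, or None if out of range."""
    for name, stage_unit, low, high in STAGES:
        if stage_unit == unit and low <= value <= high:
            return name
    return None
```

For example, `stage_of("gestation_week", 25)` falls in the 13 to 28 week range and yields `"T2"`.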
S220, extracting the time characteristics of the target time period, and taking the time characteristics as the keyword characteristics of the mother-infant crowd at the stage to be identified.
In the embodiment of the application, the stage of the mother-infant crowd at the stage to be identified is determined, the time characteristics of the corresponding stage are extracted, and the time characteristics are used as the keyword characteristics of the mother-infant crowd at the stage to be identified.
S230, according to the keyword features of the mother-and-infant population at the stage to be identified, obtaining a plurality of network posts related to the keyword features from the social network, and determining the network user corresponding to each network post.
S240, determining a plurality of primary users among the plurality of network users based on the follower count and post count of each network user.
S250, determining at least one candidate user other than abnormal users among the plurality of primary users, wherein the abnormal users include users conforming to the water army characteristics and/or users on a blacklist.
S260, if the number of times the keyword features are mentioned in the historical posting information of the candidate user is greater than a preset number threshold, determining that the candidate user is a mother-and-infant user at the stage to be identified.
The descriptions of S230 to S260 may refer to the descriptions of S110 to S140, and the same technical effects can be achieved, which are not described again.
A complete solution is designed for a microblog platform, with the aim of identifying the mother-and-infant population among the network users who post on the platform. The embodiment of the application takes the second stage T2 as an example and shows how to find the T2 population (mothers who are 13 to 28 weeks pregnant). First, the network users whose posts mention "25 weeks pregnant", "7 months pregnant" or "168 days pregnant" are screened out. Second, the follower count is required to be less than 100,000, and the post mentioning "25 weeks pregnant" or "7 months pregnant" is required to have been published in the current month; the remaining users are then filtered once through the water army removal model and the blacklist. After filtering, the historical posts of each network user are traced back and checked to see whether they mention mother-and-infant categories an average of 2 times per month; if so, the network users meeting this condition are judged to belong to the T2 mother-and-infant population.
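The T2 walk-through can be sketched end to end as below. This is a simplified illustration under several stated assumptions: the water army score is taken as precomputed, with scores below 3 treated as water army; the monthly check is read as "an average of at least 2 category mentions per month"; and the requirement on when the seed post was published is omitted. The `User` fields and the function `identify_t2` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set, Tuple

SEED_PHRASES = ("25 weeks pregnant", "7 months pregnant", "168 days pregnant")
MAX_FOLLOWERS = 100_000    # follower count must be less than this
WATER_ARMY_THRESHOLD = 3   # scores below this are treated as water army
MIN_MONTHLY_MENTIONS = 2   # assumed reading: average of at least 2 per month


@dataclass
class User:
    name: str
    followers: int
    water_army_score: int  # hypothetical precomputed score
    monthly_mentions: List[int] = field(default_factory=list)  # category mentions per month


def identify_t2(posts: Iterable[Tuple[str, str]], users: Dict[str, User],
                blacklist: Set[str]) -> List[str]:
    """Return the names of users judged to be T2 users, in alphabetical order."""
    # Step 1: keep authors of posts that contain a seed phrase.
    seeded = {name for name, text in posts
              if any(phrase in text for phrase in SEED_PHRASES)}
    result: List[str] = []
    for name in sorted(seeded):
        user = users.get(name)
        # Step 2: follower count filter.
        if user is None or user.followers >= MAX_FOLLOWERS:
            continue
        # Step 3: water army and blacklist filter.
        if user.water_army_score < WATER_ARMY_THRESHOLD or name in blacklist:
            continue
        # Step 4: average monthly mentions of mother-and-infant categories.
        months = user.monthly_mentions
        if months and sum(months) / len(months) >= MIN_MONTHLY_MENTIONS:
            result.append(name)
    return result
```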
The identification method of the maternal and infant population, provided by the embodiment of the application, can reduce the existence probability of non-maternal and infant populations such as water army, marketing personnel and KOL in the identified maternal and infant population, and is beneficial to improving the accuracy of researches such as interest preference, living habits and discussion topics of the maternal and infant population, so that the influence of the non-maternal and infant population on the analysis result of the maternal and infant population can be reduced when the interest preference, the living habits, the discussion topics and the like of the maternal and infant population are analyzed.
Referring to fig. 3, fig. 3 is a schematic structural diagram of an identification apparatus for mother and infant people according to an embodiment of the present application, and as shown in fig. 3, the identification apparatus 300 includes:
the network user determination module 310 is configured to obtain, according to the keyword features of the mother-and-infant population at the stage to be identified, a plurality of network posts related to the keyword features from a social network, and determine the network user corresponding to each network post;
a primary user determination module 320, configured to determine a plurality of primary users among the plurality of network users based on the follower count and post count of each network user;
a candidate user determination module 330, configured to determine at least one candidate user other than abnormal users among the plurality of primary users, where the abnormal users include users conforming to the water army characteristics and/or users on a blacklist;
a to-be-identified-stage mother-and-infant user determination module 340, configured to determine that the candidate user is a mother-and-infant user at the stage to be identified if the number of times the keyword features are mentioned in the historical posting information of the candidate user is greater than a preset number threshold.
In a preferred embodiment of the present application, the identification apparatus 300 further includes a keyword feature determination module, which is configured to determine the keyword features of the mother-and-infant population at the stage to be identified through the following steps:
determining a target time period in which a mother-infant crowd in a stage to be identified is located, wherein the target time period is any one of 6 pre-divided mother-infant stages;
and extracting the time characteristics of the target time period, and taking the time characteristics as the keyword characteristics of the mother-infant crowd at the stage to be identified.
In some embodiments of the present application, the network user determination module 310 is configured to obtain a plurality of network posts related to the keyword features from a social network by:
detecting whether social content in the social network contains the keyword features or synonyms of the keyword features;
and determining the social content containing the keyword features and/or synonyms of the keyword features as the plurality of network posts related to the keyword features.
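The detection step can be illustrated with plain substring matching. This is a minimal sketch, assuming posts arrive as `(user, text)` pairs and that the synonym list is supplied by the caller; the text does not specify the matching technique, and the names `find_related_posts` and `users_of` are hypothetical.

```python
from typing import Iterable, List, Tuple

Post = Tuple[str, str]  # (user name, post text)


def find_related_posts(posts: Iterable[Post], keyword: str,
                       synonyms: Iterable[str]) -> List[Post]:
    """Keep posts whose text contains the keyword feature or any of its synonyms."""
    terms = [keyword, *synonyms]
    return [(user, text) for user, text in posts
            if any(term in text for term in terms)]


def users_of(posts: Iterable[Post]) -> List[str]:
    """Return the distinct authors of the given posts in first-seen order."""
    return list(dict.fromkeys(user for user, _ in posts))
```

Substring matching is the simplest choice here; a real system might need tokenization or normalization, which the text leaves open.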
In the above embodiment, when the primary user determination module 320 is configured to determine a plurality of primary users among the plurality of network users based on the follower count and post count of each network user, the primary user determination module 320 is specifically configured to:
acquiring the follower count and post count of each network user, and a preset follower count threshold and a preset post count threshold for the mother-and-infant population at the stage to be identified;
determining the users whose follower count is larger than the preset follower count threshold and/or whose post count is larger than the preset post count threshold as users to be deleted;
and determining the users remaining after the users to be deleted are removed from the plurality of network users as primary users.
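The primary user selection can be sketched as a two-threshold filter. The sketch assumes each network user is represented by a `(follower_count, post_count)` pair; the function name `select_primary_users` is hypothetical, and the threshold values used in any call are the caller's choice rather than values fixed by this passage.

```python
from typing import Dict, List, Tuple


def select_primary_users(users: Dict[str, Tuple[int, int]],
                         follower_threshold: int,
                         post_threshold: int) -> List[str]:
    """Remove users exceeding either threshold and return the rest in input order.

    `users` maps a user name to (follower_count, post_count).
    """
    to_delete = {
        name
        for name, (followers, post_count) in users.items()
        if followers > follower_threshold or post_count > post_threshold
    }
    return [name for name in users if name not in to_delete]
```

A user is removed when either count exceeds its threshold, which covers the "and/or" wording of the step; a count exactly equal to the threshold is kept.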
In some embodiments of the present application, the candidate user determination module 330 is configured to determine abnormal users conforming to the water army characteristics by:
acquiring the index information items of each primary user;
and determining the score value of each primary user based on that user's index information items and a scoring judgment rule, and determining, based on the obtained score values and a water army judgment rule, the abnormal users conforming to the water army characteristics among the plurality of primary users.
In the above embodiment, when the candidate user determination module 330 is configured to determine, based on the obtained score values and the water army judgment rule, the abnormal users conforming to the water army characteristics among the plurality of primary users, the candidate user determination module 330 is specifically configured to:
acquiring the water army score judgment threshold in the water army judgment rule and the score value of each primary user;
and determining, among the plurality of primary users, the users whose score values are smaller than the water army score judgment threshold as abnormal users conforming to the water army characteristics.
In some embodiments of the present application, the index information items include: account nickname, personal profile, follower count, number of accounts followed, registration time, number of microblog posts, and reposted content.
The embodiment of the application provides an identification apparatus for a mother-and-infant population. The network user determination module obtains, according to the keyword features of the mother-and-infant population at the stage to be identified, a plurality of network posts related to the keyword features from a social network and determines the network user corresponding to each network post. The primary user determination module then determines a plurality of primary users among the plurality of network users based on the follower count and post count of each network user. The candidate user determination module determines at least one candidate user other than abnormal users among the plurality of primary users, where the abnormal users include users conforming to the water army characteristics and/or users on a blacklist. Finally, the to-be-identified-stage mother-and-infant user determination module determines the candidate user to be a mother-and-infant user at the stage to be identified if the number of times the keyword features are mentioned in the historical posting information of the candidate user is greater than a preset number threshold. As a result, few non-mother-and-infant users such as water army accounts, marketers and KOLs are mixed into the identified mother-and-infant population, so the population can be identified and analyzed accurately, and the influence of non-mother-and-infant users on the results is reduced when the interest preferences, living habits, discussion topics and the like of the mother-and-infant population are analyzed.
Referring to fig. 4, fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. As shown in fig. 4, the electronic device 400 includes a processor 410, a memory 420, and a bus 430.
The memory 420 stores machine-readable instructions executable by the processor 410, when the electronic device 400 runs, the processor 410 and the memory 420 communicate through the bus 430, and when the machine-readable instructions are executed by the processor 410, the steps of the method for identifying a mother-infant group in the method embodiments shown in fig. 1 and fig. 2 may be performed.
An embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method for identifying a mother-infant group in the method embodiments shown in fig. 1 and fig. 2 may be executed.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that: the above-mentioned embodiments are only specific embodiments of the present application, and are used for illustrating the technical solutions of the present application, but not limiting the same, and the scope of the present application is not limited thereto, and although the present application is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that: any person skilled in the art can modify or easily conceive the technical solutions described in the foregoing embodiments or equivalent substitutes for some technical features within the technical scope disclosed in the present application; such modifications, changes or substitutions do not depart from the spirit and scope of the exemplary embodiments of the present application, and are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A mother-infant population identification method is characterized by comprising the following steps:
according to the keyword features of the mother-and-infant population at a stage to be identified, obtaining a plurality of network posts related to the keyword features from a social network, and determining the network user corresponding to each network post;
determining a plurality of primary users among the plurality of network users based on the follower count and post count of each network user;
determining at least one candidate user other than abnormal users among the plurality of primary users, wherein the abnormal users include users conforming to the water army characteristics and/or users on a blacklist;
and if the number of times the keyword features are mentioned in the historical posting information of the candidate user is greater than a preset number threshold, determining that the candidate user is a mother-and-infant user at the stage to be identified.
2. The identification method according to claim 1, characterized in that the keyword features of the mother-infant population at the stage to be identified are determined by the following steps:
determining a target time period in which a mother-infant crowd in a stage to be identified is located, wherein the target time period is any one of 6 pre-divided mother-infant stages;
and extracting the time characteristics of the target time period, and taking the time characteristics as the keyword characteristics of the mother-infant crowd at the stage to be identified.
3. The identification method according to claim 1, wherein the plurality of network posts related to the keyword features are obtained from a social network by:
detecting whether social content in the social network contains the keyword features or synonyms of the keyword features;
and determining the social content containing the keyword features and/or synonyms of the keyword features as the plurality of network posts related to the keyword features.
4. The identification method according to claim 1, wherein determining the plurality of primary users among the plurality of network users based on the follower count and post count of each network user comprises:
acquiring the follower count and post count of each network user, and a preset follower count threshold and a preset post count threshold for the mother-and-infant population at the stage to be identified;
determining the users whose follower count is larger than the preset follower count threshold and/or whose post count is larger than the preset post count threshold as users to be deleted;
and determining the users remaining after the users to be deleted are removed from the plurality of network users as primary users.
5. The identification method according to claim 1, wherein the abnormal users conforming to the water army characteristics are determined by:
acquiring the index information items of each primary user;
and determining the score value of each primary user based on that user's index information items and a scoring judgment rule, and determining, based on the obtained score values and a water army judgment rule, the abnormal users conforming to the water army characteristics among the plurality of primary users.
6. The identification method according to claim 5, wherein determining, based on the obtained score values and the water army judgment rule, the abnormal users conforming to the water army characteristics among the plurality of primary users comprises:
acquiring the water army score judgment threshold in the water army judgment rule and the score value of each primary user;
and determining, among the plurality of primary users, the users whose score values are smaller than the water army score judgment threshold as abnormal users conforming to the water army characteristics.
7. The identification method according to claim 5, wherein the index information items include: account nickname, personal profile, follower count, number of accounts followed, registration time, number of microblog posts, and reposted content.
8. An identification device for a mother-infant population, the identification device comprising:
a network user determination module, configured to obtain, according to the keyword features of the mother-and-infant population at a stage to be identified, a plurality of network posts related to the keyword features from a social network, and determine the network user corresponding to each network post;
a primary user determination module, configured to determine a plurality of primary users among the plurality of network users based on the follower count and post count of each network user;
a candidate user determination module, configured to determine at least one candidate user other than abnormal users among the plurality of primary users, wherein the abnormal users include users conforming to the water army characteristics and/or users on a blacklist;
and a to-be-identified-stage mother-and-infant user determination module, configured to determine the candidate user to be a mother-and-infant user at the stage to be identified if the number of times the keyword features are mentioned in the historical posting information of the candidate user is greater than a preset number threshold.
9. An electronic device, comprising: processor, memory and bus, the memory storing machine readable instructions executable by the processor, the processor and the memory communicating over the bus when the electronic device is running, the machine readable instructions when executed by the processor performing the steps of the method of identifying a maternal and infant population according to any one of claims 1 to 7.
10. A computer-readable storage medium, having stored thereon a computer program which, when being executed by a processor, carries out the steps of the method of identifying a population of mothers and infants according to any one of claims 1 to 7.
CN202010012718.8A 2020-01-07 2020-01-07 Identification method and identification device for mother and infant crowd, electronic equipment and storage medium Pending CN111198992A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010012718.8A CN111198992A (en) 2020-01-07 2020-01-07 Identification method and identification device for mother and infant crowd, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010012718.8A CN111198992A (en) 2020-01-07 2020-01-07 Identification method and identification device for mother and infant crowd, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111198992A true CN111198992A (en) 2020-05-26

Family

ID=70745484

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010012718.8A Pending CN111198992A (en) 2020-01-07 2020-01-07 Identification method and identification device for mother and infant crowd, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111198992A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103198161A (en) * 2013-04-28 2013-07-10 中国科学院计算技术研究所 Microblog ghostwriter identifying method and device
US20170147696A1 (en) * 2015-11-25 2017-05-25 Facebook, Inc. Text-to-Media Indexes on Online Social Networks
CN106878242A (en) * 2016-06-02 2017-06-20 阿里巴巴集团控股有限公司 A kind of method and device for determining user identity classification
CN106940732A (en) * 2016-05-30 2017-07-11 国家计算机网络与信息安全管理中心 A kind of doubtful waterborne troops towards microblogging finds method
CN107633077A (en) * 2017-09-25 2018-01-26 南京安链数据科技有限公司 A kind of system and method for more strategy cleaning social media text datas
CN107659647A (en) * 2017-09-26 2018-02-02 精硕科技(北京)股份有限公司 The recognition methods of water note and device
CN107656918A (en) * 2017-05-10 2018-02-02 平安科技(深圳)有限公司 Obtain the method and device of targeted customer
CN109063127A (en) * 2018-08-02 2018-12-21 深圳市京华信息技术有限公司 A kind of searching method, device, server and storage medium
CN110598091A (en) * 2019-08-09 2019-12-20 阿里巴巴集团控股有限公司 User tag mining method, device, server and readable storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112685204A (en) * 2020-12-29 2021-04-20 北京中科闻歌科技股份有限公司 Social robot detection method and device based on anomaly detection
CN112685204B (en) * 2020-12-29 2024-03-05 北京中科闻歌科技股份有限公司 Social robot detection method and device based on anomaly detection
CN113240479A (en) * 2021-07-13 2021-08-10 武汉卓尔数字传媒科技有限公司 User analysis method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20201224

Address after: A108, 1 / F, curling hall, winter training center, 68 Shijingshan Road, Shijingshan District, Beijing 100041

Applicant after: Beijing second hand Artificial Intelligence Technology Co.,Ltd.

Address before: Room 9014, 9 / F, building 3, yard 30, Shixing street, Shijingshan District, Beijing

Applicant before: ADMASTER TECHNOLOGY (BEIJING) Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20200526