WO2018210045A1 - Method and apparatus for identifying a native object - Google Patents

Method and apparatus for identifying a native object

Info

Publication number
WO2018210045A1
Authority
WO
WIPO (PCT)
Prior art keywords
account
topic
application
application account
objects
Prior art date
Application number
PCT/CN2018/079243
Other languages
English (en)
French (fr)
Inventor
康战辉
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2018210045A1
Priority to US16/388,083 (US11250077B2)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Definitions

  • the present invention relates to the field of computer technologies, and in particular, to a method and apparatus for identifying a native object.
  • in the prior art, the native article of a topic is identified as the article with the largest number of forwards or shares in WeChat Moments.
  • the prior-art method identifies the native article correctly only if one assumption holds: that the topic spread widely because the native article itself was forwarded explosively. Analysis of real cases shows that this assumption often fails. If a topic emerges not through the native article itself but because another official account, especially a high-profile "big V" official account, reprinted the article and the reprint then spread explosively on a social platform (such as the WeChat platform), the prior art misidentifies the article forwarded by the big V official account as the native article. The prior-art method can therefore produce identification errors, and its accuracy in identifying native articles is low.
  • Embodiments of the present invention provide a method and apparatus for identifying a native object, which are used to improve the recognition accuracy of a native object.
  • the embodiment of the present invention provides the following technical solutions:
  • an embodiment of the present invention provides a method for identifying a native object, including:
  • in a second aspect, an embodiment of the present invention further provides a device for identifying a native object, including:
  • a topic obtaining module configured to acquire a first topic to be processed from the social platform, where the first topic has a first topic identifier
  • An object search module configured to search, from the social platform, M objects including the first topic according to the first topic identifier, where the M is a positive integer;
  • an account statistics module configured to count the application accounts that appear on each of the M objects, so as to obtain the application accounts whose occurrence frequency is in the top N, where N is a positive integer;
  • the account filtering module is configured to identify the first application account from the top N application accounts according to the preset account filtering rule, and determine the object published by the first application account as a native object.
  • a computer readable storage medium having stored therein instructions that, when executed on a computer, cause the computer to perform the methods described in the above aspects.
  • a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method described in the above aspects.
  • a fifth aspect of the present application provides an apparatus for identifying a native object, which may include: a memory, a transceiver, a processor, and a bus system;
  • the memory is used to store a program
  • the processor is configured to execute a program in the memory, including the following steps:
  • the bus system is configured to connect the memory and the processor to cause the memory and the processor to communicate.
  • in the embodiment of the present invention, the first topic to be processed is first acquired from the social platform, the first topic having a first topic identifier; next, M objects including the first topic are searched from the social platform according to the first topic identifier, where M is a positive integer.
  • the application accounts appearing on each of the M objects are then counted to obtain the application accounts whose occurrence frequency is in the top N, where N is a positive integer.
  • finally, the first application account is identified from the top N application accounts according to the preset account filtering rule, and the object published by the first application account is determined as the native object.
  • the embodiment of the present invention may search for M objects from the social platform according to the first topic identifier, and these M objects are used to count the occurrence frequency of each application account. By counting with the application accounts that appear on each object as keywords, the top N application accounts by occurrence frequency can be calculated accurately, and the first application account is then selected by the account filtering rule.
  • the object published by the first application account is the native object identified in the embodiment of the present invention.
  • the N application accounts counted from the M objects with application accounts as keywords are potential publishers of the native object, and the first application account is then screened out from these N accounts. The selected first application account is therefore a more accurate publisher of the native object, which improves the identification accuracy.
  • compared with the prior art, the N application accounts selected in the embodiment of the present invention are potential publishers of the native object, so identification remains accurate regardless of whether the topic emerged through explosive forwarding of the native object itself.
  • FIG. 1 is a schematic block diagram of a method for identifying a native object according to an embodiment of the present invention
  • FIG. 2 is a schematic diagram of an implementation scenario of searching for M objects from a social platform according to an embodiment of the present invention
  • FIG. 3 is a schematic diagram of an implementation scenario of searching for an application account from an object according to an embodiment of the present disclosure
  • FIG. 4-a is a schematic structural diagram of a device for identifying a native object according to an embodiment of the present invention
  • FIG. 4-b is a schematic structural diagram of an account statistics module according to an embodiment of the present invention
  • FIG. 5 is a schematic structural diagram of a server to which the method for identifying a native object is applied according to an embodiment of the present invention.
  • Embodiments of the present invention provide a method and apparatus for identifying a native object, which are used to improve the recognition accuracy of a native object.
  • the social platform may be a platform for the user to interact based on the social application.
  • the social platform may be an instant messaging tool such as a WeChat platform or a QQ platform.
  • when using the social platform, a user may perform interactive actions such as browsing Moments, publishing articles, forwarding articles, and commenting on articles, and the social platform stores a large amount of historical user data.
  • each user has an application account registered on the social platform.
  • for example, a WeChat official account is an application account that a developer or merchant applies for on the WeChat official account platform. The account is interoperable with a QQ account. Through the official account, a merchant can communicate and interact with a specific group on the WeChat platform through text, pictures, voice, and video, which has formed a mainstream online-and-offline WeChat interaction mode.
  • the application account can also be a cross-platform account; for example, an A application account (A server) logs in to the B social platform (B server), and the first topic to be processed can then also be obtained from the B social platform.
  • the application account in the embodiment of the present invention is not limited to an account registered on the platform, and may also be an account obtained in different applications or otherwise.
  • the first topic to be processed is first obtained from the social platform, where a topic is the general name of an event on the Internet; for example, "North Korea nuclear test" and "Luo Yixiao, Stop Right There" are each a specific topic.
  • Each topic can be assigned a topic identifier on the social platform, and the topic identifier is a unique identifier of the topic, by which the corresponding topic can be determined.
  • the first topic identifier is assigned to the first topic.
  • step 101 acquires a first topic to be processed from the social platform, including:
  • A1 Clustering multiple topics spread on the social platform, and determining the first topic according to the clustering result.
  • the determination of the first topic can be implemented by a clustering algorithm.
  • for example, a text clustering algorithm is adopted, based on the following clustering hypothesis: documents of the same class are highly similar, while documents of different classes have low similarity. This allows text information to be effectively organized, summarized, and navigated.
  • the clustering algorithm clusters the multiple topic texts on the social platform and performs redundancy elimination, information fusion, text generation, and similar processing on documents of the same topic to generate a concise summary document.
  • the text clustering algorithms provided by the embodiment of the invention can be divided into partitional clustering algorithms (for example, k-means) and hierarchical clustering algorithms. Hierarchical clustering is mainly divided into bottom-up agglomerative and top-down divisive approaches.
  • taking the divisive clustering algorithm (that is, the DIANA algorithm) as an example, all objects are first initialized into one cluster, and the cluster is then split according to some criterion (such as the maximum Euclidean distance between nearest neighbors) until the number of clusters specified by the user is reached or the distance between two clusters exceeds a certain threshold.
  • the DIANA algorithm uses the following two definitions: 1) cluster diameter: any two data points in a cluster have a Euclidean distance, and the maximum of these distances is the diameter of the cluster. 2) Average dissimilarity: the average Euclidean distance.
  • Input: a database containing n objects; termination condition: the number of clusters k.
  • the splinter group and the old party are the two clusters into which the selected cluster is split; together with the other clusters they form the new cluster set.
  • the historical interaction data of the user is stored on the social platform, and each interaction data is stored as an object on the social platform.
  • the M objects including the first topic are searched from the social platform according to the first topic identifier, where M is a positive integer whose specific value depends on the user interaction data generated on the specific social platform and is not limited here.
  • the M objects searched from the social platform all include the first topic, that is, the M objects are all about the same topic.
  • FIG. 2 is a schematic diagram of an implementation scenario of searching for M objects from a social platform according to an embodiment of the present invention. Take the case where the objects are topic articles as an example.
  • the first topic is specifically "Luo Yixiao, Stop Right There".
  • official account 1 is Shanghai Beach, in 2016.
  • the topic "Luo Yixiao, Stop Right There"; official account 2 is P2P Observation, which reprinted the topic "Luo Yixiao, Stop Right There" on November 28, 2016; official account 3 is Investment and Financial Management
  • official account 4 is posted on Xiaojun, and on November 30, 2016, the topic "Luo Yixiao, Stop Right There"
  • official account 5 is the NetEase news client.
  • the topic "Luo Yixiao, Stop Right There" is reprinted.
  • the first topic is a topic for which the native object needs to be identified.
  • the social platform is searched using the first topic identifier, and all objects having the first topic can be found in the database of the social platform; the number of objects found is M.
  • the M objects all include the first topic.
  • the M objects can be used to count the frequency of occurrence of each application account. For details, see step 103.
  • each of the M objects needs to be analyzed, and the account name of the application account included in each object is analyzed.
  • the object 1 includes: an application account 1, an application account 2, and an application account 3.
  • the object 2 includes: an application account 2 and an application account 3.
  • the object 3 includes: an application account 1 and an application account 3.
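The three-object example above can be tallied directly. The following is a minimal sketch, assuming each object has already been reduced to the list of application account names found on it; the function names `count_accounts` and `top_n_accounts` are illustrative and do not come from the patent.

```python
from collections import Counter

# Hypothetical data mirroring object 1, object 2 and object 3 described above.
objects = {
    "object 1": ["application account 1", "application account 2", "application account 3"],
    "object 2": ["application account 2", "application account 3"],
    "object 3": ["application account 1", "application account 3"],
}


def count_accounts(objects):
    """Count how often each application account appears across all objects."""
    counts = Counter()
    for accounts in objects.values():
        counts.update(accounts)
    return counts


def top_n_accounts(counts, n):
    """Return the n accounts with the highest occurrence frequency."""
    return [account for account, _ in counts.most_common(n)]


counts = count_accounts(objects)
print(counts)
print(top_n_accounts(counts, 1))
```

With this data, application account 3 appears on all three objects and the other two accounts appear on two objects each, so account 3 ranks first for N = 1.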
  • FIG. 3 is a schematic diagram of an implementation scenario of searching for an application account from an object according to an embodiment of the present invention.
  • the following example illustrates searching for an application account from an object.
  • the object is specifically the topic article "Luo Yixiao, Stop Right There".
  • FIG. 3 is a schematic diagram of the content of the topic article. The article includes the WeChat official account "Rol", and its title includes "Luo Yixiao, Stop Right There". Searching the topic article shown in FIG. 3 gives an occurrence frequency of 1 for the WeChat official account "Rol".
  • statistics are performed with the application accounts that appear on each object as keywords, so that the top N application accounts by occurrence frequency can be calculated accurately.
  • in the embodiment of the present invention, N application accounts are counted from the M objects with application accounts as keywords.
  • these N application accounts are potential publishers of the native object; that is, the N most frequently occurring application accounts that are screened out may have published the native object of the first topic.
  • the specific value of N depends on the specific content included in the M objects generated on the specific social platform, which is not limited herein.
  • step 103 counts application accounts that appear on each of the M objects, including:
  • after the top N application accounts by occurrence frequency are obtained through step 103, the N application accounts can be filtered by the account filtering rule, so that one application account is selected from them.
  • the selected application account is defined as the "first application account", and the object published by the first application account is the native object.
  • the account filtering rule provided by the embodiment of the present invention is a rule for filtering application accounts; it may filter on account features of an application account, object features, and interaction data of the social platform.
  • the account filtering rule provided by the embodiment of the present invention may have at least one of a plurality of filtering rules.
  • application accounts whose publishing time falls within the first burst period or later than the first burst period are filtered out of the top N application accounts, thereby obtaining the first application account.
  • the first burst period is the burst period of the first topic.
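The burst-period rule can be sketched as a simple date comparison. This is a minimal illustration, assuming each candidate account is represented by a dictionary carrying a `published_on` date and that the start of the burst period is already known; the field names, account names, and dates are all hypothetical.

```python
from datetime import date


def filter_by_burst_period(candidates, burst_start):
    """Keep only accounts that published before the burst period began.

    Accounts that published during or after the burst period are dropped,
    since they are more likely to be reprinting than originating the topic.
    """
    return [c for c in candidates if c["published_on"] < burst_start]


# Hypothetical top-N candidates and a hypothetical burst start date.
candidates = [
    {"account": "account_a", "published_on": date(2020, 3, 1)},
    {"account": "account_b", "published_on": date(2020, 3, 5)},
    {"account": "account_c", "published_on": date(2020, 3, 9)},
]
burst_start = date(2020, 3, 5)

early = filter_by_burst_period(candidates, burst_start)
print([c["account"] for c in early])
```

Only `account_a` survives here, because `account_b` published on the first day of the burst period and `account_c` published after it.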
  • step 104 identifies the first application account from the top N application accounts with the appearance frequency according to the preset account filtering rule, including:
  • application accounts whose forwarding count is less than the forwarding threshold or whose comment count is less than the comment threshold are filtered out of the top N application accounts, thereby obtaining the first application account.
  • the account filtering rule here is specifically the forwarding-count or comment-count filtering rule.
  • among the N application accounts, those whose forwarding count is less than the forwarding threshold, or whose comment count is less than the comment threshold, are filtered out; only an application account whose forwarding count is greater than or equal to the forwarding threshold, or whose comment count is greater than or equal to the comment threshold, may be the first application account.
  • the forwarding threshold and the comment threshold may be specifically determined according to an application scenario, which is not limited herein.
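A threshold filter of this kind might look as follows. The text leaves open whether the forwarding rule and the comment rule are applied separately or together, so this sketch makes each threshold optional and drops an account if it falls below any threshold that is supplied; that combination logic, the field names, and the numbers are assumptions made for illustration.

```python
def filter_by_engagement(candidates, forward_threshold=None, comment_threshold=None):
    """Drop accounts whose forwards or comments fall below a supplied threshold."""
    kept = []
    for candidate in candidates:
        if forward_threshold is not None and candidate["forwards"] < forward_threshold:
            continue  # too few forwards
        if comment_threshold is not None and candidate["comments"] < comment_threshold:
            continue  # too few comments
        kept.append(candidate)
    return kept


# Hypothetical engagement figures for three top-N candidates.
candidates = [
    {"account": "account_a", "forwards": 1200, "comments": 300},
    {"account": "account_b", "forwards": 15, "comments": 2},
    {"account": "account_c", "forwards": 900, "comments": 5},
]

by_forwards = filter_by_engagement(candidates, forward_threshold=100)
by_both = filter_by_engagement(candidates, forward_threshold=100, comment_threshold=50)
print([c["account"] for c in by_forwards])
print([c["account"] for c in by_both])
```

Passing only one threshold applies only that rule, which matches the "forwarding-count or comment-count" wording; passing both is the stricter reading.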
  • step 104 identifies the first application account from the top N application accounts with the appearance frequency according to the preset account filtering rule, including:
  • the application account that does not have the original identifier is filtered out from the application account whose frequency is in the top N, thereby obtaining the first application account.
  • the account filtering rule is specifically the original identification filtering rule.
  • application accounts among the N application accounts that do not have the original identifier are filtered out and only those with the original identifier are retained; a retained application account may be the first application account.
  • the original identifier may be carried on the object published by the application account. The specific location of the original identifier on the object may be determined according to the application scenario, which is not limited herein.
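Since the account filtering rule may consist of at least one of several rules, one way to organize them is as a list of predicates that must all pass. The sketch below shows the original-identifier rule as one such predicate; the dictionary layout, the `original` flag, and the helper names are hypothetical and only illustrate the idea.

```python
def has_original_identifier(candidate):
    """True if the object published by this account carries an original identifier."""
    return bool(candidate.get("original", False))


def apply_rules(candidates, rules):
    """Keep the candidates that satisfy every rule in the list."""
    return [c for c in candidates if all(rule(c) for rule in rules)]


# Hypothetical top-N candidates; "original" stands in for the original identifier.
candidates = [
    {"account": "account_a", "original": True, "forwards": 1200},
    {"account": "account_b", "original": False, "forwards": 5000},
    {"account": "account_c", "original": True, "forwards": 3},
]

only_original = apply_rules(candidates, [has_original_identifier])
original_and_popular = apply_rules(
    candidates,
    [has_original_identifier, lambda c: c["forwards"] >= 100],
)
print([c["account"] for c in only_original])
print([c["account"] for c in original_and_popular])
```

Keeping each rule as a small function makes it straightforward to enable one rule or several, depending on the application scenario.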
  • the embodiment of the present invention introduces a keyword-based native object recognition technology to solve the problem of inaccurate recognition in the prior art.
  • the embodiment of the present invention may search for M objects from the social platform according to the first topic identifier, and these M objects are used to count the occurrence frequency of each application account. By counting with the application accounts that appear on each object as keywords, the top N application accounts by occurrence frequency can be calculated accurately, and the first application account is then selected by the account filtering rule.
  • the object published by the first application account is the native object identified in the embodiment of the present invention.
  • the N application accounts counted from the M objects with application accounts as keywords are potential publishers of the native object, and the first application account is then screened out from these N accounts, so the first application account screened out is a more accurate publisher of the native object, which improves the identification accuracy of the native object.
  • compared with the prior art, the N application accounts selected in the embodiment of the present invention are potential publishers of the native object, so identification remains accurate regardless of whether the topic emerged through explosive forwarding of the native object itself.
  • a device for identifying a native object may include: a topic acquisition module 401, an object search module 402, an account statistics module 403, and an account filtering module 404.
  • the object search module 402 is configured to search, from the social platform, M objects including the first topic according to the first topic identifier, where the M is a positive integer;
  • the account statistics module 403 is configured to count the application accounts that appear on each of the M objects, so as to obtain the application accounts whose occurrence frequency is in the top N, where N is a positive integer;
  • the account filtering module 404 is configured to identify the first application account from the top N application accounts according to the preset account filtering rule, and determine the object published by the first application account as a native object.
  • the topic obtaining module 401 is specifically configured to cluster a plurality of topics that are propagated on the social platform, and determine the first topic according to the clustering result.
  • the account statistics module 403 includes:
  • a window search module 4031 configured to slide a window over each of the M objects according to a sliding interval and to search each sub-object covered by the window for the application account;
  • a frequency calculation module 4032 configured to count the occurrence frequency of the application account over all sub-objects, where one object is divided into multiple sub-objects by the successive slides of the window.
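The window search and frequency calculation can be sketched on plain text. This is a minimal illustration, assuming an object is a string, a sub-object is the slice covered by one window position, and an account "appears" when its name is a substring of that slice; the window size, sliding interval, and sample strings are hypothetical.

```python
def window_starts(length, window_size, step):
    """Start offsets of successive windows; the final window is aligned to the end."""
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    last_start = max(length - window_size, 0)
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # make sure the tail of the object is covered
    return starts


def count_account(text, account, window_size, step):
    """Number of sub-objects (windows) in which the account name appears."""
    return sum(
        account in text[start:start + window_size]
        for start in window_starts(len(text), window_size, step)
    )


def count_over_objects(texts, account, window_size, step):
    """Occurrence frequency of the account over the sub-objects of all objects."""
    return sum(count_account(text, account, window_size, step) for text in texts)


print(count_account("xxRolxxxxRolxx", "Rol", window_size=7, step=7))
print(count_over_objects(["xxRolxx", "no account here"], "Rol", window_size=7, step=7))
```

When the sliding interval is smaller than the window size the windows overlap, so a single mention can be counted in more than one sub-object; whether that is desirable depends on how the frequency is meant to be weighted.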
  • the account filtering module 404 is specifically configured to filter out, according to the topic burst period filtering rule, application accounts whose publishing time falls within or after the first burst period from the top N application accounts, thereby obtaining the first application account, where the first burst period is the burst period of the first topic.
  • the account filtering module 404 is specifically configured to filter out, according to the original identifier filtering rule, application accounts that do not have the original identifier from the top N application accounts, thereby obtaining the first application account.
  • the embodiment of the present invention may search for M objects from the social platform according to the first topic identifier, and these M objects are used to count the occurrence frequency of each application account. By counting with the application accounts that appear on each object as keywords, the top N application accounts by occurrence frequency can be calculated accurately, and the first application account is then selected by the account filtering rule.
  • the object published by the first application account is the original object identified in the embodiment of the present invention.
  • the N application accounts are counted from the M objects based on the application account as a keyword, and the N application accounts are potential application accounts for publishing the original objects, and then the first application account is selected.
  • FIG. 5 is a schematic structural diagram of a server according to an embodiment of the present invention.
  • the server 1100 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 1122 (for example, one or more processors), memory 1132, and one or more storage media 1130 (for example, one or more mass storage devices) that store application programs 1142 or data 1144.
  • the memory 1132 and the storage medium 1130 may be short-term storage or persistent storage.
  • the program stored on storage medium 1130 may include one or more modules (not shown), each of which may include a series of instruction operations in the server.
  • central processor 1122 can be configured to communicate with storage medium 1130, executing a series of instruction operations in storage medium 1130 on server 1100.
  • Server 1100 may also include one or more power sources 1126, one or more wired or wireless network interfaces 1150, one or more input and output interfaces 1158, and/or one or more operating systems 1141, such as Windows Server™, Mac OS X™, Unix™, Linux™, and FreeBSD™.
  • the step of identifying the native object performed by the server in the above embodiment may be based on the server structure shown in FIG. 5.
  • a readable storage medium, such as a USB flash drive, removable hard disk, read-only memory (ROM), random access memory (RAM), magnetic disk, or optical disc, storing a number of instructions that cause a computer device (which may be a personal computer, server, or network device) to perform the methods described in the embodiments of the present invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Primary Health Care (AREA)
  • Marketing (AREA)
  • General Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

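A method and apparatus for identifying a native object, used to improve the accuracy of native object identification. The method includes: acquiring a first topic to be processed from a social platform, the first topic having a first topic identifier (101); searching the social platform, according to the first topic identifier, for M objects that include the first topic, M being a positive integer (102); counting the application accounts that appear on each of the M objects to obtain the application accounts whose occurrence frequency is in the top N, N being a positive integer (103); and identifying a first application account from the top N application accounts according to a preset account filtering rule, and determining the object published by the first application account as the native object (104).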

Description

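Method and apparatus for identifying a native object
This application claims priority to Chinese Patent Application No. 201710358639.0, filed with the Chinese Patent Office on May 19, 2017 and entitled "Method and apparatus for identifying a native object", which is incorporated herein by reference in its entirety.
Technical Field
The present invention relates to the field of computer technologies, and in particular to a method and apparatus for identifying a native object.
Background
In current topic-list processing, native articles need to be identified. In the prior art, the article with the largest number of forwards or shares in WeChat Moments is determined as the native article of a topic.
The prior-art method identifies the native article correctly only if one assumption holds: that the topic spread widely because the native article itself was forwarded explosively. Analysis of real cases shows that this assumption often fails. If a topic emerges not through the native article itself but because another official account, especially a high-profile "big V" official account, reprinted the article and the reprint then spread explosively on a social platform (such as the WeChat platform), the prior art misidentifies the article forwarded by the big V official account as the native article. The prior-art method can therefore produce identification errors, and its accuracy in identifying native articles is low.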
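Summary
Embodiments of the present invention provide a method and apparatus for identifying a native object, used to improve the accuracy of native object identification.
To solve the above technical problem, the embodiments of the present invention provide the following technical solutions:
In a first aspect, an embodiment of the present invention provides a method for identifying a native object, including:
acquiring, through a social platform, a first topic to be processed, the first topic having a first topic identifier;
searching the social platform, according to the first topic identifier, for M objects that include the first topic, where M is a positive integer;
counting the application accounts that appear on each of the M objects to obtain the application accounts whose occurrence frequency is in the top N, where N is a positive integer;
identifying a first application account from the top N application accounts according to an account filtering rule, and determining the object published by the first application account as a native object.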
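In a second aspect, an embodiment of the present invention further provides an apparatus for identifying a native object, including:
a topic acquisition module, configured to acquire a first topic to be processed from a social platform, the first topic having a first topic identifier;
an object search module, configured to search the social platform, according to the first topic identifier, for M objects that include the first topic, where M is a positive integer;
an account statistics module, configured to count the application accounts that appear on each of the M objects, so as to obtain the application accounts whose occurrence frequency is in the top N, where N is a positive integer;
an account filtering module, configured to identify a first application account from the top N application accounts according to a preset account filtering rule, and to determine the object published by the first application account as a native object.
In a third aspect, this application provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to perform the methods described in the above aspects.
In a fourth aspect, this application provides a computer program product containing instructions that, when run on a computer, cause the computer to perform the methods described in the above aspects.
In a fifth aspect, this application provides an apparatus for identifying a native object, which may include: a memory, a transceiver, a processor, and a bus system;
where the memory is configured to store a program;
the processor is configured to execute the program in the memory, including the following steps:
acquiring, through a social platform, a first topic to be processed, the first topic having a first topic identifier;
searching the social platform, according to the first topic identifier, for M objects that include the first topic, where M is a positive integer;
counting the application accounts that appear on each of the M objects to obtain the application accounts whose occurrence frequency is in the top N, where N is a positive integer;
identifying a first application account from the top N application accounts according to an account filtering rule, and determining the object published by the first application account as a native object;
the bus system is configured to connect the memory and the processor so that the memory and the processor communicate.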
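As can be seen from the above technical solutions, the embodiments of the present invention have the following advantages:
In the embodiment of the present invention, a first topic to be processed is first acquired from the social platform, the first topic having a first topic identifier; next, M objects including the first topic are searched from the social platform according to the first topic identifier, where M is a positive integer; the application accounts that appear on each of the M objects are then counted to obtain the application accounts whose occurrence frequency is in the top N, where N is a positive integer; finally, a first application account is identified from the top N application accounts according to a preset account filtering rule, and the object published by the first application account is determined as the native object. The embodiment of the present invention may search for M objects from the social platform according to the first topic identifier, and these M objects are used to count the occurrence frequency of each application account. By counting with the application accounts that appear on each object as keywords, the top N application accounts by occurrence frequency can be calculated accurately, and the first application account is then selected by the account filtering rule; the object published by the first application account is the native object identified in the embodiment of the present invention. The N application accounts counted from the M objects with application accounts as keywords are potential publishers of the native object, and the first application account is then screened out from these N accounts, so the first application account screened out is a more accurate publisher of the native object, which improves the identification accuracy of the native object. Compared with the prior art, the N application accounts selected in the embodiment of the present invention are potential publishers of the native object, so identification remains accurate regardless of whether the topic emerged through explosive forwarding of the native object itself.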
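The flow of steps 102 to 104 summarized above can be sketched end to end. This is only an illustration under stated assumptions: the `Post` record, its fields, the sample account names, and the choice of the original-marker check as the filtering rule are all hypothetical, and the document does not prescribe any particular data model.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Post:
    """Hypothetical record for one object stored on the social platform."""
    topic_id: str
    publisher: str
    mentioned_accounts: list = field(default_factory=list)
    is_marked_original: bool = False


def search_objects(posts, topic_id):
    """Step 102: find the M objects that carry the given topic identifier."""
    return [p for p in posts if p.topic_id == topic_id]


def top_accounts(objects, n):
    """Step 103: count accounts appearing on the objects and keep the top N."""
    counts = Counter()
    for obj in objects:
        counts.update(obj.mentioned_accounts)
    return [account for account, _ in counts.most_common(n)]


def publishes_marked_original(account, objects):
    """Example filtering rule: the account published an object marked original."""
    return any(o.publisher == account and o.is_marked_original for o in objects)


def identify_native_objects(posts, topic_id, n, account_filter):
    """Step 104: pick the first application account and return its objects."""
    objects = search_objects(posts, topic_id)
    selected = [a for a in top_accounts(objects, n) if account_filter(a, objects)]
    if not selected:
        return None, []
    first_account = selected[0]
    return first_account, [o for o in objects if o.publisher == first_account]


posts = [
    Post("topic-1", "rol", ["rol"], True),
    Post("topic-1", "big_v", ["rol", "big_v"]),
    Post("topic-1", "news_client", ["rol", "news_client"]),
    Post("topic-2", "other", ["other"], True),
]
account, native = identify_native_objects(posts, "topic-1", 2, publishes_marked_original)
print(account, [o.publisher for o in native])
```

In this toy data the reprinting accounts mention the originating account, so the originating account has the highest occurrence frequency even though it did not drive the spread, which is the situation the method is designed to handle.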
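Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below are merely some embodiments of the present invention, and a person skilled in the art may derive other drawings from them.
FIG. 1 is a schematic flow block diagram of a method for identifying a native object according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an implementation scenario of searching for M objects from a social platform according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an implementation scenario of searching for an application account on an object according to an embodiment of the present invention;
FIG. 4-a is a schematic structural diagram of an apparatus for identifying a native object according to an embodiment of the present invention;
FIG. 4-b is a schematic structural diagram of an account statistics module according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a server to which the method for identifying a native object is applied according to an embodiment of the present invention.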
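Detailed Description
Embodiments of the present invention provide a native object identification method and apparatus for improving the accuracy of native object identification.
To make the objectives, features, and advantages of the present invention clearer and easier to understand, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The embodiments described below are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on these embodiments fall within the protection scope of the present invention.
The terms "include" and "have" and any variants of them in the specification, claims, and drawings are intended to cover a non-exclusive inclusion, so that a process, method, system, product, or device that includes a series of units is not necessarily limited to those units, but may include other units that are not expressly listed or that are inherent to the process, method, product, or device.
Detailed descriptions are given below.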
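An embodiment of the native object identification method of the present invention may be applied to identifying native objects on a social platform. An object on the social platform may be electronic data published or reposted on the platform, such as an article, picture, video, or book. Referring to FIG. 1, the native object identification method provided by an embodiment of the present invention may include the following steps:
101. Obtain, from a social platform, a first topic to be processed, the first topic having a first topic identifier.
In the embodiments of the present invention, the social platform may be a platform on which users interact through a social application, for example an instant messaging tool such as the WeChat platform or the QQ platform. While using the social platform, a user may browse Moments, publish articles, repost articles, comment on articles, and perform other interactions, and the social platform stores a large amount of historical user data. Each user registers an application account on the social platform. For example, a WeChat official account is an application account that a developer or merchant applies for on the WeChat official accounts platform and that interoperates with a QQ account. Through an official account, a merchant can communicate and interact with a specific group on the WeChat platform through text, pictures, voice, and video, which has become a mainstream form of online and offline WeChat interaction.
It can be understood that an application account may also be a cross-platform account. For example, application account A (server A) logs in to social platform B (server B), and the first topic to be processed can then also be obtained on social platform B. It should further be noted that the application account in the embodiments is not limited to an account registered on the platform; it may also be an account obtained in a different application or in another way. In the embodiments, the first topic to be processed is first obtained from the social platform. A topic is the general name for an event on the Internet; for example, "朝鲜核试验" (North Korean nuclear test) and "罗一笑,你给我站住" are each a specific topic. The social platform may assign each topic a topic identifier, which uniquely identifies the topic and through which the corresponding topic can be determined. For example, the first topic is assigned the first topic identifier.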
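In some embodiments of the present invention, step 101 of obtaining, from a social platform, a first topic to be processed includes:
A1. Cluster multiple topics propagated on the social platform, and determine the first topic according to the clustering result.
The first topic may be determined by a clustering algorithm, for example a text clustering algorithm. Text clustering relies on the clustering assumption that documents of the same class are highly similar while documents of different classes are less similar, so that text information can be organized, summarized, and navigated effectively. The clustering algorithm clusters the texts of multiple topics on the social platform and applies redundancy elimination, information fusion, and text generation to documents on the same subject, producing a concise summary document.
For example, by counting reposts in Moments and applying a topic text clustering algorithm, it can be identified that 《罗一笑,你给我站住》, which appeared in early December 2016, is a topic whose native article needs to be identified. The text clustering algorithms provided by the embodiments can be divided into partitional clustering algorithms (for example k-means) and hierarchical clustering algorithms. Hierarchical clustering is mainly either bottom-up agglomerative or top-down divisive. The embodiments take the divisive clustering algorithm (the DIANA algorithm) as an example: all objects are first initialized into a single cluster, and the cluster is then split according to some criterion (for example the maximum Euclidean distance between nearest neighbors) until the number of clusters specified by the user is reached or the distance between two clusters exceeds a threshold.
The DIANA algorithm uses the following two definitions: 1) Cluster diameter: every pair of data points in a cluster has a Euclidean distance, and the maximum of these distances is the diameter of the cluster. 2) Average dissimilarity: the average Euclidean distance.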
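The two definitions above can be sketched in Python as follows. This is a minimal illustration, assuming that data points are numeric tuples of equal length; the function names `cluster_diameter` and `average_dissimilarity` are hypothetical and do not come from the patent.

```python
import math
from itertools import combinations


def cluster_diameter(points):
    """Largest pairwise Euclidean distance in a cluster (0.0 for fewer than 2 points)."""
    return max((math.dist(p, q) for p, q in combinations(points, 2)), default=0.0)


def average_dissimilarity(point, others):
    """Mean Euclidean distance from `point` to every point in `others`."""
    if not others:
        return 0.0
    return sum(math.dist(point, o) for o in others) / len(others)


# Hypothetical sample data, for illustration only.
sample_cluster = [(0, 0), (3, 4), (1, 1)]
sample_diameter = cluster_diameter(sample_cluster)
sample_average = average_dissimilarity((0, 0), [(3, 4), (6, 8)])
```

In practice the points would be feature vectors derived from topic texts; how those vectors are built is not specified in this passage.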
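The DIANA algorithm is described as follows:
Input: a database containing n objects; termination condition: the number of clusters k.
Output: k clusters, reaching the number of clusters specified by the termination condition.
1) Treat all objects as one initial cluster.
2) Execute the statement: For(i=1;i!=k;i++)Do Begin.
3) Select the cluster with the largest diameter among all clusters.
4) In the selected cluster, find the point with the largest average dissimilarity to the other points and put it into the splinter group; put the remaining points into the old party.
5) Repeat the following step 6).
6) In the old party, find a point whose nearest distance to the points in the splinter group is not greater than its nearest distance to the points in the old party, and add that point to the splinter group.
7) Until no further point of the old party is assigned to the splinter group.
8) The splinter group and the old party are the two clusters into which the selected cluster is split; together with the other clusters they form the new set of clusters.
9) End.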
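The steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the patent's implementation: points are numeric tuples, the nearest-distance test of step 6) is applied one point at a time, and the old party is never emptied. All function names are hypothetical.

```python
import math
from itertools import combinations


def diameter(cluster):
    """Largest pairwise Euclidean distance in the cluster."""
    return max((math.dist(p, q) for p, q in combinations(cluster, 2)), default=0.0)


def avg_dissimilarity(i, cluster):
    """Mean distance from cluster[i] to all other points of the cluster."""
    others = cluster[:i] + cluster[i + 1:]
    return sum(math.dist(cluster[i], q) for q in others) / len(others) if others else 0.0


def split_cluster(cluster):
    """Steps 4) to 8): split one cluster into a splinter group and an old party."""
    seed = max(range(len(cluster)), key=lambda i: avg_dissimilarity(i, cluster))
    splinter = [cluster[seed]]
    old = cluster[:seed] + cluster[seed + 1:]
    moved = True
    while moved and len(old) > 1:
        moved = False
        for i, p in enumerate(old):
            rest = old[:i] + old[i + 1:]
            to_splinter = min(math.dist(p, s) for s in splinter)
            to_old = min(math.dist(p, q) for q in rest)
            if to_splinter <= to_old:  # step 6): p is at least as close to the splinter group
                splinter.append(old.pop(i))
                moved = True
                break
    return splinter, old


def diana(points, k):
    """Steps 1) to 3): keep splitting the largest-diameter cluster until k clusters exist."""
    clusters = [list(points)]
    while len(clusters) < k:
        idx = max(range(len(clusters)), key=lambda i: diameter(clusters[i]))
        if len(clusters[idx]) < 2:
            break  # nothing left to split
        clusters.extend(split_cluster(clusters.pop(idx)))
    return clusters


# Hypothetical sample data: two well-separated groups of points.
sample_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
sample_clusters = diana(sample_points, 2)
```

The sketch favors readability over speed; every split recomputes pairwise distances, which is acceptable only for small inputs.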
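With the DIANA algorithm described above, the multiple topics propagated on the social platform can be clustered, and the first topic is determined according to the clustering result.
102. Search the social platform, according to the first topic identifier, for M objects that contain the first topic, M being a positive integer.
In the embodiments, the social platform stores users' historical interaction data, and each piece of interaction data is stored on the social platform as an object. After the first topic to be processed is determined, the social platform is searched according to the first topic identifier for M objects containing the first topic. M is a positive integer whose value depends on the user interaction data generated on the particular social platform and is not limited here. All M objects found contain the first topic, that is, they all concern the same topic. For example, FIG. 2 is a schematic diagram of a scenario of searching a social platform for M objects according to an embodiment. Taking topic articles as the objects and "罗一笑,你给我站住" as the first topic, searching the database of the social platform returns multiple topic articles: official account 1, 上海滩网, reposted the topic "罗一笑,你给我站住" on December 24, 2016; official account 2, P2P观察, reposted it on November 28, 2016; official account 3, 投资理财频道, reposted it on December 3, 2016; official account 4, 贴吧小君, reposted it on November 30, 2016; and official account 5, 网易新闻客户端, reposted it on November 30, 2016.
In the embodiments, the first topic is the topic whose native object needs to be identified. Searching the social platform with the first topic identifier retrieves from the platform's database all objects that contain the first topic; the description takes M as the number of objects found. These M objects all contain the first topic and are used to count the occurrence frequency of each application account, as detailed in step 103 below. Searching the social platform by the first topic thus yields the M objects from which the native object is identified, so that the native object can be located accurately in the subsequent steps.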
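As a rough illustration of step 102, the lookup by topic identifier can be sketched with an in-memory index. The record layout (`topic_id`, `account`, `published`) and the identifiers `T1` and `T2` are assumptions made for this sketch; the patent does not describe how the platform's database is organized. The account names and dates for `T1` are taken from the example above.

```python
from collections import defaultdict


def build_topic_index(objects):
    """Group stored objects by topic identifier (hypothetical record layout)."""
    index = defaultdict(list)
    for obj in objects:
        index[obj["topic_id"]].append(obj)
    return index


def search_objects(index, topic_id):
    """Return the M objects that carry the given topic identifier."""
    return list(index.get(topic_id, []))


# Sample records; the T2 entry is hypothetical.
stored_objects = [
    {"topic_id": "T1", "account": "上海滩网", "published": "2016-12-24"},
    {"topic_id": "T1", "account": "P2P观察", "published": "2016-11-28"},
    {"topic_id": "T2", "account": "other-account", "published": "2016-11-30"},
]
topic_index = build_topic_index(stored_objects)
found = search_objects(topic_index, "T1")
```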
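103. Count the application accounts that appear on each of the M objects to obtain the N application accounts with the highest occurrence frequency, N being a positive integer.
In the embodiments, after the M objects containing the first topic are found on the social platform, each of the M objects is analyzed to determine the names of the application accounts it contains and the occurrence frequency of each application account. For example, suppose three objects need to be analyzed: object 1, object 2, and object 3. Object 1 includes application account 1, application account 2, and application account 3; object 2 includes application account 2 and application account 3; and object 3 includes application account 1 and application account 3. Analyzing these three objects shows that three application accounts appear in total: application account 1 appears 2 times, application account 2 appears 2 times, and application account 3 appears 3 times.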
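The three-object example above can be reproduced with a short Python sketch. The helper `top_n_accounts` and the placeholder account names are hypothetical; the sketch simply tallies, per account, the number of objects it appears on and keeps the N most frequent.

```python
from collections import Counter

# The three objects from the example, each listed with the accounts found on it.
objects = {
    "object 1": ["account 1", "account 2", "account 3"],
    "object 2": ["account 2", "account 3"],
    "object 3": ["account 1", "account 3"],
}


def count_accounts(objects):
    """Tally how many times each application account appears across all objects."""
    counts = Counter()
    for accounts in objects.values():
        counts.update(accounts)
    return counts


def top_n_accounts(counts, n):
    """Return the n accounts with the highest occurrence frequency."""
    return counts.most_common(n)


account_counts = count_accounts(objects)
top_one = top_n_accounts(account_counts, 1)
```

Running the tally by hand gives the same totals: account 1 appears on objects 1 and 3, account 2 on objects 1 and 2, and account 3 on all three.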
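FIG. 3 is a schematic diagram of a scenario of searching an object for application accounts according to an embodiment. Searching an object for application accounts is illustrated next. Taking topic articles as the objects and "罗一笑,你给我站住" as the first topic, FIG. 3 shows the content of one topic article. The article contains a WeChat official account, "罗尔", and its title includes "罗一笑,你给我站住". Searching the topic article shown in FIG. 3 gives an occurrence frequency of 1 for the WeChat official account "罗尔".
In the embodiments, the application accounts appearing on each object are counted as keywords, so the N most frequent application accounts can be computed accurately. The N application accounts counted from the M objects in this way are the potential publishers of the native object; that is, the N most frequent accounts may have published the native object of the first topic. The value of N depends on the content of the M objects generated on the particular social platform and is not limited here.
In some embodiments of the present invention, step 103 of counting the application accounts that appear on each of the M objects includes:
B1. Slide a window over each of the M objects step by step according to a sliding interval, and search the sub-object covered by the window for application accounts;
B2. Count the occurrence frequency of the application accounts over all sub-objects, where one object is divided into multiple sub-objects by the sliding of the window.
A slidable window is set on each object. The window size is defined first, for example 100 characters. The window slides over the object according to the sliding interval; if the sliding interval is set to 100 characters, the window advances 100 characters each time. Each time the window stops on the object, it cuts out one sub-object whose entire content lies within the window. The sub-object covered by the window is searched for application accounts, and the occurrence frequency of the application accounts over all sub-objects is then counted.
For example, take topic articles as the objects and WeChat official accounts as the application accounts. The embodiments identify the N official accounts that potentially published the native article through window co-occurrence. As shown in FIG. 3, all topic articles containing the topic 《罗一笑,你给我站住》 are analyzed, and the set of official account names that appear in the body text around the article's topic_title (that is, 罗一笑,你给我站住) within a window is collected. The window here is a sliding window of size K; for example, with K set to 100 characters, the window is 100 characters long. The names and occurrence frequencies of WeChat official accounts within 100 characters before and after in the window are counted, and as the window keeps moving, all official accounts appearing on an object are counted, yielding information such as: 《topic_title, official account name 1》, 《topic_title, official account name 2》, 《topic_title, official account name 3》. Counting and sorting the frequencies of these official account names gives the at most N official accounts (the topN) that potentially published the topic article, where the frequency is the number of times an official account name appears within the windows. The value of N is the number of official accounts appearing within the windows, and these N official accounts form the set of official account names that potentially published the native article.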
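Steps B1 and B2 can be sketched in Python as follows. This is a minimal illustration under several assumptions: the set of candidate account names is known in advance, a window is a plain substring of the article text, and the window size and sliding interval are both 100 characters as in the example. The function names are hypothetical, and the sketch does not check that topic_title co-occurs in the same window.

```python
from collections import Counter


def sliding_windows(text, size=100, step=100):
    """Yield the sub-objects cut out by a window of `size` characters moved by `step`."""
    for start in range(0, max(len(text), 1), step):
        yield text[start:start + size]


def count_accounts_in_windows(texts, account_names, size=100, step=100):
    """Count, for each account name, the number of windows in which it appears."""
    counts = Counter()
    for text in texts:
        for window in sliding_windows(text, size, step):
            for name in account_names:
                if name in window:
                    counts[name] += 1
    return counts


# Hypothetical article bodies: filler characters around two account names.
articles = ["a" * 90 + "罗尔" + "b" * 150 + "P2P观察", "罗尔"]
window_counts = count_accounts_in_windows(articles, ["罗尔", "P2P观察"])
top_accounts = window_counts.most_common(1)
```

With a sliding interval equal to the window size, a name that straddles a window boundary is missed; choosing a smaller interval than the window size is one way to avoid that, at the cost of counting some occurrences more than once.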
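104. Identify a first application account from the N application accounts with the highest occurrence frequency according to a preset account filtering rule, and determine an object published by the first application account as the native object.
In the embodiments, step 103 yields the N application accounts with the highest occurrence frequency. These N application accounts are then filtered with the account filtering rule so that one application account is selected from them. For ease of description, the selected application account is defined as the "first application account", and the object published by the first application account is the native object. The account filtering rule provided by the embodiments is a screening rule for filtering application accounts; it may draw on account features, object features, interaction data of the social platform, and the like. The account filtering rule may be at least one of several filtering rules: it may be a single filtering rule or a combination of several, which is not limited here. The native object in the embodiments may also be called the native article of a topic; the native object is the earliest object in which a topic appears.
For example, take topic articles as the objects and WeChat official accounts as the application accounts. Under each of the topN potential official accounts, a search is made for the title of the corresponding native article, for example by searching the topic name "罗一笑,你给我站住" to retrieve the articles related to 《罗一笑,你给我站住》 and their publication times. It is then determined whether an article's publication time falls within the early period of the topic's outbreak. This period is the point in time at which topic text clustering aggregated the new topic, that is, the earliest period among the publication times of the articles found by searching the title. Rule filtering is then applied in combination with features such as the number of Moments reposts at publication time and whether the article is marked as original, and the Uniform Resource Locator (URL) of the unique native article is finally identified.
In some embodiments of the present invention, step 104 of identifying a first application account from the N application accounts with the highest occurrence frequency according to a preset account filtering rule includes:
C1. According to a topic burst period filtering rule, filter out, from the N application accounts with the highest occurrence frequency, the application accounts whose publication time is within or later than a first burst period, to obtain the first application account, the first burst period being the burst period of the first topic.
Taking the topic burst period filtering rule as the account filtering rule, the application accounts among the N whose publication time is within or later than the first burst period are filtered out, and only the application accounts whose publication time is before the first burst period are kept. A retained application account may be the first application account.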
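Rule C1 as worded above can be sketched in Python as follows. The record layout, the account names, and the dates are hypothetical; the sketch keeps only the accounts whose publication date is strictly before the start of the burst period, which removes everything published during or after it.

```python
from datetime import date


def filter_by_burst_period(candidates, burst_start):
    """Keep accounts that published before the burst period began (rule C1)."""
    return [c for c in candidates if c["published"] < burst_start]


# Hypothetical candidates and burst period start, for illustration only.
candidates = [
    {"account": "account A", "published": date(2016, 11, 20)},
    {"account": "account B", "published": date(2016, 11, 26)},
    {"account": "account C", "published": date(2016, 12, 3)},
]
kept = filter_by_burst_period(candidates, burst_start=date(2016, 11, 25))
```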
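In some embodiments of the present invention, step 104 of identifying a first application account from the N application accounts with the highest occurrence frequency according to a preset account filtering rule includes:
C2. According to a repost count or comment count filtering rule, filter out, from the N application accounts with the highest occurrence frequency, the application accounts whose repost count is less than a repost threshold or whose comment count is less than a comment threshold, to obtain the first application account.
Taking the repost count or comment count filtering rule as the account filtering rule, the application accounts among the N whose repost count is less than the repost threshold, or whose comment count is less than the comment threshold, are filtered out, and only the application accounts whose repost count is greater than or equal to the repost threshold, or whose comment count is greater than or equal to the comment threshold, are kept. A retained application account may be the first application account. The repost threshold and the comment threshold may be determined according to the application scenario and are not limited here.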
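Rule C2 can be sketched as a single threshold filter that is applied to either metric. This is one possible reading, assuming that the repost rule and the comment rule are used as alternatives; the record layout, account names, counts, and thresholds are all hypothetical.

```python
def filter_by_threshold(candidates, metric, threshold):
    """Keep accounts whose value for `metric` is at least `threshold` (rule C2)."""
    return [c for c in candidates if c[metric] >= threshold]


# Hypothetical candidates, for illustration only.
candidates = [
    {"account": "account A", "reposts": 120_000, "comments": 40},
    {"account": "account B", "reposts": 800, "comments": 900},
    {"account": "account C", "reposts": 50, "comments": 3},
]
by_reposts = filter_by_threshold(candidates, "reposts", 1_000)
by_comments = filter_by_threshold(candidates, "comments", 100)
```

Passing the metric name as a parameter keeps one function for both variants of the rule; suitable threshold values depend on the application scenario, as the text notes.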
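In some embodiments of the present invention, step 104 of identifying a first application account from the N application accounts with the highest occurrence frequency according to a preset account filtering rule includes:
C3. According to an original mark filtering rule, filter out, from the N application accounts with the highest occurrence frequency, the application accounts that do not have an original mark, to obtain the first application account.
Taking the original mark filtering rule as the account filtering rule, the application accounts among the N that do not have an original mark are filtered out, and only the application accounts that have an original mark are kept. A retained application account may be the first application account. The original mark may be carried on the object published by the application account, and its specific position on the object may be determined according to the application scenario, which is not limited here.
It should be noted that, in the foregoing embodiments, steps C1, C2, and C3 each describe the determination of the first application account from the perspective of a different account filtering rule. The account filtering rule may be at least one of several filtering rules: it may be a single filtering rule or a combination of several; for example, steps C1, C2, and C3 may be combined to determine the first application account. An example follows, taking topic articles as the objects and WeChat official accounts as the application accounts. The set of candidate native articles obtained by retrieval is {O}, 759 articles in total, and the following account filtering rules must be applied before the unique native article can finally be identified.
Rule 1: Filter by topic burst period. Retrieval and topic clustering show that articles on the topic 《罗一笑,你给我站住》 spread widely on social media such as Moments from November 25, 2016 to November 28, 2016. Comparing the publication times of the articles in {O} reduces the candidate set of native articles from 759 to 180.
Rule 2: Filter by Moments repost count. The 180 articles are sorted by Moments repost count, and the top 2 are taken as the new, further reduced candidate set of native articles:
《罗一笑,你给我站住》 published by the official account "P2P观察", with 1,000,000 reposts.
《罗一笑,你给我站住》 published by the official account "罗尔", with more than 100,000 reposts.
Rule 3: Filter further by whether the article is original:
《罗一笑,你给我站住》 published by the official account "P2P观察" has no original mark.
《罗一笑,你给我站住》 published by the official account "罗尔" has an original mark.
Filtering the N WeChat official accounts with the above three account filtering rules identifies the native article for the topic as the article published by "罗尔": http://mp.weixin.qq.com/s/o4Zhq8jqxqFVcFIwf4g5rw.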
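The three rules of the worked example can be chained in a short Python sketch. The sketch follows the example as described: Rule 1 is read as keeping the articles published within the November 25 to November 28, 2016 spreading window, Rule 2 keeps the top 2 by repost count, and Rule 3 keeps the articles with an original mark. The record layout and the function name are hypothetical, the publication date assumed for "罗尔" and the repost count assumed for "上海滩网" are invented for illustration, and the repost count for "罗尔" is a stand-in for "more than 100,000".

```python
from datetime import date


def identify_native(candidates, burst_start, burst_end, keep_top=2):
    """Apply the three example rules in order and return the surviving candidates."""
    # Rule 1: publication date inside the topic's spreading window.
    in_window = [c for c in candidates if burst_start <= c["published"] <= burst_end]
    # Rule 2: keep the articles with the highest repost counts.
    top = sorted(in_window, key=lambda c: c["reposts"], reverse=True)[:keep_top]
    # Rule 3: keep only articles carrying an original mark.
    return [c for c in top if c["original"]]


candidates = [
    {"account": "P2P观察", "published": date(2016, 11, 28), "reposts": 1_000_000, "original": False},
    {"account": "罗尔", "published": date(2016, 11, 25), "reposts": 100_001, "original": True},  # date assumed
    {"account": "上海滩网", "published": date(2016, 12, 24), "reposts": 5_000, "original": False},  # count assumed
]
native = identify_native(candidates, date(2016, 11, 25), date(2016, 11, 28))
```

Applying the rules in this order leaves a single candidate in the sample data, which matches the outcome of the example.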
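As the foregoing description shows, the embodiments of the present invention introduce a keyword-based native object identification technique to solve the problem of inaccurate identification in the prior art. M objects are retrieved from the social platform according to the first topic identifier and are used to count the occurrence frequency of each application account. Because the application accounts appearing on each object are counted as keywords, the N most frequent application accounts can be computed accurately, and the first application account is then selected with the account filtering rule; the object published by the first application account is the native object identified in the embodiments. The N application accounts counted from the M objects are the potential publishers of the native object, so the first application account selected from them is a more accurate publisher of the native object, which improves the identification accuracy for native objects. Compared with the prior art, identification of the native object is not affected by whether the topic emerged through explosive reposting of the native object itself.
It should be noted that, for brevity, the foregoing method embodiments are each described as a series of combined actions. A person skilled in the art should understand, however, that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in another order or simultaneously. A person skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
To better implement the above solutions of the embodiments, related apparatuses for implementing them are provided below.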
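Referring to FIG. 4-a, a native object identification apparatus 400 provided by an embodiment of the present invention may include a topic obtaining module 401, an object search module 402, an account counting module 403, and an account filtering module 404, wherein:
the topic obtaining module 401 is configured to obtain, from a social platform, a first topic to be processed, the first topic having a first topic identifier;
the object search module 402 is configured to search the social platform, according to the first topic identifier, for M objects that contain the first topic, M being a positive integer;
the account counting module 403 is configured to count the application accounts that appear on each of the M objects to obtain the N application accounts with the highest occurrence frequency, N being a positive integer; and
the account filtering module 404 is configured to identify a first application account from the N application accounts with the highest occurrence frequency according to a preset account filtering rule, and to determine an object published by the first application account as the native object.
In some embodiments of the present invention, the topic obtaining module 401 is specifically configured to cluster multiple topics propagated on the social platform and determine the first topic according to the clustering result.
In some embodiments of the present invention, referring to FIG. 4-b, the account counting module 403 includes:
a window search module 4031, configured to slide a window over each of the M objects step by step according to a sliding interval, and to search the sub-object covered by the window for the application accounts; and
a frequency calculation module 4032, configured to count the occurrence frequency of the application accounts over all sub-objects, where one object is divided into multiple sub-objects by the sliding of the window.
In some embodiments of the present invention, the account filtering module 404 is specifically configured to filter out, according to a topic burst period filtering rule, from the N application accounts with the highest occurrence frequency, the application accounts whose publication time is within or later than a first burst period, to obtain the first application account, the first burst period being the burst period of the first topic.
In some embodiments of the present invention, the account filtering module 404 is specifically configured to filter out, according to a repost count or comment count filtering rule, from the N application accounts with the highest occurrence frequency, the application accounts whose repost count is less than a repost threshold or whose comment count is less than a comment threshold, to obtain the first application account.
In some embodiments of the present invention, the account filtering module 404 is specifically configured to filter out, according to an original mark filtering rule, from the N application accounts with the highest occurrence frequency, the application accounts that do not have an original mark, to obtain the first application account.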
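As the foregoing description of the embodiments shows, a first topic to be processed, which has a first topic identifier, is first obtained from a social platform. The social platform is then searched according to the first topic identifier for M objects containing the first topic, M being a positive integer. Next, the application accounts appearing on each of the M objects are counted to obtain the N application accounts with the highest occurrence frequency, N being a positive integer. Finally, a first application account is identified from those N application accounts according to a preset account filtering rule, and the object published by the first application account is determined as the native object. Because the application accounts appearing on each of the M objects are counted as keywords, the N most frequent application accounts can be computed accurately. These N accounts are the potential publishers of the native object, so the first application account selected from them with the account filtering rule is a more accurate publisher of the native object, which improves the identification accuracy for native objects. Compared with the prior art, identification of the native object is not affected by whether the topic emerged through explosive reposting of the native object itself.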
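FIG. 5 is a schematic structural diagram of a server according to an embodiment of the present invention. The server 1100 may vary considerably with configuration or performance, and may include one or more central processing units (CPU) 1122 (for example, one or more processors), a memory 1132, and one or more storage media 1130 (for example, one or more mass storage devices) that store application programs 1142 or data 1144. The memory 1132 and the storage medium 1130 may provide transient or persistent storage. The program stored in the storage medium 1130 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the server. Furthermore, the central processing unit 1122 may be configured to communicate with the storage medium 1130 and to execute, on the server 1100, the series of instruction operations in the storage medium 1130.
The server 1100 may further include one or more power supplies 1126, one or more wired or wireless network interfaces 1150, one or more input/output interfaces 1158, and/or one or more operating systems 1141, such as Windows Server™, Mac OS X™, Unix™, Linux™, and FreeBSD™.
The steps of the native object identification method performed by the server in the foregoing embodiments may be based on the server structure shown in FIG. 5.
It should also be noted that the apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. In addition, in the drawings of the apparatus embodiments provided by the present invention, a connection between modules indicates that they have a communication connection, which may be implemented as one or more communication buses or signal lines. A person of ordinary skill in the art can understand and implement the embodiments without creative effort.
From the description of the foregoing implementations, a person skilled in the art can clearly understand that the present invention may be implemented by software plus the necessary general-purpose hardware, and may also be implemented by dedicated hardware, including application-specific integrated circuits, dedicated CPUs, dedicated memories, and dedicated components. In general, any function performed by a computer program can easily be implemented with corresponding hardware, and the specific hardware structure used to implement the same function may take many forms, such as an analog circuit, a digital circuit, or a dedicated circuit. For the present invention, however, a software implementation is the better implementation in most cases. Based on this understanding, the technical solutions of the present invention, in essence or in the part that contributes to the prior art, may be embodied as a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.
In summary, the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the above embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the embodiments may still be modified, or some of their technical features may be replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.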

Claims (15)

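  1. A native object identification method, comprising:
    obtaining, through a social platform, a first topic to be processed, the first topic having a first topic identifier;
    searching the social platform, according to the first topic identifier, for M objects that contain the first topic, M being a positive integer;
    counting the occurrence frequency of the application accounts that appear on each of the M objects to obtain the N application accounts with the highest occurrence frequency, N being a positive integer; and
    identifying a first application account from the N application accounts with the highest occurrence frequency according to an account filtering rule, and determining an object published by the first application account as the native object.
  2. The method according to claim 1, wherein the obtaining, through a social platform, a first topic to be processed comprises:
    clustering multiple topics propagated on the social platform to obtain a clustering result; and
    determining the first topic according to the clustering result.
  3. The method according to claim 1, wherein the counting the application accounts that appear on each of the M objects comprises:
    sliding a window over each of the M objects step by step according to a sliding interval, and searching the sub-object covered by the window for the application accounts; and
    counting the occurrence frequency of the application accounts over all sub-objects, wherein one object is divided into multiple sub-objects by the sliding of the window.
  4. The method according to any one of claims 1 to 3, wherein the identifying a first application account from the N application accounts with the highest occurrence frequency according to an account filtering rule comprises:
    filtering out, according to a topic burst period filtering rule, from the N application accounts with the highest occurrence frequency, the application accounts whose publication time is within or later than a first burst period, to obtain the first application account, the first burst period being the burst period of the first topic.
  5. The method according to any one of claims 1 to 3, wherein the identifying a first application account from the N application accounts with the highest occurrence frequency according to an account filtering rule comprises:
    filtering out, according to a repost count or comment count filtering rule, from the N application accounts with the highest occurrence frequency, the application accounts whose repost count is less than a repost threshold or whose comment count is less than a comment threshold, to obtain the first application account.
  6. The method according to any one of claims 1 to 3, wherein the identifying a first application account from the N application accounts with the highest occurrence frequency according to an account filtering rule comprises:
    filtering out, according to an original mark filtering rule, from the N application accounts with the highest occurrence frequency, the application accounts that do not have an original mark, to obtain the first application account.
  7. A native object identification apparatus, comprising:
    a topic obtaining module, configured to obtain, through a social platform, a first topic to be processed, the first topic having a first topic identifier;
    an object search module, configured to search the social platform, according to the first topic identifier, for M objects that contain the first topic, M being a positive integer;
    an account counting module, configured to count the occurrence frequency of the application accounts that appear on each of the M objects to obtain the N application accounts with the highest occurrence frequency, N being a positive integer; and
    an account filtering module, configured to identify a first application account from the N application accounts with the highest occurrence frequency according to an account filtering rule, and to determine an object published by the first application account as the native object.
  8. The apparatus according to claim 7, wherein the topic obtaining module is specifically configured to cluster multiple topics propagated on the social platform to obtain a clustering result, and
    to determine the first topic according to the clustering result.
  9. The apparatus according to claim 7, wherein the account counting module comprises:
    a window search module, configured to slide a window over each of the M objects step by step according to a sliding interval, and to search the sub-object covered by the window for the application accounts; and
    a frequency calculation module, configured to count the occurrence frequency of the application accounts over all sub-objects, wherein one object is divided into multiple sub-objects by the sliding of the window.
  10. The apparatus according to any one of claims 7 to 9, wherein the account filtering module is specifically configured to filter out, according to a topic burst period filtering rule, from the N application accounts with the highest occurrence frequency, the application accounts whose publication time is within or later than a first burst period, to obtain the first application account, the first burst period being the burst period of the first topic.
  11. The apparatus according to any one of claims 7 to 9, wherein the account filtering module is specifically configured to filter out, according to a repost count or comment count filtering rule, from the N application accounts with the highest occurrence frequency, the application accounts whose repost count is less than a repost threshold or whose comment count is less than a comment threshold, to obtain the first application account.
  12. The apparatus according to any one of claims 7 to 9, wherein the account filtering module is specifically configured to filter out, according to an original mark filtering rule, from the N application accounts with the highest occurrence frequency, the application accounts that do not have an original mark, to obtain the first application account.
  13. A computer-readable storage medium, comprising instructions that, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 6.
  14. A computer program product containing instructions that, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 6.
  15. A computer device, comprising a memory, a transceiver, a processor, and a bus system;
    wherein the memory is configured to store a program;
    the processor is configured to execute the program in the memory, including the following steps:
    obtaining, through a social platform, a first topic to be processed, the first topic having a first topic identifier;
    searching the social platform, according to the first topic identifier, for M objects that contain the first topic, M being a positive integer;
    counting the occurrence frequency of the application accounts that appear on each of the M objects to obtain the N application accounts with the highest occurrence frequency, N being a positive integer;
    identifying a first application account from the N application accounts with the highest occurrence frequency according to an account filtering rule, and determining an object published by the first application account as the native object; and
    the bus system is configured to connect the memory and the processor so that the memory and the processor can communicate.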
PCT/CN2018/079243 2017-05-19 2018-03-16 Native object identification method and apparatus WO2018210045A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/388,083 US11250077B2 (en) 2017-05-19 2019-04-18 Native object identification method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710358639.0A CN108959295B (zh) 2017-05-19 Native object identification method and apparatus
CN201710358639.0 2017-05-19

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/388,083 Continuation US11250077B2 (en) 2017-05-19 2019-04-18 Native object identification method and apparatus

Publications (1)

Publication Number Publication Date
WO2018210045A1 true WO2018210045A1 (zh) 2018-11-22

Family

ID=64273275

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/079243 WO2018210045A1 (zh) 2017-05-19 2018-03-16 Native object identification method and apparatus

Country Status (3)

Country Link
US (1) US11250077B2 (zh)
CN (1) CN108959295B (zh)
WO (1) WO2018210045A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021073271A1 (zh) * 2019-10-17 2021-04-22 平安科技(深圳)有限公司 Public opinion analysis method, apparatus, computer apparatus, and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6233575B1 (en) * 1997-06-24 2001-05-15 International Business Machines Corporation Multilevel taxonomy based on features derived from training documents classification using fisher values as discrimination values
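CN103235821A (zh) * 2013-04-27 2013-08-07 百度在线网络技术(北京)有限公司 Search method and search server for original content
CN103810167A (zh) * 2012-11-06 2014-05-21 腾讯科技(深圳)有限公司 Method and apparatus for obtaining information
CN103914491A (zh) * 2013-01-09 2014-07-09 腾讯科技(北京)有限公司 Data mining method and system for high-quality user-generated content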

Family Cites Families (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100478924B1 (ko) * 2004-06-01 2005-03-29 엔에이치엔(주) Community search service system using multiple search criteria and search method thereof
US9143572B2 (en) * 2004-09-17 2015-09-22 About, Inc. Method and system for providing content to users based on frequency of interaction
US20080235189A1 (en) * 2007-03-23 2008-09-25 Drew Rayman System for searching for information based on personal interactions and presences and methods thereof
US8762875B2 (en) * 2011-12-23 2014-06-24 Blackberry Limited Posting activity visualization
US9152709B2 (en) * 2013-02-25 2015-10-06 Microsoft Technology Licensing, Llc Cross-domain topic space
US20140324719A1 (en) * 2013-03-15 2014-10-30 Bruce A. Canal Social media screening and alert system
US10430894B2 (en) * 2013-03-21 2019-10-01 Khoros, Llc Gamification for online social communities
US20150026192A1 (en) * 2013-04-19 2015-01-22 salesforce.com,inc. Systems and methods for topic filter recommendation for online social environments
US20150142782A1 (en) * 2013-11-15 2015-05-21 Trendalytics, Inc. Method for associating metadata with images
CN103778245B (zh) * 2014-02-13 2017-04-05 北京奇艺世纪科技有限公司 Method and apparatus for identifying user comments
CN105227425B (zh) * 2014-05-26 2019-11-15 腾讯科技(北京)有限公司 Message aggregation method, device, and social networking system
US20170061469A1 (en) * 2015-08-28 2017-03-02 Sprinklr, Inc. Dynamic campaign analytics via hashtag detection
US10382577B2 (en) * 2015-01-30 2019-08-13 Microsoft Technology Licensing, Llc Trending topics on a social network based on member profiles
US10997257B2 (en) * 2015-02-06 2021-05-04 Facebook, Inc. Aggregating news events on online social networks
US20170277691A1 (en) * 2016-03-22 2017-09-28 Facebook, Inc. Quantifying Social Influence
US9799082B1 (en) * 2016-04-25 2017-10-24 Post Social, Inc. System and method for conversation discovery
CN106649509B (zh) * 2016-10-12 2020-04-07 腾讯科技(北京)有限公司 User feature extraction method and apparatus
US10095671B2 (en) * 2016-10-28 2018-10-09 Microsoft Technology Licensing, Llc Browser plug-in with content blocking and feedback capability
US10650009B2 (en) * 2016-11-22 2020-05-12 Facebook, Inc. Generating news headlines on online social networks
US10535081B2 (en) * 2016-12-20 2020-01-14 Facebook, Inc. Optimizing audience engagement with digital content shared on a social networking system

Also Published As

Publication number Publication date
US11250077B2 (en) 2022-02-15
CN108959295A (zh) 2018-12-07
CN108959295B (zh) 2021-04-16
US20190272297A1 (en) 2019-09-05

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18802635

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18802635

Country of ref document: EP

Kind code of ref document: A1