CN110990571A - Method and device for obtaining discussion occupation ratio, storage medium and electronic equipment

Publication number
CN110990571A
Authority
CN
China
Prior art keywords
word
posts
type
discussion
words
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911218626.9A
Other languages
Chinese (zh)
Other versions
CN110990571B (en)
Inventor
程雪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Second Hand Artificial Intelligence Technology Co ltd
Original Assignee
Admaster Technology Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Admaster Technology Beijing Co ltd
Priority to CN201911218626.9A
Publication of CN110990571A
Application granted
Publication of CN110990571B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The application provides a method and a device for obtaining a discussion ratio, a storage medium and electronic equipment. The method comprises: acquiring the contents of a plurality of posts; segmenting the content of each post with a word segmentation model, and matching each word obtained by segmentation against a preset classification word packet to obtain the type of each word, wherein the classification word packet defines the types to which different words belong; and calculating the discussion ratio corresponding to each type of word according to the occurrence of that type of word in the posts, wherein the discussion ratio represents the ratio of the discussion degree of the target posts corresponding to each type to the discussion degree of the plurality of posts. The embodiment enables rapid acquisition of posts and rapid analysis of post contents, and simply and quickly yields the discussion ratios of different topics across a plurality of posts.

Description

Method and device for obtaining discussion occupation ratio, storage medium and electronic equipment
Technical Field
The application relates to the technical field of data analysis, in particular to a discussion proportion obtaining method, a discussion proportion obtaining device, a storage medium and electronic equipment.
Background
Currently, in the prior art, a Uniform Resource Locator (URL) of a third-party platform (such as WeChat or a microblog) is parsed to obtain the data of a post, the data is then written to an Excel table, and the content, forwarding number, comment number, like number and the like of the post corresponding to each URL are displayed in the table. However, if the discussion heat of different contents in the posts needs to be analyzed, the Excel table has to be uploaded manually to a word segmentation model, the frequency of each word counted, and the analysis then performed, so the steps are cumbersome and the statistics inconvenient.
Disclosure of Invention
The embodiment of the application aims to provide a method, a device, a storage medium and electronic equipment for obtaining discussion ratios, which can be used for analyzing the discussion ratios of different types of contents in a plurality of posts, and the whole analysis process is efficient and convenient.
In a first aspect, an embodiment of the present application provides a method for obtaining a discussion ratio, including: acquiring the contents of a plurality of posts; segmenting the content of each post with a word segmentation model, and matching each word obtained by segmentation against a preset classification word packet to obtain the type of each word, wherein the classification word packet defines the types to which different words belong; and calculating the discussion ratio corresponding to each type of word according to the occurrence of that type of word in the posts, wherein the discussion ratio represents the ratio of the discussion degree of the target posts corresponding to each type to the discussion degree of the plurality of posts.
In this scheme, using techniques such as a word segmentation model and word packet matching, only the contents of the posts to be analyzed need to be acquired; word segmentation, type matching and statistical analysis are then performed automatically, so the specific proportion of discussion of different types of content in the posts is obtained quickly. The whole analysis process is efficient and convenient and needs little manual intervention.
Optionally, the discussion degree includes the browsing number of the posts, and calculating the discussion ratio corresponding to each type of word according to the occurrence of each type of word in the posts includes: determining, from the plurality of posts, the posts in which words of each type appear, to obtain the target posts corresponding to each type of word; and calculating the ratio of the browsing number of the target posts to the total browsing number of the plurality of posts, to obtain the discussion ratio corresponding to each type of word.
In this scheme, the discussion degree can be represented by the browsing number of the posts, and the discussion ratio is calculated from the browsing numbers of the posts in which each type of word appears. In the calculation, only the posts in which a word appears matter; the number of times the word appears within each post is not counted, which greatly reduces the amount of calculation.
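The browsing-number calculation described above can be written compactly. The notation is introduced here only for illustration and does not appear in the application: P is the set of posts analyzed, P_t is the subset of posts in which at least one word of type t appears (the target posts), and v_p is the browsing number of post p.

```latex
r_t \;=\; \frac{\sum_{p \in P_t} v_p}{\sum_{p \in P} v_p}
```

Because P_t is a set, a post contributes its browsing number at most once per type, however many words of that type it contains and however often they occur in it.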
Optionally, after matching each word obtained after the word segmentation with a preset classification word package to obtain a type of each word, the method further includes: displaying a word segmentation list on a page, wherein the word segmentation list comprises words obtained by segmenting the content of each post and the type corresponding to each word; and acquiring new words added by the user based on the word segmentation list, executing a word segmentation process on the contents of the posts through a word segmentation model according to the new words, and updating the words obtained after word segmentation.
In the above scheme, a post may contain words that are not words in the ordinary semantic sense, such as internet slang, which the word segmentation model may fail to segment. If information related to such a word needs to be counted, word supplementation can be used: the user enters the word to be segmented, and once the word is obtained, the word segmentation model segments the posts again according to it.
Optionally, after the word segmentation list is displayed on the page, the method further includes: and acquiring the type set by the user for the target word based on the word segmentation list, and updating the type of the target word to the set type.
In the above scheme, the user may manually classify words shown in the word segmentation list that were classified incorrectly or that matched no specific type in the classification word packet. After the type set by the user is obtained, the word segmentation list is updated so that the type of the target word is the type set by the user.
Optionally, acquiring the contents of the plurality of posts includes: acquiring a plurality of Uniform Resource Locators (URLs), and parsing each URL to obtain the contents of the plurality of posts. In this scheme, obtaining post content by URL parsing is more efficient and quicker.
In a second aspect, an embodiment of the present application provides a device for obtaining a discussion ratio, including: the acquisition module is used for acquiring the contents of a plurality of posts; the word segmentation module is used for segmenting the content of the posts through a word segmentation model respectively, and matching each word obtained after word segmentation with a preset classification word packet to obtain the type of each word, wherein the classification word packet defines the types of different words; the analysis module is used for calculating the discussion duty ratio corresponding to each type of word according to the appearance of each type of word in the posts, and the discussion duty ratio is used for expressing the ratio of the discussion degree of the target post corresponding to each type to the discussion degree of the posts.
Optionally, the discussion degree includes the browsing number of the posts, and the analysis module is specifically configured to: determine, from the plurality of posts, the posts in which words of each type appear, to obtain the target posts corresponding to each type of word; and calculate the ratio of the browsing number of the target posts to the total browsing number of the plurality of posts, to obtain the discussion ratio corresponding to each type of word.
Optionally, the device further includes a display module and an update module, where the display module is configured to display a word segmentation list on a page, where the word segmentation list includes words obtained by segmenting content of each post and a type corresponding to each word; the updating module is used for acquiring new words added by the user based on the word segmentation list, executing a word segmentation process on the contents of the posts through a word segmentation model according to the new words through the word segmentation module, and updating words obtained after word segmentation.
Optionally, the updating module is further configured to acquire a type set by the user for the target word based on the word segmentation list, and update the type of the target word to the set type.
Optionally, the obtaining module is specifically configured to: and acquiring a plurality of Uniform Resource Locators (URLs), and analyzing each URL to acquire the content of the posts.
In a third aspect, an embodiment of the present application provides a storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the method according to the first aspect is performed.
In a fourth aspect, an embodiment of the present application provides an electronic device, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating over the bus when the electronic device is operating, the machine-readable instructions when executed by the processor performing the method of the first aspect.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
Fig. 1 is a flowchart of a discussion ratio obtaining method provided in an embodiment of the present application;
Fig. 2 is a schematic diagram of a discussion ratio obtaining apparatus provided in an embodiment of the present application;
Fig. 3 is a schematic diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
The embodiment of the application provides a discussion ratio obtaining method that enables rapid acquisition of posts and rapid analysis of post contents, and simply and quickly yields the discussion ratios of different topics across a plurality of posts. Fig. 1 shows a flowchart of the discussion ratio obtaining method of the present embodiment, which includes the following steps:
Step 110: acquiring the contents of a plurality of posts.
The posts are the posts to be analyzed and include posts on third-party platforms such as WeChat, microblogs and forums. In one embodiment, the text content of the posts and the discussion degree of each post may be obtained directly from the user. The text content may be provided as a document (such as an Excel spreadsheet or a Word document), or a plurality of pictures containing the post content may be obtained and converted to text. The discussion degree of a post includes, but is not limited to, its browsing number, forwarding number, comment number and collection number. In another embodiment, a plurality of URLs corresponding to the posts may be obtained from the user and each URL parsed to obtain the contents of the posts, where each URL corresponds to one post. Specifically, the URLs may be stored in an Excel table, one URL per cell. After filling in the URLs to be analyzed, the user uploads the table; the method then reads each URL from the table and parses the corresponding post, thereby obtaining the contents of the posts, the number of posts being equal to the number of URLs. Optionally, parsing may also yield the browsing number, forwarding number, comment number, collection number and the like of each post, which respectively represent the number of times the post has been displayed in a browser, forwarded, commented on and collected. For the WeChat platform, the ID of the official account may be parsed from the URL; for the microblog platform, the account of the posting user may be parsed.
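As a minimal sketch of the data that step 110 produces, a post can be modelled as a record holding its content together with its discussion-degree counters. The names `Post`, `load_posts` and `fetch` are hypothetical and do not come from the application; in particular, `fetch` stands for whatever platform-specific URL parser is available and is supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Post:
    """One analyzed post plus its discussion-degree counters."""
    url: str
    content: str
    views: int = 0      # browsing number
    forwards: int = 0   # forwarding number
    comments: int = 0   # comment number
    favorites: int = 0  # collection number


def load_posts(urls: Iterable[str], fetch: Callable[[str], Post]) -> List[Post]:
    """Parse each URL into a Post; one non-empty URL yields exactly one post.

    `fetch` is a hypothetical parser provided by the caller.
    """
    posts = []
    for url in urls:
        url = url.strip()
        if url:  # skip empty spreadsheet cells
            posts.append(fetch(url))
    return posts
```

Passing the parser in as a parameter keeps the sketch independent of any particular platform and makes it easy to exercise with a stub.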
Step 120: segmenting the contents of the posts with a word segmentation model, and matching each word obtained by segmentation against a preset classification word packet to obtain the type of each word.
After the contents of the posts are parsed, word segmentation begins: the content of each parsed post is input into a preset word segmentation model, which segments the text of the post. Each segmented word is matched against the preset classification word packet, either during segmentation or after it finishes, to obtain the type of each word.
In this embodiment, the word segmentation model may be an existing one, for example the Ansj word segmenter, the NLPIR big data semantic intelligent analysis platform, the LTP language technology platform, the Stanford word segmenter, debtor, or the KCWS word segmenter. In a specific implementation, word segmentation can be performed by calling the open interface of the word segmentation model. This embodiment uses the HanLP word segmenter, which stores its word graph using fast offsets; it segments very quickly, is well suited to rapid segmentation of many posts, and offers strong real-time performance.
In step 120, the classification word packet defines the type to which each word belongs. For example, the classification word packet contains ten types of words in total, including a moisturizing type, a whitening type and a color type, where the words of the color type include red, yellow, rainbow and the like. The classification word packet is stored in a database in advance and may be provided by the user. A word may be matched against the classification word packet as soon as it is segmented (if the word is rainbow, its type is color), or all words may be matched against the packet in sequence after segmentation finishes.
After the word segmentation model segments the text of each post, each resulting word is looked up in the classification word packet. If the word is not found, its type is not defined in the packet, so its type is a null value; the word and the identifier of the post in which it appears are stored in the database, and a one-to-one mapping relation is established between the two. If the word is found, the type defined for it in the packet is taken as its type; the word, its type and the identifier of the post in which it appears are stored in the database, and a one-to-one mapping relation is established among the three.
Optionally, after the type of each word is obtained, a word segmentation list is displayed on the page. The list shows one or more of: each segmented word, the type corresponding to each word, and the posts in which each word appears. Once the list is displayed, the user can enter it and manually delete or supplement the segmentation results. For example, a post may contain internet slang that is not a word in the ordinary semantic sense but has recently become popular and valuable, so information related to it is worth analyzing, yet the word segmentation model does not segment it. In that case word supplementation can be used: the user enters the word to be segmented (which may be a single character), and after confirmation the word segmentation model segments the posts again according to that word. The user can also delete segmented words of no value. For example, if the user wants to analyze discussion of facial mask products on a third-party platform, words irrelevant to masks (such as weather) can be deleted directly, and such words are then excluded from the subsequent statistical analysis.
As another example, the word segmentation model may cut the phrase seven-day ampoule in a way that differs from the form the user wants to analyze. After entering the word segmentation list, the user can manually add the desired form as a supplementary word and, at the same time, delete the originally segmented word from the list. This addresses the problem that the word segmentation model cannot segment popular internet phrases, and meets the actual needs of users.
Further, the classification word packet may not cover every word produced by the word segmentation model. Words that match nothing in the packet have their type stored as a null value and are shown in the word segmentation list without a type. The user can then set the type manually, for example through a drop-down box or input box next to the word; the type may be selected from the types already in the classification word packet or may be a new type not yet in the packet. Optionally, after the user sets a type for a word, the word and its type are stored and the classification word packet is updated: the word is added to the corresponding existing type, or a new type is created in the packet and the word added to it. When the classification word packet is used again later, the type of the word is obtained directly without manual setting.
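The lookup described above can be sketched as follows. This is an illustration under stated assumptions rather than the application's implementation: the packet contents, the in-memory dictionaries standing in for the database, and the names `WORD_PACKET`, `build_type_index` and `classify_words` are all hypothetical. Since one word can occur in several posts, the sketch keeps a set of post identifiers per word.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical classification word packet: type -> words of that type.
WORD_PACKET: Dict[str, Set[str]] = {
    "color": {"red", "yellow", "rainbow"},
    "moisturizing": {"hydrating", "moisture"},
}


def build_type_index(packet: Dict[str, Set[str]]) -> Dict[str, str]:
    """Invert the packet into a word -> type lookup table."""
    return {word: type_ for type_, words in packet.items() for word in words}


def classify_words(
    segmented_posts: Dict[int, List[str]],
    packet: Dict[str, Set[str]],
) -> Tuple[Dict[str, Optional[str]], Dict[str, Set[int]]]:
    """Return (word -> type or None, word -> ids of posts containing it)."""
    index = build_type_index(packet)
    word_type: Dict[str, Optional[str]] = {}
    word_posts: Dict[str, Set[int]] = defaultdict(set)
    for post_id, words in segmented_posts.items():
        for word in words:
            # None plays the role of the null type for unmatched words.
            word_type[word] = index.get(word)
            word_posts[word].add(post_id)
    return word_type, dict(word_posts)
```

For instance, with posts `{1: ["rainbow", "weather"], 2: ["rainbow"]}`, the word rainbow is mapped to the color type and to posts 1 and 2, while weather is stored with a null type and post 1.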
In a specific implementation process, after word segmentation is finished, the method displays a word segmentation result on a page in a word segmentation list, and after the word segmentation list is displayed, the method further comprises the following steps:
Step A: acquiring a new word added by the user based on the word segmentation list, and executing step 120 again according to the added new word.
Based on the segmentation results shown in the word segmentation list, the user can manually add a new word to be analyzed. In step A, the word added by the user is first obtained and the word segmentation model is called again with it, so that the added word is segmented out of the plurality of posts and the posts in which it appears are obtained. The added word and the identifiers of the posts in which it appears are stored in the database; the word segmentation list is then updated to show the word, its type and the posts in which it appears. The type of the word can be obtained by matching against the classification word packet after segmentation finishes; if the added word is not defined in the classification word packet, step B below can be executed after the word segmentation list is updated.
Step B: acquiring the type set by the user for the target word based on the word segmentation list, and updating the type of the target word.
The target words are words for which no type was matched in the classification word packet, either in the first round of word segmentation or in the Nth round performed after the user supplements words, and/or words whose assigned type the user considers unreasonable. The user can manually classify words shown in the word segmentation list that are incorrectly classified or whose type is null. After the type manually set by the user is obtained, the word segmentation list is updated so that the type of the target word is the type set by the user.
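To illustrate how a user-supplied word changes the outcome of re-segmentation in step A, the sketch below uses a toy forward-maximum-matching segmenter over a vocabulary set. This is a deliberately simple stand-in chosen for the example, not the segmenter used in the embodiment, and the names `segment` and `resegment` are hypothetical. A production segmenter would normally receive the new word through its own custom-dictionary mechanism instead.

```python
from typing import Dict, Iterable, List, Set


def segment(text: str, vocabulary: Set[str]) -> List[str]:
    """Toy forward-maximum-matching segmenter (a stand-in for a real model)."""
    max_len = max([len(w) for w in vocabulary] + [1])
    words: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocabulary:
                words.append(piece)
                i += size
                break
    return words


def resegment(
    posts: Dict[int, str],
    vocabulary: Set[str],
    new_words: Iterable[str],
) -> Dict[int, List[str]]:
    """Step A: add the user's words, then segment every post again."""
    updated = set(vocabulary) | set(new_words)
    return {post_id: segment(text, updated) for post_id, text in posts.items()}
```

With the vocabulary `{"ab"}`, the text `abcde` is cut into `ab`, `c`, `d`, `e`; after the user adds `cde`, re-segmentation yields `ab`, `cde`, and the posts containing the new word can be recorded from the result.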
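Step B amounts to overwriting the stored type of one word and, optionally, keeping the classification word packet in sync so the type is found automatically next time. A minimal sketch, assuming the word types and the packet are held in plain dictionaries; the function name `set_word_type` is hypothetical:

```python
from typing import Dict, Optional, Set


def set_word_type(
    word_type: Dict[str, Optional[str]],
    packet: Dict[str, Set[str]],
    word: str,
    new_type: str,
) -> None:
    """Apply a user-chosen type and mirror the change in the word packet."""
    old_type = word_type.get(word)
    if old_type is not None and old_type in packet:
        packet[old_type].discard(word)  # remove the word from its previous type
    # setdefault creates the type when the user introduces a new one.
    packet.setdefault(new_type, set()).add(word)
    word_type[word] = new_type
```

The same call covers both cases in the text: a word whose type was null receives its first type, and a word the user considers misclassified is moved to another type.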
After the word segmentation result (including the word segmented out, the type of the word, and the post in which the word appears) is obtained, the plurality of posts are statistically analyzed based on the word segmentation result to obtain the discussion situation of different types of content in the plurality of posts, i.e., step 130 is performed. The word segmentation result may be the word segmentation result output in step 120, or the updated word segmentation result after the step a and/or the step B.
Step 130: calculating the discussion ratio corresponding to each type of word according to the occurrence of each type of word in the posts.
The discussion ratio represents the ratio of the discussion degree of the target posts corresponding to each type to the discussion degree of the plurality of posts, where the discussion degree may be the browsing number, forwarding number, comment number or collection number of the posts. Taking the browsing number as an example, the posts in which each word appears are first determined from the plurality of posts, and the posts in which words of each type appear are then collected. For example, suppose 5 posts are to be analyzed and the words of the moisturizing type are word A and word B, where word A appears in the 1st and 3rd posts and word B appears in the 3rd and 4th posts; the posts in which moisturizing-type words appear are then the 1st, 3rd and 4th posts. Next, the ratio of the browsing number of the posts in which each type of word appears to the total browsing number of the plurality of posts is calculated to obtain the discussion ratio for that type. For example, assuming the browsing numbers of the 5 posts are 2, 2, 3, 1 and 4 respectively, the browsing number of the target posts corresponding to the moisturizing type is 2+3+1 = 6 and the total browsing number of the 5 posts is 2+2+3+1+4 = 12, so the ratio of the discussion degree of the target posts corresponding to the moisturizing type to the discussion degree of the plurality of posts is 6/12 = 50%; that is, the discussion ratio is 50%, indicating that the moisturizing topic accounts for 50% of the discussion in these 5 posts. The 3rd post is counted only once even though both word A and word B appear in it. The discussion ratios of other types of words are calculated in the same way, as are discussion ratios based on the forwarding number, comment number or collection number of the posts.
In calculating the discussion ratio, only which posts a word appears in matters; the number of times the word appears within each post is not counted.
After the discussion ratio of each type of word is calculated, the content, browsing number, forwarding number, comment number, collection number and the like of each post, together with the discussion ratio of each type of word across the plurality of posts, are displayed on a page. The page has a download button; when the user clicks it, all of this information is packaged into an Excel table that is provided to the user.
Optionally, the discussion ratio may be calculated over all of the plurality of posts, over a subset selected by the user, or over posts that satisfy conditions on the browsing number, forwarding number, comment number, collection number and the like. The discussion heat of a post is reflected mainly in its browsing number, forwarding number, comment number and so on. If a post has been browsed only a few times, or has few comments or forwards, its discussion heat is low and a discussion ratio based on it is not very persuasive. Therefore, when calculating the discussion ratio, the method may determine the posts whose browsing number is higher than a first threshold, whose comment number is higher than a second threshold, or whose forwarding number is higher than a third threshold, and calculate the discussion ratio from the posts that satisfy the conditions.
The discussion ratio offers guidance to major brand owners and businesses, who may wish to see how different types of keywords affect their brand or product during operation. The above method may be packaged as a program product and offered to users as a discussion ratio acquisition platform. The brand owner provides the URLs to be analyzed and uploads them to the platform. These may be posts by bloggers or official accounts the brand owner follows, such as recommendations from beauty bloggers, and the brand owner wants to know whether those bloggers have published posts about its products or have not mentioned them at all. For example, a company provides a facial mask product that some beauty bloggers try out. By analyzing the discussion ratio of each type, the company can learn how the mask performs and which keywords are discussed most, for example whether the mask suits sensitive skin, whether it moisturizes or whitens, and what the ratios of the different types of keywords are; the discussion ratio thus reflects both the effect of the product and the focus of discussion. Of course, the words produced by the word segmentation model include not only commendatory words but also derogatory words, such as allergy; when the discussion ratio of allergy-type words exceeds a certain threshold, the brand owner needs to inspect and improve the mask product promptly.
In summary, the discussion ratio obtaining method provided in the embodiments of the present application uses URL parsing, a word segmentation model, word packet matching, word supplementation and related techniques. Only the URLs to be analyzed need to be provided; parsing, word segmentation, type matching and statistical analysis are then performed automatically, so the specific proportions of discussion of different topics across a plurality of posts are obtained quickly and the whole analysis process is more efficient and convenient.
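The worked example can be reproduced with a short sketch. The function name `discussion_ratios` and the dictionary layout are assumptions made for illustration; the figures are the ones summed in the example, that is, browsing numbers 2, 2, 3, 1 and 4 for posts 1 to 5.

```python
from typing import Dict, Optional, Set


def discussion_ratios(
    word_type: Dict[str, Optional[str]],
    word_posts: Dict[str, Set[int]],
    views: Dict[int, int],
) -> Dict[str, float]:
    """Return type -> share of total views held by posts containing that type."""
    total = sum(views.values())
    if total == 0:
        return {}
    # Union the posts of all words of a type, so each post counts once per type.
    type_posts: Dict[str, Set[int]] = {}
    for word, type_ in word_type.items():
        if type_ is None:
            continue  # words without a type do not contribute
        type_posts.setdefault(type_, set()).update(word_posts.get(word, set()))
    return {
        type_: sum(views.get(p, 0) for p in posts) / total
        for type_, posts in type_posts.items()
    }


views = {1: 2, 2: 2, 3: 3, 4: 1, 5: 4}
word_type = {"A": "moisturizing", "B": "moisturizing"}
word_posts = {"A": {1, 3}, "B": {3, 4}}
ratios = discussion_ratios(word_type, word_posts, views)
print(ratios)  # {'moisturizing': 0.5}
```

Using a set union for the target posts is what keeps post 3 from being counted twice, and passing forwarding, comment or collection counts in place of `views` gives the other variants of the ratio.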
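The threshold-based selection can be sketched as a simple filter applied before the ratio is computed. The dictionary keys, the default threshold values and the name `eligible_posts` are hypothetical; following the text, a post qualifies when any one of the three conditions holds.

```python
from typing import Dict, List


def eligible_posts(
    posts: List[Dict[str, int]],
    min_views: int = 0,
    min_comments: int = 0,
    min_forwards: int = 0,
) -> List[Dict[str, int]]:
    """Keep posts whose views, comments, or forwards exceed their threshold."""
    return [
        p for p in posts
        if p["views"] > min_views
        or p["comments"] > min_comments
        or p["forwards"] > min_forwards
    ]
```

For example, with thresholds of 50 views, 10 comments and 10 forwards, a post with 100 views and no comments is kept, a post with 5 views and 20 comments is kept, and a post with 5 views and 1 comment is dropped.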
Based on the same inventive concept, the embodiments of the present application further provide an apparatus for obtaining a discussion proportion. Fig. 2 shows a schematic diagram of the apparatus, which includes:
an obtaining module 210, configured to obtain the contents of a plurality of posts;
a word segmentation module 220, configured to segment the content of each of the plurality of posts with a word segmentation model, and to match each word obtained by the segmentation against a preset classification word package to obtain the type of each word, where the classification word package defines the types to which different words belong;
an analysis module 230, configured to calculate, according to the occurrence of each type of word in the plurality of posts, the discussion proportion corresponding to each type of word, where the discussion proportion represents the ratio of the discussion degree of the target posts corresponding to each type to the discussion degree of the plurality of posts.
Optionally, the discussion degree includes the view count of a post, and the analysis module 230 is specifically configured to: determine, from the plurality of posts, the posts in which each type of word appears, to obtain the target posts corresponding to each type of word; and calculate the ratio of the view count of the target posts to the total view count of the plurality of posts, to obtain the discussion proportion corresponding to each type of word.
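The calculation performed by the analysis module can be illustrated with a minimal sketch. It assumes each post has already been segmented into words and carries a view count (the browsing number). The names `Post`, `WORD_TYPES`, and `discussion_proportions`, and the sample data, are hypothetical and do not come from the application.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Post:
    words: List[str]  # words produced by the word segmentation step
    views: int        # view count, used here as the discussion degree


def discussion_proportions(posts: List[Post], word_types: Dict[str, str]) -> Dict[str, float]:
    """Map each type to (views of posts containing a word of that type) / (total views)."""
    total_views = sum(p.views for p in posts)
    if total_views == 0:
        return {}
    views_by_type: Dict[str, int] = {}
    for post in posts:
        # A set, so that a post counts once per type even if several of its words match.
        types_in_post = {word_types[w] for w in post.words if w in word_types}
        for t in types_in_post:
            views_by_type[t] = views_by_type.get(t, 0) + post.views
    return {t: v / total_views for t, v in views_by_type.items()}


# Hypothetical classification word package: word -> type.
WORD_TYPES = {
    "hydrating": "moisturizing",
    "moist": "moisturizing",
    "whitening": "whitening",
    "allergy": "irritation",
}

posts = [
    Post(["mask", "hydrating", "whitening"], views=600),
    Post(["mask", "allergy"], views=100),
    Post(["mask", "moist", "hydrating"], views=300),
]
shares = discussion_proportions(posts, WORD_TYPES)
```

Under these assumptions a post that mentions several types is a target post for each of them, so the proportions of different types need not sum to 1.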
Optionally, the apparatus further includes a display module and an update module. The display module is configured to display a word segmentation list on a page, where the word segmentation list includes the words obtained by segmenting the content of each post and the type corresponding to each word. The update module is configured to acquire new words added by the user based on the word segmentation list, to have the word segmentation module segment the contents of the plurality of posts again with the word segmentation model according to the new words, and to update the words obtained by the segmentation.
Optionally, the update module is further configured to acquire the type set by the user for a target word based on the word segmentation list, and to update the type of the target word to the set type.
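The effect of adding new words and setting a type can be sketched as follows. The application does not specify the word segmentation model in this passage, so the sketch substitutes a simple forward maximum matching segmenter as a stand-in. The functions `segment` and `classify`, the vocabulary, and the type labels are all hypothetical.

```python
def segment(text, vocabulary, max_len=4):
    """Forward maximum matching: at each position take the longest vocabulary word."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when no vocabulary word matches.
            if size == 1 or piece in vocabulary:
                words.append(piece)
                i += size
                break
    return words


def classify(words, word_types, default="unclassified"):
    """Look up each word's type in the classification word package."""
    return {w: word_types.get(w, default) for w in words}


# Hypothetical vocabulary and word package.
vocabulary = {"面膜", "敏感", "适合"}   # mask, sensitive, suitable
word_types = {"面膜": "product"}
text = "面膜敏感肌不适合"               # "the mask does not suit sensitive skin"

before = segment(text, vocabulary)

# The user adds new words from the word segmentation list; segmentation is run again.
vocabulary |= {"敏感肌", "不适合"}      # sensitive skin, not suitable
after = segment(text, vocabulary)

# The user sets the type of a target word; the word package is updated accordingly.
word_types["不适合"] = "negative feedback"
types = classify(after, word_types)
```

Before the new words are added, "敏感肌" is split into "敏感" and "肌"; after the update it is kept as a single word, which is the behavior the update module is meant to achieve.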
Optionally, the obtaining module 210 is specifically configured to: acquire a plurality of Uniform Resource Locators (URLs), and parse each URL to acquire the contents of the plurality of posts.
The basic principle and technical effect of the above apparatus for obtaining the discussion proportion are the same as those of the preceding method embodiment. For brevity, for any part not mentioned in this embodiment, reference may be made to the corresponding content of the method embodiment, which is not repeated here.
Fig. 3 shows a possible structure of an electronic device 300 provided in an embodiment of the present application. Referring to fig. 3, the electronic device 300 includes: a processor 310, a memory 320, and a communication interface 330, which are interconnected and in communication with each other via a communication bus 340 and/or other form of connection mechanism (not shown).
The memory 320 includes one or more memories (only one is shown in the figure), which may be, but are not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The processor 310, and possibly other components, may access the memory 320 to read and/or write data.
The processor 310 includes one or more processors (only one is shown), each of which may be an integrated circuit chip with signal processing capability. The processor 310 may be a general-purpose processor, such as a Central Processing Unit (CPU), a Microcontroller Unit (MCU), a Network Processor (NP), or another conventional processor; or it may be a special-purpose processor, such as a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The communication interface 330 includes one or more interfaces (only one is shown), which may be used to communicate directly or indirectly with other devices to exchange data. The communication interface 330 may be an Ethernet interface; a mobile communication network interface, such as an interface for a 3G, 4G, or 5G network; or another type of interface capable of sending and receiving data.
One or more computer program instructions may be stored in memory 320 and read and executed by processor 310 to implement the steps of the method for obtaining discussion ratios provided by the embodiments of the present application, as well as other desired functions.
It will be appreciated that the configuration shown in fig. 3 is merely illustrative and that electronic device 300 may include more or fewer components than shown in fig. 3 or have a different configuration than shown in fig. 3. The components shown in fig. 3 may be implemented in hardware, software, or a combination thereof. In the embodiment of the present application, the electronic device 300 may obtain the contents of the multiple posts, perform word segmentation and word type matching on the contents of the multiple posts, and perform statistical analysis according to different types of words to obtain the discussion percentage of different types of contents in the multiple posts.
The embodiments of the present application also provide a computer-readable storage medium, where computer program instructions are stored on the computer-readable storage medium, and when the computer program instructions are read and executed by a processor of a computer, the steps of the method for obtaining discussion ratios provided in the embodiments of the present application are executed. The computer-readable storage medium may be implemented as, for example, memory 320 in electronic device 300 in fig. 3.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
In addition, units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
Furthermore, the functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
It should be noted that the functions, if implemented in the form of software functional modules and sold or used as independent products, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part of it that contributes over the prior art, or a part of the technical solution, may be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. A method for obtaining a discussion proportion, comprising:
acquiring the contents of a plurality of posts;
segmenting the content of each of the plurality of posts with a word segmentation model, and matching each word obtained by the segmentation against a preset classification word package to obtain the type of each word, wherein the classification word package defines the types to which different words belong;
calculating the discussion proportion corresponding to each type of word according to the occurrence of each type of word in the plurality of posts, wherein the discussion proportion represents the ratio of the discussion degree of the target posts corresponding to each type to the discussion degree of the plurality of posts.
2. The method of claim 1, wherein the discussion degree comprises a view count of a post, and wherein calculating the discussion proportion corresponding to each type of word according to the occurrence of each type of word in the plurality of posts comprises:
determining, from the plurality of posts, the posts in which each type of word appears, to obtain the target posts corresponding to each type of word;
calculating the ratio of the view count of the target posts to the total view count of the plurality of posts, to obtain the discussion proportion corresponding to each type of word.
3. The method of claim 1, wherein after matching each word obtained by the segmentation against the preset classification word package to obtain the type of each word, the method further comprises:
displaying a word segmentation list on a page, wherein the word segmentation list comprises the words obtained by segmenting the content of each post and the type corresponding to each word;
acquiring new words added by a user based on the word segmentation list, segmenting the contents of the plurality of posts again with the word segmentation model according to the new words, and updating the words obtained by the segmentation.
4. The method of claim 3, wherein after displaying the word segmentation list on the page, the method further comprises:
acquiring the type set by the user for a target word based on the word segmentation list, and updating the type of the target word to the set type.
5. The method of claim 1, wherein acquiring the contents of the plurality of posts comprises: acquiring a plurality of Uniform Resource Locators (URLs), and parsing each URL to acquire the contents of the plurality of posts.
6. An apparatus for obtaining a discussion proportion, comprising:
an acquisition module, configured to acquire the contents of a plurality of posts;
a word segmentation module, configured to segment the content of each of the plurality of posts with a word segmentation model, and to match each word obtained by the segmentation against a preset classification word package to obtain the type of each word, wherein the classification word package defines the types to which different words belong;
an analysis module, configured to calculate the discussion proportion corresponding to each type of word according to the occurrence of each type of word in the plurality of posts, wherein the discussion proportion represents the ratio of the discussion degree of the target posts corresponding to each type to the discussion degree of the plurality of posts.
7. The apparatus of claim 6, wherein the discussion degree comprises a view count of a post, and wherein the analysis module is further configured to:
determine, from the plurality of posts, the posts in which each type of word appears, to obtain the target posts corresponding to each type of word;
calculate the ratio of the view count of the target posts to the total view count of the plurality of posts, to obtain the discussion proportion corresponding to each type of word.
8. The apparatus of claim 6, further comprising a display module and an update module, wherein
the display module is configured to display a word segmentation list on a page, wherein the word segmentation list comprises the words obtained by segmenting the content of each post and the type corresponding to each word;
the update module is configured to acquire new words added by a user based on the word segmentation list, to have the word segmentation module segment the contents of the plurality of posts again with the word segmentation model according to the new words, and to update the words obtained by the segmentation.
9. A storage medium, wherein the storage medium stores a computer program which, when executed by a processor, performs the method according to any one of claims 1-5.
10. An electronic device, comprising: a processor, a memory, and a bus, wherein the memory stores machine-readable instructions executable by the processor, the processor and the memory communicate over the bus when the electronic device is operating, and the machine-readable instructions, when executed by the processor, perform the method according to any one of claims 1-5.
CN201911218626.9A 2019-12-02 2019-12-02 Method and device for acquiring discussion duty ratio, storage medium and electronic equipment Active CN110990571B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911218626.9A CN110990571B (en) 2019-12-02 2019-12-02 Method and device for acquiring discussion duty ratio, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN110990571A (en) 2020-04-10
CN110990571B (en) 2024-04-02

Family

ID=70089480

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911218626.9A Active CN110990571B (en) 2019-12-02 2019-12-02 Method and device for acquiring discussion duty ratio, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN110990571B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111611388A (en) * 2020-05-29 2020-09-01 北京学之途网络科技有限公司 Account classification method, device and equipment

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007064874A2 (en) * 2005-12-01 2007-06-07 Adchemy, Inc. Method and apparatus for representing text using search engine, document collection, and hierarchal taxonomy
CN102945290A (en) * 2012-12-03 2013-02-27 北京奇虎科技有限公司 Hot microblog topic digging device and method
CN103235824A (en) * 2013-05-06 2013-08-07 上海河广信息科技有限公司 Method and system for determining web page texts users interested in according to browsed web pages
CN103324645A (en) * 2012-03-23 2013-09-25 腾讯科技(深圳)有限公司 Method and device for recommending webpage
CN104657349A (en) * 2015-02-11 2015-05-27 厦门美柚信息科技有限公司 Forum post feature identifying method and device
CN104915327A (en) * 2014-03-14 2015-09-16 腾讯科技(深圳)有限公司 Text information processing method and device
CN106503209A (en) * 2016-10-26 2017-03-15 Tcl集团股份有限公司 A kind of topic temperature Forecasting Methodology and system
CN108170692A (en) * 2016-12-07 2018-06-15 腾讯科技(深圳)有限公司 A kind of focus incident information processing method and device
CN109033286A (en) * 2018-07-12 2018-12-18 北京猫眼文化传媒有限公司 Data statistical approach and device
US20180374377A1 (en) * 2011-03-22 2018-12-27 East Carolina University Methods, systems, and computer program products for normalization and cumulative analysis of cognitive post content
CN109408818A (en) * 2018-10-12 2019-03-01 平安科技(深圳)有限公司 New word identification method, device, computer equipment and storage medium
CN109740059A (en) * 2018-12-31 2019-05-10 杭州翼兔网络科技有限公司 A kind of hot topic the analysis of public opinion method
CN110222182A (en) * 2019-06-06 2019-09-10 腾讯科技(深圳)有限公司 A kind of statement classification method and relevant device
CN110347900A (en) * 2019-07-10 2019-10-18 腾讯科技(深圳)有限公司 A kind of importance calculation method of keyword, device, server and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20201221

Address after: A108, 1/F, Curling Hall, Winter Training Center, 68 Shijingshan Road, Shijingshan District, Beijing 100041

Applicant after: Beijing second hand Artificial Intelligence Technology Co.,Ltd.

Address before: Room 9014, 9/F, Building 3, Yard 30, Shixing Street, Shijingshan District, Beijing

Applicant before: ADMASTER TECHNOLOGY (BEIJING) Co.,Ltd.

GR01 Patent grant