CN114780695A - Big data mining method and big data mining system for online topics - Google Patents

Info

Publication number
CN114780695A
Authority
CN
China
Prior art keywords
topic
phrase
topic interest
interest
interest phrase
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202210371213.XA
Other languages
Chinese (zh)
Inventor
杨忠哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/332: Query formulation
    • G06F16/3329: Natural language query formulation or dialogue systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/3331: Query processing
    • G06F16/334: Query execution
    • G06F16/3344: Query execution using natural language analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/953: Querying, e.g. by the use of web search engines
    • G06F16/9536: Search customisation based on social or collaborative filtering

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the application disclose a big data mining method and a big data mining system for online topics. Because the specified topic interest mining model is configured based on a lightweight configuration rule, using it to mine the online topic big data to be subjected to interest mining allows the user interest knowledge distribution to be obtained quickly, which improves the timeliness of user interest mining, and also improves the accuracy and integrity of the user interest knowledge distribution, which improves the quality of user interest mining. In short, the specified topic interest mining model and its lightweight configuration improve both the efficiency and the quality with which the user interest knowledge distribution is obtained.

Description

Big data mining method and big data mining system for online topics
Technical Field
The application relates to the technical field of big data, in particular to a big data mining method and a big data mining system for online topics.
Background
Online topic analysis is an application branch of Natural Language Processing (NLP) that aims to obtain valuable data assets by performing big data mining on users' social platform topics. Such analysis is usually implemented with an AI model. However, through intensive research and analysis, the inventor found that a traditional AI model can hardly guarantee either the timeliness or the accuracy of topic big data mining. How to effectively address these problems is therefore a current technical difficulty.
Disclosure of Invention
An object of the present application is to provide a big data mining method and a big data mining system for online topics.
The technical scheme of the application is realized by at least some of the following embodiments.
The embodiment of the application provides a big data mining method for an online topic, which is applied to a big data mining system in communication connection with a topic activity platform system, and the method at least comprises the following steps: when a user interest mining request sent by the topic activity platform system is received, calling online topic big data to be subjected to interest mining from a set relational database corresponding to the topic activity platform system by using the user interest mining request; transmitting the online topic big data to be subjected to interest mining to a specified topic interest mining model, and obtaining user interest knowledge distribution of the online topic big data to be subjected to interest mining through the specified topic interest mining model; the specified topic interest mining model is configured based on a lightweight configuration rule.
Based on the embodiment of the application, because the specified topic interest mining model is configured based on the lightweight configuration rule, using it to mine the online topic big data to be subjected to interest mining makes it possible, on the one hand, to obtain the user interest knowledge distribution quickly, which improves the timeliness of user interest mining, and, on the other hand, to improve the accuracy and integrity of the user interest knowledge distribution, which improves the quality of user interest mining. In short, the specified topic interest mining model and its lightweight configuration improve both the efficiency and the quality with which the user interest knowledge distribution is obtained.
In an independently implementable embodiment, the specified topic interest mining model is configured as follows: acquiring authenticated online topic big data and determining the prior basis of the authenticated online topic big data; transmitting the authenticated online topic big data to a lightweight topic interest phrase extraction node of a basic topic interest mining model, and determining a target topic interest phrase distribution corresponding to the authenticated online topic big data, wherein the lightweight topic interest phrase extraction node comprises a dimension index optimization variable to be configured; decomposing the target topic interest phrase distribution into a plurality of interactive topic interest phrase sets according to a set scale, and transmitting the target topic interest phrase distribution to a topic interest phrase sorting node, wherein the topic interest phrase sorting node comprises a plurality of lightweight topic interest phrase processing sub-nodes, and each lightweight topic interest phrase processing sub-node is used for performing topic interest phrase sorting and potential topic interest phrase mining on the interactive topic interest phrase sets; and configuring the basic topic interest mining model by using the topic interest phrase emotion field distribution generated by the topic interest phrase sorting node and the prior basis of the authenticated online topic big data.
Based on the embodiment of the application, the lightweight topic interest phrase extraction node of the basic topic interest mining model comprises a dimension index optimization variable to be configured. This variable can perform dimension index optimization on the topic interest phrase distribution1 that the lightweight topic interest phrase extraction node extracts from the authenticated online topic big data, and the lightweight update is then performed on the distribution whose dimension index has been optimized. With this design, the two-level lightweight judgment index can be positioned flexibly, which reduces the sampling quality difference between lightweight phrase sampling processing and non-lightweight phrase sampling processing. In addition, the topic interest phrase sorting node can perform topic interest phrase sorting and potential topic interest phrase mining on the plurality of interactive topic interest phrase sets, so the generated topic interest phrase emotion field distribution can fully take into account the topic interests of different stages, which improves the mining accuracy and integrity of the topic interest mining model configured based on the topic interest phrase emotion field distribution and the prior basis. Finally, because both the lightweight topic interest phrase extraction node and the topic interest phrase sorting node are lightweight, the model architecture of the specified topic interest mining model can be minimized, so that mining quality is guaranteed while the overhead of additional computing resources is reduced.
In an independently implementable embodiment, the transmitting the authenticated online topic big data to a lightweight topic interest phrase extraction node of a basic topic interest mining model, and determining a target topic interest phrase distribution corresponding to the authenticated online topic big data, includes: performing a topic interest phrase summarization operation on the authenticated online topic big data, and determining a topic interest phrase distribution1 corresponding to the authenticated online topic big data; performing a first optimization on the dimension indexes of the topic interest phrase distribution1 by using the dimension index optimization variable, and determining the optimized topic interest phrase distribution2; performing a lightweight update operation by using the authenticated online topic big data and the topic interest phrase distribution2 to determine a topic interest phrase distribution3; and performing phrase sampling processing on the topic interest phrase distribution3 to determine the target topic interest phrase distribution corresponding to the authenticated online topic big data.
Based on the embodiment of the application, the dimension index optimization variable is flexibly adjustable (configurable), so that the sampling difference between the light weight phrase sampling process and the non-light weight phrase sampling process can be reduced.
In an independently implementable embodiment, the performing a lightweight update operation by using the authenticated online topic big data and the topic interest phrase distribution2 to determine the topic interest phrase distribution3 comprises: determining the topic interest phrase distribution3 based on a first determination index of a specified trigger mechanism and a quantitative comparison result between the authenticated online topic big data and the corresponding dimension index of the topic interest phrase distribution2.
In an independently implementable embodiment, the performing phrase sampling processing on the topic interest phrase distribution3 to determine a target topic interest phrase distribution corresponding to the authenticated online topic big data includes: performing phrase sampling processing on the topic interest phrase distribution3 to determine a topic interest phrase distribution4; performing a second optimization on the phrase description values of the topic interest phrase distribution1 by using the dimension index optimization variable, and determining the optimized topic interest phrase distribution5; and performing topic interest phrase sorting on the topic interest phrase distribution4 and the topic interest phrase distribution5 to determine the target topic interest phrase distribution.
Based on the embodiment of the present application, in order to reduce as far as possible the interference of the bipolar simplification processing with the accuracy and detail of the extracted topic interest phrases, the topic interest phrases of the original authenticated online topic big data may be added to the topic interest phrase distribution4. For example, the topic interest phrase distribution5, obtained by performing the second optimization on the topic interest phrase distribution1, is sorted together with the topic interest phrase distribution4.
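The extraction steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the dimension index optimization variable is modelled as simple scale factors (`alpha`, `gamma`), the lightweight update as a threshold comparison that maps each value to +1 or -1, phrase sampling as a sliding weighted sum, and topic interest phrase sorting as an element-wise sum. All function names and parameters are hypothetical.

```python
def scale(values, factor):
    """Dimension index optimization, assumed here to be a plain rescaling."""
    return [factor * v for v in values]


def binarize(values, threshold=0.0):
    """Lightweight update, assumed here to be a threshold comparison."""
    return [1.0 if v >= threshold else -1.0 for v in values]


def sample(values, kernel):
    """Phrase sampling, assumed here to be a zero-padded sliding weighted sum.

    The kernel is expected to have odd length so the output keeps its size.
    """
    half = len(kernel) // 2
    padded = [0.0] * half + list(values) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(values))
    ]


def extract(distribution1, alpha, gamma, threshold=0.0,
            kernel=(0.25, 0.5, 0.25)):
    distribution2 = scale(distribution1, alpha)         # first optimization
    distribution3 = binarize(distribution2, threshold)  # lightweight update
    distribution4 = sample(distribution3, kernel)       # phrase sampling
    distribution5 = scale(distribution1, gamma)         # second optimization
    # Sorting (fusion): add the rescaled original back to the sampled result.
    return [a + b for a, b in zip(distribution4, distribution5)]


target = extract([0.2, -0.4, 0.6], alpha=2.0, gamma=0.5)
```

Adding `distribution5` back in plays the role described above: the thresholded path discards magnitude information, and the rescaled copy of the original distribution restores some of that detail.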
In an independently implementable embodiment, the derived information (output) of the U-th lightweight topic interest phrase processing sub-node in the topic interest phrase sorting node is the raw material information (input) of the (U+1)-th lightweight topic interest phrase processing sub-node, the raw material information of the first lightweight topic interest phrase processing sub-node is the target topic interest phrase distribution, the derived information of the last lightweight topic interest phrase processing sub-node is the topic interest phrase emotion field distribution, and U is a positive integer.
Based on the embodiment of the application, potential interest phrase sampling processing is carried out through the lightweight topic interest phrase processing sub-node, and topic interest phrase emotion fields can be deeply and finely mined as far as possible, so that the quality of interest mining is guaranteed.
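The chaining of sub-nodes described above amounts to function composition: each sub-node consumes the output of the previous one. A minimal sketch follows, in which each sub-node is assumed to be a plain Python callable; the name `run_sorting_node` and the two toy sub-nodes are hypothetical.

```python
from functools import reduce


def run_sorting_node(sub_nodes, target_distribution):
    """Feed the output of sub-node U into sub-node U+1, in order."""
    return reduce(lambda data, node: node(data), sub_nodes, target_distribution)


# Two toy sub-nodes standing in for lightweight phrase processing sub-nodes.
def add_one(distribution):
    return [v + 1 for v in distribution]


def double(distribution):
    return [v * 2 for v in distribution]


emotion_field_distribution = run_sorting_node([add_one, double], [1, 2])
```

The first sub-node receives the target topic interest phrase distribution and the value returned by the last sub-node is taken as the topic interest phrase emotion field distribution, so the order of the list matters.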
In an independently implementable embodiment, each lightweight topic interest phrase processing sub-node performs topic interest phrase sorting and potential topic interest phrase mining on the interactive topic interest phrase sets of the raw material type topic interest phrase distribution transmitted to it as follows: performing a lightweight update operation on the raw material type topic interest phrase distribution, and determining a topic interest phrase distribution6; based on at least one mapping indication, performing phrase set mapping on the interactive topic interest phrase sets of the topic interest phrase distribution6 to obtain a mapped topic interest phrase distribution; performing phrase sampling processing on the mapped topic interest phrase distribution and on the topic interest phrase distribution6 respectively, and then performing topic interest phrase sorting with the raw material type topic interest phrase distribution to obtain a sorted topic interest phrase distribution; and processing the sorted topic interest phrase distribution to obtain the topic interest phrase distribution generation result of the lightweight topic interest phrase processing sub-node.
Based on the embodiment of the application, the current interactive topic interest phrase set can be spliced into the overall topic interest phrase through the overall mapping, the current interactive topic interest phrase set can be spliced into the staged topic interest phrase through the staged mapping, and further the distribution of the topic interest phrase can be ensured to fully consider the overall stage of the interest phrase, so that the integrity and richness of the organized topic interest phrase can be ensured.
In an independently implementable embodiment, for one of the mapping indications, the phrase set mapping the interactive topic interest phrase set of the topic interest phrase distribution6 to obtain a mapped topic interest phrase distribution includes: for one interactive topic interest phrase set, determining an interactive topic interest phrase set to be mapped, corresponding to the interactive topic interest phrase set, in the topic interest phrase distribution6 based on the mapping indication; and determining variable data of the interactive topic interest phrase set in each dimension after phrase set mapping based on the variable data of the interactive topic interest phrase set to be mapped in the corresponding dimension.
Based on the embodiment of the application, the distribution content richness of the interest phrases of the mapped topics can be guaranteed through mapping processing analysis of different dimension levels.
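One way to picture the phrase set mapping is as a shift over the sequence of interactive topic interest phrase sets. The sketch below assumes, purely for illustration, that a mapping indication is an integer offset and that each set is a list of per-dimension values; the set found at that offset supplies the variable data of the mapped set, and positions with no source set are filled with zeros. The function name and the zero fill are assumptions, not details from the source.

```python
def map_phrase_sets(phrase_sets, offset):
    """Replace set i with the per-dimension values of set i + offset."""
    dims = len(phrase_sets[0]) if phrase_sets else 0
    mapped = []
    for i in range(len(phrase_sets)):
        j = i + offset  # index of the set to be mapped for position i
        if 0 <= j < len(phrase_sets):
            mapped.append(list(phrase_sets[j]))
        else:
            mapped.append([0.0] * dims)  # assumed fill for missing neighbours
    return mapped


sets = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
shifted_forward = map_phrase_sets(sets, 1)
shifted_backward = map_phrase_sets(sets, -1)
```

Using several offsets, some small and some large, would be one way to combine nearby (staged) and distant (overall) context, in the spirit of the staged and overall mappings mentioned above.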
In an independently implementable embodiment, the performing phrase sampling processing on the mapped topic interest phrase distribution and on the topic interest phrase distribution6 respectively, and then performing topic interest phrase sorting with the raw material type topic interest phrase distribution to obtain a sorted topic interest phrase distribution, includes: performing phrase sampling processing on the mapped topic interest phrase distribution and on the topic interest phrase distribution6 respectively, based on a lightweight topic interest phrase processing thread, and determining a plurality of potential topic interest phrase distributions; and, after performing a dimensionless lightweight operation on the plurality of potential topic interest phrase distributions, performing topic interest phrase sorting on the potential topic interest phrase distributions and the raw material type topic interest phrase distribution to obtain the sorted topic interest phrase distribution.
Based on the embodiment of the application, through the dimensionless light-weight operation (normalization processing), the simplification degree of the distribution of the interest phrases of the sorted topics can be ensured to a certain extent, so that the operation resource overhead of the related analysis processing is reduced.
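The normalization-then-sorting step can be illustrated as follows. The sketch assumes that the dimensionless operation is a zero-mean, unit-variance normalization and that topic interest phrase sorting is an element-wise sum with the raw material type distribution; both choices, and the function names, are assumptions made for illustration only.

```python
import math


def normalize(values, eps=1e-8):
    """Zero-mean, unit-variance normalization (assumed dimensionless step)."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(variance + eps) for v in values]


def collate(potential_distributions, raw_distribution):
    """Add each normalized potential distribution onto the raw distribution.

    All distributions are assumed to have the same length.
    """
    merged = list(raw_distribution)
    for distribution in potential_distributions:
        for i, v in enumerate(normalize(distribution)):
            merged[i] += v
    return merged


sorted_distribution = collate([[1.0, 2.0, 3.0]], [10.0, 10.0, 10.0])
```

Because every potential distribution is brought to a common scale before merging, no single branch dominates the sum merely because its raw values are larger.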
In an independently implementable embodiment, the configuring the basic topic interest mining model by using the topic interest phrase emotion field distribution generated by the topic interest phrase sorting node and the prior basis of the authenticated online topic big data comprises: obtaining a target transfer learning model corresponding to the basic topic interest mining model; and configuring the basic topic interest mining model by using the topic interest phrase emotion field distribution generated by the topic interest phrase sorting node, the prior basis of the authenticated online topic big data, and the target transfer learning model.
Based on the embodiment of the application, the obtained target transfer learning model can be configured, and the learning expectation of the target transfer learning model can be consistent with that of the basic topic interest mining model. Because the model variables of the target transfer learning model are complete, its model accuracy is higher than that of the basic topic interest mining model, so adaptively configuring the basic topic interest mining model with the help of the target transfer learning model can improve the model accuracy of the basic topic interest mining model.
In an independently implementable embodiment, the configuring the basic topic interest mining model by using the topic interest phrase emotion field distribution generated by the topic interest phrase sorting node, the prior basis of the authenticated online topic big data, and the target transfer learning model includes: determining a first model performance evaluation parameter by using the topic interest phrase emotion field distribution and first transfer learning information of the target transfer learning model on the authenticated online topic big data; determining a second model performance evaluation parameter by using the topic interest phrase emotion field distribution and the prior basis of the authenticated online topic big data; and configuring the basic topic interest mining model by using the first model performance evaluation parameter and the second model performance evaluation parameter.
In an independently implementable embodiment, the determining a first model performance evaluation parameter by using the topic interest phrase emotion field distribution and the first transfer learning information of the target transfer learning model on the authenticated online topic big data comprises: determining second transfer learning information of the basic topic interest mining model based on a configuration sample balancing node and the topic interest phrase emotion field distribution, wherein the variable list of the configuration sample balancing node is consistent with the variable list of the target transfer learning model; and determining the first model performance evaluation parameter by using the first transfer learning information and the second transfer learning information.
Based on the embodiment of the present application, the effect of keeping the variable list of the configuration sample balancing node consistent with the variable list of the target transfer learning model may be to ensure that the integrity description of the target transfer learning model is inherited through the configuration sample balancing node.
In an independently implementable embodiment, the determining a second model performance evaluation parameter by using the topic interest phrase emotion field distribution and the prior basis of the authenticated online topic big data comprises: determining third transfer learning information of the basic topic interest mining model based on a significance improvement node and the topic interest phrase emotion field distribution; and determining the second model performance evaluation parameter by using the third transfer learning information and the prior basis of the authenticated online topic big data.
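The two model performance evaluation parameters resemble the two loss terms commonly used in knowledge distillation. The sketch below is written under that assumption and is not taken from the source: the first parameter is computed as a KL divergence between the outputs of the target transfer learning model (first transfer learning information) and the basic model (second transfer learning information), the second as a cross-entropy between the basic model's output (third transfer learning information) and the annotated label, and the two are combined by a hypothetical `weight`.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def evaluation_parameters(first_info, second_info, third_info,
                          label_index, weight=0.5):
    # First parameter: how far the basic model is from the reference model.
    first = kl_divergence(softmax(first_info), softmax(second_info))
    # Second parameter: how far the basic model is from the prior basis.
    second = -math.log(softmax(third_info)[label_index])
    # Assumed combination rule: a simple weighted sum.
    return first, second, weight * first + (1 - weight) * second


first, second, total = evaluation_parameters(
    [1.0, 2.0], [1.0, 2.0], [0.0, 0.0], label_index=0
)
```

In a real training loop the combined value would be minimized with respect to the basic model's variables; that optimization step is omitted here.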
In an independently implementable embodiment, after the configuration of the basic topic interest mining model is complete, the method further comprises: generating a user demand analysis node based on the configured configuration sample balancing node and the configured significance improvement node, wherein the user demand analysis node is used for determining a user demand analysis report based on the derived information of the topic interest phrase sorting node of the configured basic topic interest mining model when user demand analysis is carried out.
Based on the embodiment of the application, the user demand analysis node can be generated quickly, so that unnecessary operation is reduced, and the analysis accuracy and the reliability of the user demand analysis node can be guaranteed.
A big data mining system, comprising: a memory for storing an executable computer program, a processor for implementing the above method when executing the executable computer program stored in the memory.
A computer-readable storage medium, on which a computer program is stored which, when executed, performs the above-described method.
Drawings
FIG. 1 is a schematic diagram illustrating a big data mining system in which embodiments of the present application may be implemented.
FIG. 2 is a flow diagram illustrating a big data mining method for online topics in which embodiments of the present application may be implemented.
Fig. 3 is an architectural diagram illustrating an application environment in which a big data mining method for online topics can be implemented according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. In order to make the objectives, technical solutions and advantages of the present application clearer, the present application will be described in further detail with reference to the attached drawings, the described embodiments should not be considered as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative efforts shall fall within the protection scope of the present application. In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
Fig. 1 is a block diagram illustrating one communication configuration of a big data mining system 100 that may implement embodiments of the present application, the big data mining system 100 including a memory 101 for storing an executable computer program, a processor 102 for implementing a big data mining method for online topics in embodiments of the present application when executing the executable computer program stored in the memory 101.
Fig. 2 is a flowchart illustrating a big data mining method for an online topic, which may implement an embodiment of the present application, where the big data mining method for an online topic may be implemented by the big data mining system 100 shown in fig. 1, and further may include the technical solutions described in the following related steps.
STEP101, when receiving a user interest mining request sent by the topic activity platform system, using the user interest mining request to call online topic big data to be interest mined from a set relational database corresponding to the topic activity platform system.
In the embodiment of the present application, the topic activity platform system may be different types of platform systems, such as a cloud service system, a social media system, a user evaluation system, and the like. The user interest mining request may be an auxiliary application sent by the topic activity platform system to the big data mining system for user interest mining, and based on this, the big data mining system may invoke corresponding data to be mined (such as the above online topic big data to be subjected to interest mining) from a set relational database (such as a MySQL database) corresponding to the topic activity platform system.
Further, the online topic big data may be user social big data, user conversation big data, user comment big data, and the like, which are not limited herein. In addition, the field related to the online topic big data can be e-commerce, remote education, digital office, intelligent education, cloud games, VR/AR/MR and the like.
STEP102, transmitting the online topic big data to be subjected to interest mining to a specified topic interest mining model, and obtaining the user interest knowledge distribution of the online topic big data to be subjected to interest mining through the specified topic interest mining model.
In the embodiment of the application, the specified topic interest mining model is configured based on the lightweight configuration rule, and the lightweight configuration can reduce the scale of the model as much as possible on the premise of ensuring the performance of the model, so that the timeliness and the performance quality of the subsequent model are improved.
Further, the user interest knowledge distribution may reflect the interest points or interest tendencies of the user, and may be recorded in a text form, a knowledge graph/knowledge base form, or an interest feature relationship network form, which is not limited herein. In addition, the user interest knowledge distribution has a rich use value, for example, the user interest knowledge distribution not only can be used as a basis for subsequent information recommendation, but also can be used as a decision guide for subsequent service upgrade.
By applying STEP101 and STEP102, because the specified topic interest mining model is configured based on the lightweight configuration rule, mining the online topic big data to be subjected to interest mining through this model makes it possible, on the one hand, to obtain the user interest knowledge distribution quickly, which improves the timeliness of user interest mining, and, on the other hand, to improve the accuracy and integrity of the user interest knowledge distribution, which improves the quality of user interest mining. In short, the specified topic interest mining model and its lightweight configuration improve both the efficiency and the quality with which the user interest knowledge distribution is obtained.
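The flow of STEP101 and STEP102 can be sketched as follows. This is only an illustration under stated assumptions: `sqlite3` stands in for the set relational database (the text names MySQL as one example), the table `topic_posts` and the request field `platform_id` are hypothetical, and `mine_interests` is a word-frequency placeholder rather than the specified topic interest mining model.

```python
import sqlite3
from collections import Counter


def fetch_topic_data(conn, request):
    """STEP101: use the mining request to retrieve the data to be mined."""
    rows = conn.execute(
        "SELECT text FROM topic_posts WHERE platform_id = ?",
        (request["platform_id"],),
    ).fetchall()
    return [row[0] for row in rows]


def mine_interests(texts):
    """STEP102 placeholder: relative word frequencies, not the real model."""
    counts = Counter(word.lower() for text in texts for word in text.split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


def handle_request(conn, request):
    return mine_interests(fetch_topic_data(conn, request))


# Hypothetical in-memory database standing in for the platform's database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE topic_posts (platform_id TEXT, text TEXT)")
conn.executemany(
    "INSERT INTO topic_posts VALUES (?, ?)",
    [("p1", "cloud games"), ("p1", "cloud office"), ("p2", "remote education")],
)
distribution = handle_request(conn, {"platform_id": "p1"})
```

The returned dictionary is one possible text-form record of a user interest knowledge distribution; a knowledge graph or relationship network, as mentioned above, would need a richer structure.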
Under some independent design ideas, the configuration of the specified topic interest mining model can comprise the contents described in STEP201-STEP204.
STEP201, collecting big data of the authenticated online topic and determining the prior basis of the big data of the authenticated online topic.
STEP202, transmitting the authenticated online topic big data to a lightweight topic interest phrase extraction node of a basic topic interest mining model, and determining target topic interest phrase distribution corresponding to the authenticated online topic big data.
In the embodiment of the application, the lightweight topic interest phrase extraction node includes a dimension index optimization variable to be configured, and the dimension index optimization variable may be understood as a dimension adjustment parameter. In addition, the lightweight topic interest phrase extraction node can be understood as a feature mining node/feature mining layer with a binary characteristic, and the binary characteristic can effectively reduce the structural complexity of the feature mining node. Further, the basic topic interest mining model may be understood as a topic mining model to be configured. The target topic interest phrase distribution can be understood as a target topic interest point set or a target topic interest tendency set, and it can be recorded in the form of an interest phrase map or in the form of an interest phrase relationship network.
STEP203, decomposing the target topic interest phrase distribution into a plurality of interactive topic interest phrase sets according to a set scale, and transmitting the target topic interest phrase distribution to a topic interest phrase sorting node.
In the embodiment of the application, the topic interest phrase sorting node includes a plurality of lightweight topic interest phrase processing sub-nodes, and each lightweight topic interest phrase processing sub-node is configured to perform topic interest phrase sorting and potential topic interest phrase mining on the interactive topic interest phrase sets. An interactive topic interest phrase set may be understood as a topic interest tag. Topic interest phrase sorting may be, for example, topic interest phrase fusion/topic interest phrase merging. Potential topic interest phrase mining may be understood as hidden interest phrase extraction or deep-level interest phrase extraction. In addition, the topic interest phrase sorting node can be understood as a topic interest phrase fusion unit or a topic interest phrase merging network. The lightweight topic interest phrase processing sub-node may also be understood as a processing sub-node/phrase processing layer having a binary characteristic. The set scale may be understood as a set size or a set number.
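The decomposition in STEP203 can be illustrated with a simple chunking function. The sketch assumes that the set scale is a fixed set size and that the target topic interest phrase distribution is a flat list; the function name is hypothetical.

```python
def split_into_sets(distribution, set_scale):
    """Cut a distribution into consecutive sets of at most set_scale items."""
    if set_scale <= 0:
        raise ValueError("set_scale must be a positive integer")
    return [
        distribution[i:i + set_scale]
        for i in range(0, len(distribution), set_scale)
    ]


phrase_sets = split_into_sets(["a", "b", "c", "d", "e"], 2)
```

If the set scale were instead read as a set number, the same idea applies with the chunk size derived from the distribution length divided by that number.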
STEP204, configuring the basic topic interest mining model by using the emotion field distribution of the topic interest phrases generated by the topic interest phrase sorting node and the prior basis of the authenticated online topic big data.
In the embodiment of the application, the target topic interest phrase distribution can be transmitted to the topic interest phrase sorting node for an interest phrase fusion operation, so as to obtain the topic interest phrase emotion field distribution. The topic interest phrase emotion field distribution can be understood as a set of emotion characteristics corresponding to the topic interest phrases, and the prior basis can be understood as annotation information or label information.
STEP201-STEP204 are illustrated and explained below. The following description should not be understood as listing technical features that are necessary for implementing the above embodiments; in other words, those skilled in the art can implement the above embodiments completely and clearly on the basis of the above contents.
For STEP201, the authenticated online topic big data can be sample online topic big data, and its prior basis can be a previously annotated user interest knowledge distribution of the authenticated online topic big data.
For STEP202, transmitting the authenticated online topic big data to the lightweight topic interest phrase extraction node of the basic topic interest mining model and determining the target topic interest phrase distribution corresponding to the authenticated online topic big data may, for example, include the contents recorded in STEP301-STEP304 below.
STEP301, performing a topic interest phrase summarization operation on the authenticated online topic big data, and determining a topic interest phrase distribution1 corresponding to the authenticated online topic big data.
STEP302, performing first optimization on the dimension index of topic interest phrase distribution1 by using the dimension index optimization variable, and determining topic interest phrase distribution2 completing optimization.
STEP303, performing a lightweight update operation (such as binarization update processing) by using the authenticated online topic big data and the topic interest phrase distribution2, and determining topic interest phrase distribution3.
STEP304, performing phrase sampling processing on the topic interest phrase distribution3, and determining a target topic interest phrase distribution corresponding to the authenticated online topic big data.
It can be understood that, for STEP301, after a topic interest phrase summarizing operation (such as a global average pooling operation) is performed on the authenticated online topic big data, a feature relationship network corresponding to the authenticated online topic big data can be obtained; in other words, the topic interest phrase distribution1 is a basic feature relationship network of the authenticated online topic big data.
In STEP302, optimizing the dimension index of the topic interest phrase distribution1 through the dimension index optimization variable may be understood as optimizing the basic feature relationship network.
In some embodiments, the dimension index optimization variables may include a first optimization variable, a second optimization variable, and a third optimization variable. When the first optimization is performed on the dimension index of the topic interest phrase distribution1 by using the dimension index optimization variables, the dimension index may be optimized by the first optimization variable and the second optimization variable; for example, the variable data in each dimension of the topic interest phrase distribution1 may be processed with the first optimization variable and the second optimization variable. The third optimization variable is used for performing the second optimization on the dimension index of the topic interest phrase distribution1.
Further, the dimension index optimization variables are flexibly adjustable (e.g., configurable), so the optimized feature relationship network is also flexibly adjustable; in other words, the authenticated online topic big data can be represented by a flexibly adjustable feature relationship network.
In STEP303, performing the lightweight update operation (e.g., bipolar simplification or binary classification) by using the authenticated online topic big data and the topic interest phrase distribution2 to determine topic interest phrase distribution3 may be: determining topic interest phrase distribution3 based on a first decision index of a specified trigger mechanism and a quantized comparison result (e.g., a difference value) between corresponding dimension indexes of the authenticated online topic big data and the topic interest phrase distribution2. The specified trigger mechanism may be understood as a preset trigger function.
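The first optimization described above can be sketched as follows. The text does not say how the first and second optimization variables are applied to the variable data, so this minimal sketch assumes a per-dimension scale followed by a shift; the function name and the parameters `scale` and `shift` are hypothetical.

```python
from typing import List


def optimize_dimension_index(values: List[float], scale: float, shift: float) -> List[float]:
    """Adjust every dimension of a distribution with two configurable variables.

    Assumption: `scale` plays the role of the first optimization variable and
    `shift` the role of the second; the source does not specify the formula.
    """
    return [scale * v + shift for v in values]


# distribution1 stands in for the pooled features of the input data.
distribution1 = [0.25, -0.5, 1.0]
distribution2 = optimize_dimension_index(distribution1, scale=2.0, shift=0.5)
print(distribution2)
```

Because both variables are ordinary parameters, they can be configured together with the rest of the model, which is one way to read the statement that the optimized representation is flexibly adjustable.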
In some embodiments, a quantitative comparison result of the dimension index corresponding to the topic interest phrase distribution2 and the authenticated online topic big data may be determined, if the dimension index of the authenticated online topic big data is greater than the dimension index of the topic interest phrase distribution2, the dimension index is adjusted to be increased by one, and if the dimension index of the authenticated online topic big data is not greater than the dimension index of the topic interest phrase distribution2, the dimension index is adjusted to be decreased by one.
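The comparison rule above can be illustrated with a short sketch. It returns only the signed unit value chosen for each dimension (+1 when the input exceeds the optimized index, otherwise -1). Whether that value replaces the dimension index, as in a binarization, or is added to it is not clear from the text, so the sketch leaves that step out. The function name is hypothetical.

```python
from typing import List, Sequence


def compare_dimension_indexes(data: Sequence[float], distribution2: Sequence[float]) -> List[int]:
    """Return +1 where data > distribution2 in a dimension, otherwise -1."""
    if len(data) != len(distribution2):
        raise ValueError("both inputs must have the same number of dimensions")
    return [1 if d > t else -1 for d, t in zip(data, distribution2)]


signs = compare_dimension_indexes([0.3, 0.1, 0.5], [0.2, 0.1, 0.9])
print(signs)
```

Note that equal values fall into the "not greater than" branch and therefore yield -1.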
In the embodiment of the present application, the first decision index of the trigger mechanism may itself be kept unchanged. However, because the dimension index of the topic interest phrase distribution1 is optimized through STEP302 before the lightweight update based on the trigger mechanism (such as ReLU) is performed, the effect is equivalent to making the first decision index of the trigger mechanism flexibly adjustable.
In the STEP304, when performing phrase sampling processing on the topic interest phrase distribution3 to determine a target topic interest phrase distribution corresponding to the authenticated online topic big data, phrase sampling processing may be performed on the topic interest phrase distribution3 to determine a topic interest phrase distribution 4; secondly, performing second optimization on the phrase description values of the topic interest phrase distribution1 by using the dimension index optimization variables, and determining topic interest phrase distribution5 which completes optimization; then, topic interest phrase sorting is performed on the topic interest phrase distribution4 and the topic interest phrase distribution5, and the target topic interest phrase distribution is determined.
In this embodiment, the second optimization of the phrase description values of the topic interest phrase distribution1 by using the dimension index optimization variables can be understood as optimizing those values with the first optimization variable and the third optimization variable. The specific optimization idea may be the same as that of the first optimization above and is not repeated here.
It can be understood that, by performing topic interest phrase sorting on the topic interest phrase distribution4 and the topic interest phrase distribution5, for example, the variable data of the corresponding dimensions of the topic interest phrase distribution4 and the topic interest phrase distribution5 may be counted to determine the target topic interest phrase distribution.
Further, after the online topic big data is loaded, a part of the online topic big data is processed to obtain topic interest phrase distribution1, and topic interest phrase distribution2 is obtained after topic interest phrase distribution1 is optimized by the first optimization variable and the second optimization variable. Finally, a lightweight update operation is performed based on the first decision index and the comparison result between the dimension indexes of the authenticated online topic big data and the topic interest phrase distribution2, and topic interest phrase distribution3 is determined.
Applying the above embodiment, the authenticated online topic big data may be converted into a quantized feature relationship network representation, and then the details of the topic interest phrase distribution3 may be determined through a two-class sliding process, resulting in a topic interest phrase distribution 4.
Further, in order to reduce as much as possible the effect of the bipolar simplification on the accuracy and detail of the extracted topic interest phrases, the topic interest phrases of the authenticated online topic big data may be added to the topic interest phrase distribution4; for example, the topic interest phrase distribution5, obtained by performing the second optimization on the topic interest phrase distribution1, is sorted with the topic interest phrase distribution4.
In the above design, the dimension index of the topic interest phrase distribution1 is optimized first, and the lightweight update is then performed on the optimized distribution. This can be understood as flexibly positioning the two-level lightweight decision index, which reduces the sampling quality gap between lightweight phrase sampling processing and non-lightweight phrase sampling processing.
In the embodiment of the present application, the topic interest phrase distribution3 is a topic interest phrase quantized distribution, and the topic interest phrase distribution4 obtained by performing phrase sampling processing (binary sliding processing) on it is also a topic interest phrase quantized distribution. The topic interest phrase distribution5, however, is not a quantized distribution, so the target topic interest phrase distribution obtained by sorting the topic interest phrase distribution4 and the topic interest phrase distribution5 is not a quantized distribution either. Subsequent operations therefore still need to perform a lightweight update through a trigger mechanism.
For STEP203, when the target topic interest phrase distribution is decomposed into a plurality of interactive topic interest phrase sets, the decomposition may be based on set dimensions.
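As an illustration of the decomposition in STEP203, the sketch below splits a two-dimensional grid into non-overlapping square blocks of a set scale. Representing the target topic interest phrase distribution as a 2-D grid, and the function name `decompose`, are assumptions made for the example only.

```python
from typing import List

Grid = List[List[int]]


def decompose(grid: Grid, scale: int) -> List[Grid]:
    """Split a grid into non-overlapping scale x scale blocks, row by row."""
    rows, cols = len(grid), len(grid[0])
    if rows % scale or cols % scale:
        raise ValueError("grid size must be a multiple of the set scale")
    blocks = []
    for r in range(0, rows, scale):
        for c in range(0, cols, scale):
            blocks.append([row[c:c + scale] for row in grid[r:r + scale]])
    return blocks


# A 4 x 4 grid holding the values 0..15, decomposed at scale 2.
target_distribution = [[4 * r + c for c in range(4)] for r in range(4)]
phrase_sets = decompose(target_distribution, scale=2)
print(len(phrase_sets), phrase_sets[0])
```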
In some examples, the topic interest phrase sorting node may include a plurality of lightweight topic interest phrase processing sub-nodes. The derived information of the U-th lightweight topic interest phrase processing sub-node is the raw material information of the (U+1)-th lightweight topic interest phrase processing sub-node, where raw material information can be understood as input information and U is a positive integer. The raw material information of the first lightweight topic interest phrase processing sub-node is the target topic interest phrase distribution, and the derived information of the last lightweight topic interest phrase processing sub-node is the topic interest phrase emotion field distribution.
In some possible embodiments, one lightweight topic interest phrase processing sub-node can perform topic interest phrase sorting and potential topic interest phrase mining through the following ideas, which may specifically include the contents recorded in STEP601-STEP604 below.
STEP601, performing a lightweight update operation on the raw material type topic interest phrase distribution, and determining topic interest phrase distribution6.
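The chaining of sub-nodes described above amounts to function composition: each sub-node consumes the output of the previous one. A minimal sketch, in which the sub-nodes are stand-in functions and all names are hypothetical:

```python
from functools import reduce
from typing import Callable, List, Sequence

Distribution = List[float]
SubNode = Callable[[Distribution], Distribution]


def run_sorting_node(sub_nodes: Sequence[SubNode], target_distribution: Distribution) -> Distribution:
    """Feed the output of sub-node U into sub-node U+1, starting from the target distribution."""
    return reduce(lambda info, node: node(info), sub_nodes, target_distribution)


# Two placeholder sub-nodes; real sub-nodes would follow STEP601-STEP604.
def add_one(values: Distribution) -> Distribution:
    return [v + 1.0 for v in values]


def double(values: Distribution) -> Distribution:
    return [v * 2.0 for v in values]


emotion_field_distribution = run_sorting_node([add_one, double], [1.0, 2.0])
print(emotion_field_distribution)
```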
STEP602, performing phrase set mapping on the interactive topic interest phrase set of the topic interest phrase distribution6 based on at least one mapping indication, to obtain a mapped topic interest phrase distribution.
STEP603, performing phrase sampling processing on the mapped topic interest phrase distribution and the topic interest phrase distribution6, and performing topic interest phrase sorting with the raw material type topic interest phrase distribution to obtain sorted topic interest phrase distribution.
STEP604, processing the sorted topic interest phrase distribution to obtain a topic interest phrase distribution generation result of the lightweight topic interest phrase processing sub-node.
In STEP601, performing the lightweight update operation on the raw material type topic interest phrase distribution can be understood as performing a lightweight update on it based on a trigger mechanism. The raw material information of the first lightweight topic interest phrase processing sub-node is the target topic interest phrase distribution, which is not a topic interest phrase quantized distribution, so this processing simplifies the model variables. The raw material information of each remaining lightweight topic interest phrase processing sub-node is the derived information of the previous sub-node. That derived information has been processed through the above steps, and the processed topic interest phrase distribution is not necessarily a topic interest phrase quantized distribution, so a lightweight update operation is needed for it as well.
In STEP602, the mapping indication may be a preset quantitative indication, for example, 1/2 indicating the scale of the distribution of topic interest phrases.
For one mapping indication, performing phrase set mapping on the interactive topic interest phrase sets of the topic interest phrase distribution6 to obtain a mapped topic interest phrase distribution may be as follows. For one interactive topic interest phrase set, the interactive topic interest phrase sets to be mapped that correspond to it in the topic interest phrase distribution6 are determined based on the mapping indication. Then, based on the variable data of the interactive topic interest phrase sets to be mapped in the corresponding dimensions, the variable data of the interactive topic interest phrase set in each dimension after phrase set mapping is determined.
In a possible embodiment, the mapping processing may be divided into overall mapping processing and staged mapping processing. The overall mapping processing may be understood as global-level or wide-range mapping processing, the staged mapping processing may be understood as local-level or small-range mapping processing, and the mapping processing itself may be understood as swap processing.
For example, suppose the current interactive topic interest phrase set is phrase set_A, and the interactive topic interest phrase sets adjacent to phrase set_A are phrase set_B, phrase set_C, phrase set_D and phrase set_E. When the staged mapping processing is performed on phrase set_A, the variable data of phrase set_B, phrase set_C, phrase set_D and phrase set_E in the corresponding dimensions can be determined first, and the variable data of phrase set_A in each dimension after the phrase set mapping can then be determined from them.
For example, when the variable data of an interactive topic interest phrase set in each dimension after phrase set mapping is determined based on the variable data of the interactive topic interest phrase sets to be mapped in the corresponding dimensions, the sets to be mapped in different positions contribute different dimensions. If each interactive topic interest phrase set has H dimensions, the indexes in the 0th to 0.25H-th dimensions of the previous interactive topic interest phrase set to be mapped can be selected as the indexes in the 0th to 0.25H-th dimensions of phrase set_A after phrase set mapping; the indexes in the 0.25H-th to 0.5H-th dimensions of the subsequent interactive topic interest phrase set to be mapped can be selected as the indexes in the 0.25H-th to 0.5H-th dimensions of phrase set_A after phrase set mapping; the indexes in the 0.5H-th to 0.75H-th dimensions of the interactive topic interest phrase set to be mapped on the upper side can be selected as the indexes in the 0.5H-th to 0.75H-th dimensions of phrase set_A after phrase set mapping; and the indexes in the 0.75H-th to H-th dimensions of the interactive topic interest phrase set to be mapped on the lower side can be selected as the indexes in the 0.75H-th to H-th dimensions of phrase set_A after phrase set mapping.
Therefore, the overall mapping splices the current interactive topic interest phrase set with overall topic interest phrases, and the staged mapping splices it with staged topic interest phrases. A topic interest phrase distribution obtained by combining these ideas contains both staged and overall topic interest phrases, whereas phrase sampling processing based on a windowed idea alone can combine only staged topic interest phrases. The integrity and richness of the sorted topic interest phrases can therefore be guaranteed.
In STEP603, there may be several mapped topic interest phrase distributions. Performing phrase sampling processing on the mapped topic interest phrase distributions and the topic interest phrase distribution6 respectively, and then performing topic interest phrase sorting with the raw material type topic interest phrase distribution to obtain the sorted topic interest phrase distribution, may be as follows: phrase sampling processing is performed on the mapped topic interest phrase distributions and the topic interest phrase distribution6 respectively based on a lightweight topic interest phrase processing thread to determine a plurality of potential topic interest phrase distributions; a dimension reduction operation is performed on the plurality of potential topic interest phrase distributions; and topic interest phrase sorting is then performed with the raw material type topic interest phrase distribution to obtain the sorted topic interest phrase distribution.
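The four-way selection above can be sketched on a grid of phrase sets, each holding H variable values. The sketch assumes that "previous" and "subsequent" mean the left and right neighbours in the same row, that "upper" and "lower" mean the neighbours in the same column, and that a set on the border keeps its own values for any quarter whose neighbour does not exist. None of these details is fixed by the text, and the function name is hypothetical.

```python
from typing import List

Vector = List[int]


def staged_mapping(grid: List[List[Vector]]) -> List[List[Vector]]:
    """Fill each quarter of a set's dimensions from one of its four neighbours."""
    rows, cols = len(grid), len(grid[0])
    h = len(grid[0][0])
    if h % 4:
        raise ValueError("H must be divisible by 4")
    q = h // 4
    # Source neighbour (row offset, column offset) for quarters 0..3:
    # previous (left), subsequent (right), upper, lower.
    offsets = [(0, -1), (0, 1), (-1, 0), (1, 0)]
    mapped = []
    for r in range(rows):
        mapped_row = []
        for c in range(cols):
            vec = list(grid[r][c])  # start from the set's own values
            for k, (dr, dc) in enumerate(offsets):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    vec[k * q:(k + 1) * q] = grid[nr][nc][k * q:(k + 1) * q]
            mapped_row.append(vec)
        mapped.append(mapped_row)
    return mapped


# 3 x 3 grid; the set at (r, c) holds its own id 3*r + c in all H = 4 dimensions.
sets = [[[3 * r + c] * 4 for c in range(3)] for r in range(3)]
result = staged_mapping(sets)
print(result[1][1])  # the centre set now mixes values from its four neighbours
```

Reading from the original grid while writing into a copy keeps every set's result independent of the order in which the sets are visited.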
In the embodiment of the application, the topic interest phrase sorting performed with the raw material type topic interest phrase distribution has the beneficial effects of avoiding gradient loss and further ensuring the integrity of interest phrase analysis.
Alternatively, the variable data of the interactive topic interest phrase sets to be mapped may be averaged in each dimension, and the resulting average is determined as the variable data of the current interactive topic interest phrase set in that dimension.
In STEP604, the processing of the sorted topic interest phrase distribution may differ from the lightweight update operation in STEP303 and STEP601; for example, it may be processing based on a related activation function. The decision indexes used in the lightweight update operations in STEP303 and STEP601 may also differ; in other words, the first decision index and the second decision index may be inconsistent.
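The averaging alternative can be sketched in a few lines. The function name is hypothetical, and the sets to be mapped are assumed to be equal-length lists of variable data.

```python
from typing import List, Sequence


def average_mapping(sets_to_map: Sequence[Sequence[float]]) -> List[float]:
    """Average the sets to be mapped dimension by dimension."""
    if not sets_to_map:
        raise ValueError("at least one set to be mapped is required")
    count = len(sets_to_map)
    return [sum(column) / count for column in zip(*sets_to_map)]


mapped_set = average_mapping([[1.0, 2.0], [3.0, 4.0]])
print(mapped_set)
```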
It is to be understood that the performing of the dimensionless weight reduction operation on the plurality of potential topic interest phrase distributions may be performing the dimensionless weight reduction operation (such as normalization processing) after the sorting of the plurality of potential topic interest phrase distributions.
In the embodiment of the application, the sorted topic interest phrase distribution sorts more accurate and reliable periodical topic interest phrases and overall topic interest phrases, so that when the user interest knowledge distribution is determined based on the sorted topic interest phrase distribution, the accuracy and the reliability are higher.
For STEP204, when the basic topic interest mining model is configured by using the topic interest phrase emotion field distribution generated by the topic interest phrase sorting node and the prior basis of the authenticated online topic big data, a user interest knowledge distribution of the basic topic interest mining model may be determined by using the topic interest phrase emotion field distribution and a significance improving node (such as a supervising node). A model performance evaluation parameter (such as a cross model cost) may then be determined by using the user interest knowledge distribution and the prior basis, and the basic topic interest mining model may be configured by using the model performance evaluation parameter.
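One way to compute such an evaluation parameter is sketched below. It assumes that the "cross model cost" denotes a cross-entropy loss between the predicted user interest knowledge distribution and an annotated class label; the source does not confirm this reading, and the function name is hypothetical.

```python
import math
from typing import Sequence


def cross_entropy_cost(predicted: Sequence[float], label: int) -> float:
    """Negative log-probability that the prediction assigns to the annotated label."""
    probability = predicted[label]
    if probability <= 0.0:
        raise ValueError("the predicted probability of the label must be positive")
    return -math.log(probability)


# Predicted probabilities over three interest classes; the annotation is class 0.
cost = cross_entropy_cost([0.5, 0.25, 0.25], label=0)
print(cost)
```

A lower cost means the prediction agrees more closely with the prior basis, so the model would be configured to reduce this value.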
In another possible embodiment, in order to improve the model accuracy of the basic topic interest mining model, the basic topic interest mining model may be adaptively configured.
In one possible embodiment, configuring the basic topic interest mining model by using the topic interest phrase emotion field distribution generated by the topic interest phrase sorting node and the prior basis of the authenticated online topic big data may include the following STEPa and STEPb.
STEPa, obtaining a target migration learning model corresponding to the basic topic interest mining model.
In the embodiment of the application, the obtained target migration learning model may be one whose configuration has been completed, and the learning expectation of the target migration learning model may be consistent with that of the basic topic interest mining model. Because the model variables of the target migration learning model are complete, its model accuracy is higher than that of the basic topic interest mining model, so adaptively configuring the basic topic interest mining model through the target migration learning model can improve the model accuracy of the basic topic interest mining model.
STEPb, configuring the basic topic interest mining model by using the topic interest phrase emotion field distribution generated by the topic interest phrase sorting node, the prior basis of the authenticated online topic big data, and the target migration learning model.
In a possible embodiment, when configuring the basic topic interest mining model by using the topic interest phrase emotion field distribution generated by the topic interest phrase collating node, the priori basis of the authenticated online topic big data, and the target migration learning model, a first model performance evaluation parameter may be determined by using the topic interest phrase emotion field distribution and the first migration learning information of the target migration learning model on the authenticated online topic big data; and determining a second model performance evaluation parameter by using the topic interest phrase emotion field distribution and the prior basis of the authenticated online topic big data, and then configuring the basic topic interest mining model by using the first model performance evaluation parameter and the second model performance evaluation parameter.
In this embodiment of the application, the first model performance evaluation parameter is used to characterize an adaptive bias result of the target migration learning model during adaptive configuration, and the second model performance evaluation parameter is used to characterize a topic interest mining bias result of the basic topic interest mining model. And configuring the basic topic interest mining model by combining the first model performance evaluation parameter and the second model performance evaluation parameter, so that the model accuracy of the basic topic interest mining model can be improved.
It is to be understood that, when the first model performance evaluation parameter is determined by using the topic interest phrase emotion field distribution and the first transfer learning information of the target migration learning model on the authenticated online topic big data, second transfer learning information of the basic topic interest mining model may be determined based on a configuration sample balance node and the topic interest phrase emotion field distribution, where the variable list of the configuration sample balance node is consistent with that of the target migration learning model. The first model performance evaluation parameter is then determined by using the first transfer learning information and the second transfer learning information.
Further, making the variable list (variable bit number or variable architecture) of the configuration sample balance node consistent with that of the target migration learning model ensures that the integrity description of the target migration learning model is inherited through the configuration sample balance node.
In some possible embodiments, when determining the second model performance evaluation parameter by using the topic interest phrase emotion field distribution and the priori basis of the authenticated online topic big data, determining third transfer learning information of the basic topic interest mining model based on a significance improvement node and the topic interest phrase emotion field distribution; and then determining the performance evaluation parameters of the second model by using the third transfer learning information and the prior basis of the authenticated online topic big data.
For example, the first model performance evaluation parameter may be a relative model cost or a hinge model cost, and the second model performance evaluation parameter may be a cross model cost or the like.
When the basic topic interest mining model is configured by using the first model performance evaluation parameter and the second model performance evaluation parameter, global processing (for example, weighted summation) may be performed on the first model performance evaluation parameter and the second model performance evaluation parameter, so as to determine a global model performance evaluation parameter, and then the basic topic interest mining model is configured based on the global model performance evaluation parameter.
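The weighted summation mentioned above can be sketched as a convex combination of the two evaluation parameters. The weight value and the function name are hypothetical; the text states only that a weighted summation may be used.

```python
def global_cost(first_cost: float, second_cost: float, weight: float = 0.5) -> float:
    """Combine two evaluation parameters as weight * first + (1 - weight) * second."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * first_cost + (1.0 - weight) * second_cost


# The first cost compares with the migration learning model, the second with the prior basis.
combined = global_cost(first_cost=2.0, second_cost=4.0, weight=0.25)
print(combined)
```

Raising the weight makes the configured model follow the target migration learning model more closely, while lowering it gives more influence to the annotated prior basis.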
In one possible embodiment, after the basic topic interest mining model is configured, a user requirement analysis node may be generated based on the configured configuration sample balance node and significance improvement node (this may be understood as generating the variables of the user requirement analysis node), so as to improve the operation quality of the basic topic interest mining model.
It is to be understood that the user requirement analysis node is configured to determine a user requirement analysis report based on the derived information of the topic interest phrase sorting node of the configured basic topic interest mining model when performing user requirement analysis.
By combining the ideas, the user requirement analysis nodes can be directly generated, so that unnecessary operations are reduced, and the analysis accuracy and the reliability of the user requirement analysis nodes can be guaranteed.
For an independently implementable embodiment, the configuration of the above underlying topic interest mining model can include the following.
After the authenticated online topic big data is collected, a part of the authenticated online topic big data is transmitted to a target migration learning model for adaptive configuration, and a part of the authenticated online topic big data is transmitted to a basic topic interest mining model for phrase sampling processing. After transmission specifically to the underlying topic interest mining model, the following recorded content of STEP1-STEP6 is implemented.
STEP1, transmitting the big data of the authenticated online topic to a light weight topic interest phrase extraction node to extract light weight topic interest phrases and obtain the distribution of the target topic interest phrases.
STEP2, performing decomposition on the target topic interest phrase distribution.
STEP3, transmitting the target topic interest phrase distribution to a topic interest phrase sorting node, and performing phrase sampling processing and topic interest phrase sorting.
In an embodiment of the present application, the topic interest phrase sorting node includes a plurality of lightweight topic interest phrase processing sub-nodes.
STEP4, transmitting part of the emotion category quantity generated by the topic interest phrase sorting node to a configuration sample balancing node, determining hinge model cost synchronously with the derived information of the target migration learning model, transmitting part of the emotion category quantity to a significance improving node, and determining cross model cost synchronously with the prior basis of the authenticated online topic big data.
STEP5, configuring the basic topic interest mining model based on the hinge model cost and the cross model cost.
STEP6, after the network configuration is completed, generating a user requirement analysis node based on the configuration sample balance node and the significance improvement node for further user requirement analysis.
The lightweight topic interest phrase extraction node can perform dimension index optimization on the topic interest phrase distribution1 extracted from the authenticated online topic big data. With this design, the lightweight update is performed on the topic interest phrase distribution1 whose dimension indexes have been optimized, which can be understood as flexibly positioning the two-level lightweight decision index and reduces the sampling quality gap between lightweight phrase sampling processing and non-lightweight phrase sampling processing. Further, the topic interest phrase sorting node can perform topic interest phrase sorting and potential topic interest phrase mining on a plurality of interactive topic interest phrase sets, so that the generated topic interest phrase emotion field distribution fully considers topic interests in different stages, and the mining accuracy and integrity of a topic interest mining model configured based on the topic interest phrase emotion field distribution and the prior basis can be improved. In addition, because the lightweight topic interest phrase extraction node and the topic interest phrase sorting node are both lightweight, the model architecture of the specified topic interest mining model can be minimized, which guarantees mining quality while reducing extra computing resource overhead.
Under some independently implementable design ideas, after the user interest knowledge distribution of the online topic big data to be subjected to interest mining is obtained through the specified topic interest mining model, the method may further include the content described in STEP103.
STEP103, in response to a potential demand analysis instruction for the user interest knowledge distribution, performing potential demand analysis on the user interest knowledge distribution to obtain a topic user demand; and performing information recommendation based on the topic user demand.
In the embodiment of the application, the potential demand analysis instruction can be sent to the big data mining system by the information recommendation platform. Based on this instruction, the big data mining system further mines the user interest knowledge distribution to obtain the topic user demand, generates an information recommendation suggestion according to the topic user demand, and issues the suggestion to the information recommendation platform. The information recommendation platform can then perform targeted information recommendation based on the suggestion, which improves information recommendation efficiency and avoids unnecessary resource waste.
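The interaction described above can be sketched as a simple request/response flow. Everything named here is hypothetical: the `PotentialDemandInstruction` fields, the in-memory `store`, and in particular the stand-in analysis rule (keep topics whose interest weight reaches `min_weight`), since the embodiment does not fix a concrete data format at this point.

```python
from dataclasses import dataclass


@dataclass
class PotentialDemandInstruction:
    """Hypothetical instruction sent by the information recommendation platform."""
    platform_id: str
    distribution_id: str


def analyze_potential_demand(interest_knowledge, min_weight=0.3):
    """Stand-in analysis: keep topics with weight >= min_weight, strongest first."""
    kept = [(topic, w) for topic, w in interest_knowledge.items() if w >= min_weight]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [topic for topic, _ in kept]


def handle_instruction(instruction, store):
    """Big data mining system side: analyze, then return a recommendation suggestion."""
    interest_knowledge = store[instruction.distribution_id]
    topic_user_demand = analyze_potential_demand(interest_knowledge)
    return {"platform_id": instruction.platform_id, "recommend_topics": topic_user_demand}


# Usage: the platform sends an instruction and receives a suggestion.
store = {"d1": {"travel": 0.6, "sports": 0.1, "food": 0.4}}
suggestion = handle_instruction(PotentialDemandInstruction("p1", "d1"), store)
```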
Under some independently implementable design ideas, the potential demand analysis performed in STEP103 on the user interest knowledge distribution to obtain the topic user demand can be implemented by the following technical scheme: performing demand item analysis on the user interest knowledge distribution to obtain first user demand data; performing knowledge description confusion on the interest knowledge distribution template of the user interest knowledge distribution and the user interest knowledge distribution, and performing demand item analysis on the user interest knowledge distribution after knowledge description confusion to obtain second user demand data; and determining a demand analysis result of the active demand items in the user interest knowledge distribution according to the first user demand data and the second user demand data.
It can be understood that knowledge description confusion processing (feature confusion processing) actively attenuates noise demand items, thereby ensuring the accuracy and reliability of the demand analysis result.
Under some independently implementable design ideas, the first user demand data corresponds to a first interest knowledge region in the user interest knowledge distribution, and the second user demand data corresponds to a second interest knowledge region in the user interest knowledge distribution; the determining a demand analysis result of an active demand item in the user interest knowledge distribution according to the first user demand data and the second user demand data includes: determining a region commonality value of the first interest knowledge region and the second interest knowledge region; and in response to the region commonality value being greater than a set threshold, integrating the first user demand data and the second user demand data, and determining the demand analysis result of the active demand item in the user interest knowledge distribution.
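The threshold test above can be sketched as follows. The text does not define how the region commonality value is computed, so this example assumes a Jaccard overlap between two regions represented as sets of indices; the dictionary layout, the `threshold` default, and the choice to return `None` when the threshold is not exceeded are likewise assumptions.

```python
def region_commonality(region_a, region_b):
    """Assumed metric: Jaccard overlap of two regions given as index collections."""
    a, b = set(region_a), set(region_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def integrate_if_common(first, second, threshold=0.5):
    """Integrate the two demand records only when their regions overlap enough.

    The below-threshold behavior is not specified in the text; None is a placeholder.
    """
    value = region_commonality(first["region"], second["region"])
    if value > threshold:
        merged_region = sorted(set(first["region"]) | set(second["region"]))
        return {"region": merged_region, "commonality": value}
    return None


first = {"region": [1, 2, 3, 4]}
second = {"region": [2, 3, 4, 5]}
result = integrate_if_common(first, second)  # overlap is 3/5 = 0.6, above 0.5
```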
Under some independently implementable design ideas, the first user demand data includes a first label of the active demand item, and the second user demand data includes a second label of the active demand item; the integrating the first user demand data and the second user demand data, and determining a demand analysis result of the active demand item in the user interest knowledge distribution includes: combining the first label of the active demand item with the second label of the active demand item, and determining a final positioning label of the active demand item.
Under some independently implementable design ideas, the first user demand data includes a first demand topic of the active demand item and a first likelihood that the active demand item corresponds to the first demand topic, and the second user demand data includes a second demand topic of the active demand item and a second likelihood that the active demand item corresponds to the second demand topic; the integrating the first user demand data and the second user demand data, and determining a demand analysis result of the active demand item in the user interest knowledge distribution includes: in response to the first demand topic and the second demand topic being the same demand topic, adding the first likelihood and the second likelihood to obtain the likelihood that the active demand item corresponds to that demand topic.
Under some independently implementable design ideas, the integrating the first user demand data and the second user demand data, and determining a demand analysis result of the active demand item in the user interest knowledge distribution further includes: in response to the first demand topic and the second demand topic being different demand topics, updating the first likelihood and the second likelihood.
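The two likelihood rules above can be sketched in a few lines. The same-topic branch follows the text directly (the likelihoods are added). The different-topic branch is only described as "updating" the two likelihoods, so the down-weighting by a `penalty` factor used here is a purely hypothetical update rule.

```python
def integrate_likelihoods(first, second, penalty=0.5):
    """Integrate two (demand_topic, likelihood) pairs for one active demand item.

    Same topic: likelihoods are added, as stated in the text.
    Different topics: both likelihoods are scaled by `penalty` (an assumption;
    the text does not say how they are updated).
    """
    topic1, likelihood1 = first
    topic2, likelihood2 = second
    if topic1 == topic2:
        return {topic1: likelihood1 + likelihood2}
    return {topic1: likelihood1 * penalty, topic2: likelihood2 * penalty}


same = integrate_likelihoods(("travel", 0.25), ("travel", 0.5))
different = integrate_likelihoods(("travel", 0.5), ("food", 0.25))
```

Note that simple addition can exceed 1.0, so the summed value is better read as a score than as a probability unless it is normalized afterwards.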
Under some independently implementable design ideas, the performing demand item analysis on the user interest knowledge distribution to obtain first user demand data includes: performing demand item analysis on the user interest knowledge distribution by using a first demand item analysis thread to obtain a first demand field distribution of the user interest knowledge distribution; performing demand item analysis on the first demand field distribution of the user interest knowledge distribution by using a second demand item analysis thread to obtain a plurality of groups of second demand field distributions of the user interest knowledge distribution; and analyzing the plurality of groups of second demand field distributions of the user interest knowledge distribution to obtain the first user demand data.
Under some independently implementable design ideas, the performing knowledge description confusion on the interest knowledge distribution template of the user interest knowledge distribution and the user interest knowledge distribution, and performing demand item analysis on the user interest knowledge distribution after knowledge description confusion to obtain second user demand data includes: integrating the first demand field distribution of the user interest knowledge distribution with the first demand field distribution of the interest knowledge distribution template by using a knowledge description confusion thread to obtain a third demand field distribution of the user interest knowledge distribution; performing demand item analysis on the third demand field distribution of the user interest knowledge distribution by using a third demand item analysis thread to obtain a plurality of groups of fourth demand field distributions of the user interest knowledge distribution; and analyzing the plurality of groups of fourth demand field distributions of the user interest knowledge distribution to obtain the second user demand data.
Under some independently implementable design ideas, the interest knowledge distribution template includes a preset number of groups of interest knowledge distributions preceding the user interest knowledge distribution; the integrating the first demand field distribution of the user interest knowledge distribution with the first demand field distribution of the interest knowledge distribution template by using the knowledge description confusion thread to obtain a third demand field distribution of the user interest knowledge distribution includes: integrating the first demand field distribution of the user interest knowledge distribution with the third demand field distribution of the previous group of interest knowledge distributions to obtain the third demand field distribution of the user interest knowledge distribution.
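The recursive integration described above (each group's third demand field distribution is built from its own first demand field distribution and the previous group's third demand field distribution) can be sketched as a running blend. The element-wise weighted average, the weight `beta`, and the rule that the first group simply reuses its first demand field distribution are assumptions; the text does not specify the integration operator.

```python
def confuse(first_fields, previous_third, beta=0.5):
    """Blend the current first demand fields with the previous group's third fields.

    beta weights the current group; the weighted average is an assumed operator.
    """
    if previous_third is None:
        # No earlier group available: start from the current fields unchanged.
        return list(first_fields)
    return [beta * f + (1.0 - beta) * p for f, p in zip(first_fields, previous_third)]


def third_field_sequence(groups, beta=0.5):
    """Compute the third demand field distribution for each group in order."""
    third = None
    results = []
    for first_fields in groups:
        third = confuse(first_fields, third, beta)
        results.append(third)
    return results


groups = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
thirds = third_field_sequence(groups)
```

Under this assumed operator, a value that appears in only one group is halved at each later step, which is one way to picture the attenuation of noise demand items mentioned earlier.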
Fig. 3 is an architecture diagram of an application environment in which the big data mining method for online topics of an embodiment of the present application may be implemented. The application environment may include the big data mining system 100 and the topic activity platform system 200, which communicate with each other. On this basis, the big data mining system 100 and the topic activity platform system 200, when running, implement or partially implement the big data mining method for online topics of the embodiment of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, and have at least the following beneficial effects: because the specified topic interest mining model is configured based on the lightweight configuration rule, using it to mine the online topic big data to be subjected to interest mining can, on one hand, obtain the user interest knowledge distribution rapidly and improve the timeliness of user interest mining, and, on the other hand, improve the accuracy and integrity of the user interest knowledge distribution and thus the quality of user interest mining. In conclusion, the specified topic interest mining model and its lightweight configuration improve both the efficiency and the quality of obtaining the user interest knowledge distribution.
The above description is only a preferred embodiment of the present application, and is not intended to limit the scope of the present application.

Claims (10)

1. A big data mining method for online topics, applied to a big data mining system in communication connection with a topic activity platform system, the method comprising at least the following steps:
when a user interest mining request sent by the topic activity platform system is received, retrieving online topic big data to be subjected to interest mining from a set relational database corresponding to the topic activity platform system according to the user interest mining request;
transmitting the online topic big data to be subjected to interest mining to a specified topic interest mining model, and obtaining user interest knowledge distribution of the online topic big data to be subjected to interest mining through the specified topic interest mining model; the specified topic interest mining model is configured based on a lightweight configuration rule.
2. The method of claim 1, wherein the specified topic interest mining model is configured as follows:
collecting authenticated online topic big data and determining the prior basis of the authenticated online topic big data;
transmitting the authenticated online topic big data to a lightweight topic interest phrase extraction node of a basic topic interest mining model, and determining target topic interest phrase distribution corresponding to the authenticated online topic big data; the lightweight topic interest phrase extraction node comprises a dimension index optimization variable to be configured;
disassembling the target topic interest phrase distribution into a plurality of interactive topic interest phrase sets according to a set scale, and transmitting the target topic interest phrase distribution to a topic interest phrase sorting node, wherein the topic interest phrase sorting node comprises a plurality of lightweight topic interest phrase processing sub-nodes, and each lightweight topic interest phrase processing sub-node is used for carrying out topic interest phrase sorting and potential topic interest phrase mining on the interactive topic interest phrase sets;
and configuring the basic topic interest mining model by utilizing the topic interest phrase emotion field distribution generated by the topic interest phrase sorting node and the prior basis of the authenticated online topic big data.
3. The method of claim 2, wherein the transmitting the authenticated online topic big data to a lightweight topic interest phrase extraction node of a basic topic interest mining model to determine a target topic interest phrase distribution corresponding to the authenticated online topic big data comprises:
performing topic interest phrase summarization operation on the authenticated online topic big data, and determining topic interest phrase distribution1 corresponding to the authenticated online topic big data;
performing first optimization on the dimension indexes of the topic interest phrase distribution1 by using the dimension index optimization variables, and determining topic interest phrase distribution2 which completes optimization;
performing lightweight updating operation by using the authenticated online topic big data and the topic interest phrase distribution2, and determining a topic interest phrase distribution3;
and performing phrase sampling processing on the topic interest phrase distribution3 to determine a target topic interest phrase distribution corresponding to the authenticated online topic big data.
4. The method of claim 3, wherein the determining a topic interest phrase distribution3 by performing a lightweight update operation using the authenticated online topic big data and the topic interest phrase distribution2 comprises:
determining the topic interest phrase distribution3 based on a first determination index of a specified trigger mechanism and a quantitative comparison result of the authenticated online topic big data and a corresponding dimension index of the topic interest phrase distribution2.
5. The method according to claim 4, wherein the performing phrase sampling processing on the topic interest phrase distribution3 to determine a target topic interest phrase distribution corresponding to the authenticated online topic big data comprises:
performing phrase sampling processing on the topic interest phrase distribution3 to determine a topic interest phrase distribution4;
performing second optimization on the phrase description values of the topic interest phrase distribution1 by using the dimension index optimization variables, and determining topic interest phrase distribution5 which completes optimization;
and performing topic interest phrase sorting on the topic interest phrase distribution4 and the topic interest phrase distribution5 to determine the target topic interest phrase distribution.
6. The method according to claim 2, wherein the derived information of the U-th lightweight topic interest phrase processing sub-node in the topic interest phrase sorting node is the raw material information of the (U+1)-th lightweight topic interest phrase processing sub-node, the raw material information of the first lightweight topic interest phrase processing sub-node is the target topic interest phrase distribution, the derived information of the last lightweight topic interest phrase processing sub-node is the topic interest phrase emotion field distribution, and U is a positive integer.
7. The method of claim 2, wherein each lightweight topic interest phrase processing sub-node is configured to perform topic interest phrase sorting and potential topic interest phrase mining on the interactive topic interest phrase sets of the raw material type topic interest phrase distribution transmitted to that sub-node in the following manner:
performing lightweight updating operation on the raw material type topic interest phrase distribution, and determining topic interest phrase distribution6;
performing phrase set mapping on the interactive topic interest phrase set of the topic interest phrase distribution6 based on at least one mapping indication to obtain mapped topic interest phrase distribution;
after phrase sampling processing is respectively carried out on the mapped topic interest phrase distribution and the topic interest phrase distribution6, topic interest phrase sorting is carried out with the raw material type topic interest phrase distribution to obtain a sorted topic interest phrase distribution;
processing the sorted topic interest phrase distribution to obtain a topic interest phrase distribution generation result of the lightweight topic interest phrase processing sub-node;
wherein, for one of the mapping indications, the performing phrase set mapping on the interactive topic interest phrase set of the topic interest phrase distribution6 to obtain a mapped topic interest phrase distribution includes: for one interactive topic interest phrase set, determining, based on the mapping indication, the interactive topic interest phrase set to be mapped in the topic interest phrase distribution6 that corresponds to the interactive topic interest phrase set; and determining, based on variable data of the interactive topic interest phrase set to be mapped under the corresponding dimension, variable data of the interactive topic interest phrase set under each dimension after phrase set mapping;
wherein the performing topic interest phrase sorting with the raw material type topic interest phrase distribution after phrase sampling processing is respectively performed on the mapped topic interest phrase distribution and the topic interest phrase distribution6, to obtain a sorted topic interest phrase distribution, includes: performing phrase sampling processing on the mapped topic interest phrase distribution and the topic interest phrase distribution6 respectively based on a lightweight topic interest phrase processing thread, and determining a plurality of potential topic interest phrase distributions; and after dimension reduction and lightweight operation are performed on the plurality of potential topic interest phrase distributions, performing topic interest phrase sorting on the potential topic interest phrase distributions and the raw material type topic interest phrase distribution to obtain the sorted topic interest phrase distribution.
8. The method of claim 2, wherein the configuring the basic topic interest mining model using the topic interest phrase emotion field distribution generated by the topic interest phrase sorting node and the prior basis of the authenticated online topic big data comprises:
obtaining a target transfer learning model corresponding to the basic topic interest mining model;
and configuring the basic topic interest mining model by utilizing the topic interest phrase emotion field distribution generated by the topic interest phrase sorting node, the prior basis of the authenticated online topic big data and the target transfer learning model.
9. The method of claim 8, wherein the configuring the basic topic interest mining model using the topic interest phrase emotion field distribution generated by the topic interest phrase sorting node, the prior basis of the authenticated online topic big data, and the target transfer learning model comprises:
determining a first model performance evaluation parameter by using the topic interest phrase emotion field distribution and first transfer learning information of the target transfer learning model on the authenticated online topic big data;
determining a second model performance evaluation parameter by using the topic interest phrase emotion field distribution and the prior basis of the authenticated online topic big data;
configuring the basic topic interest mining model by using the first model performance evaluation parameter and the second model performance evaluation parameter;
wherein the determining a first model performance evaluation parameter by using the topic interest phrase emotion field distribution and the first transfer learning information of the target transfer learning model on the authenticated online topic big data comprises: determining second transfer learning information of the basic topic interest mining model based on a configuration sample balance node and the topic interest phrase emotion field distribution, wherein the variable list of the configuration sample balance node is consistent with the variables of the target transfer learning model; and determining the first model performance evaluation parameter by using the first transfer learning information and the second transfer learning information;
wherein the determining a second model performance evaluation parameter by using the topic interest phrase emotion field distribution and the prior basis of the authenticated online topic big data comprises: determining third transfer learning information of the basic topic interest mining model based on a significance improvement node and the topic interest phrase emotion field distribution; and determining the second model performance evaluation parameter by using the third transfer learning information and the prior basis of the authenticated online topic big data;
wherein after the configuration of the basic topic interest mining model is completed, the method further comprises: generating a user demand analysis node based on the configuration sample balance node and the significance improvement node whose configuration is completed, wherein the user demand analysis node is used for determining, when user demand analysis is carried out, a user demand analysis report based on the derived information of the topic interest phrase sorting node of the configured basic topic interest mining model.
10. A big data mining system, comprising:
a memory for storing an executable computer program; and a processor for implementing the method of any one of claims 1 to 9 when executing the executable computer program stored in the memory.
CN202210371213.XA 2022-04-11 2022-04-11 Big data mining method and big data mining system for online topics Withdrawn CN114780695A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210371213.XA CN114780695A (en) 2022-04-11 2022-04-11 Big data mining method and big data mining system for online topics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210371213.XA CN114780695A (en) 2022-04-11 2022-04-11 Big data mining method and big data mining system for online topics

Publications (1)

Publication Number Publication Date
CN114780695A true CN114780695A (en) 2022-07-22

Family

ID=82428917

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210371213.XA Withdrawn CN114780695A (en) 2022-04-11 2022-04-11 Big data mining method and big data mining system for online topics

Country Status (1)

Country Link
CN (1) CN114780695A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115688742A (en) * 2022-12-08 2023-02-03 宋杨 User data analysis method and AI system based on artificial intelligence

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115688742A (en) * 2022-12-08 2023-02-03 宋杨 User data analysis method and AI system based on artificial intelligence
CN115688742B (en) * 2022-12-08 2023-10-31 北京国联视讯信息技术股份有限公司 User data analysis method and AI system based on artificial intelligence


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20220722