CN113609117A - Data denoising method based on big data and cloud computing and cloud server - Google Patents


Publication number
CN113609117A
Authority
CN
China
Prior art keywords
thread
data
content
configuration
file set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110905009.7A
Other languages
Chinese (zh)
Inventor
高云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN202110905009.7A
Publication of CN113609117A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03 Data mining

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Medical Informatics (AREA)
  • Fuzzy Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Quality & Reliability (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

In the data denoising method and cloud server based on big data and cloud computing provided by the present application, configuring the balanced sample configuration file means that a group of data obtained by a data processing thread is used to test a performance detection thread, and the performance detection thread then evaluates that group of data against the detection result of the corresponding complete data (namely, service item content that has not been denoised). In the embodiments of the present application, the purpose of configuring the balanced sample configuration file is to let the intermediate samples obtained by the data processing thread drive the threads to update and optimize each other in an adversarial manner. In other words, the performance detection thread finds it difficult to distinguish whether an intermediate sample is service item content after denoising or service item content before denoising.

Description

Data denoising method based on big data and cloud computing and cloud server
The present application is a divisional application of the application with application number 202110165552.8, filed on February 6, 2021, and entitled "Big data processing method and cloud computing server in an online cloud service environment".
Technical Field
The application relates to the technical field of big data and cloud service, in particular to a data denoising method based on big data and cloud computing and a cloud server.
Background
Big data and cloud computing are two of the most prominent landmark technologies of the digital economy era; they complement each other and have contributed greatly to the development of modern society.
At present, relying on cloud computing, many business services can be processed in the cloud, which improves business handling efficiency and reduces business handling cost. Big data mining, in turn, can continuously optimize and update various online services to meet the service requirements brought about by rapid social development. Big data mining refers to the process of searching, by means of algorithms, for information hidden in large amounts of data. It is generally associated with computer science and achieves this goal through methods such as statistics, online analytical processing, information retrieval, machine learning, expert systems (which rely on past rules of thumb), and pattern recognition.
Big data mining can be applied in business fields such as user portrait analysis and equipment state analysis. The corresponding big data must be obtained before any such mining is carried out, but most existing big data carries noise, so data denoising is needed to ensure the accuracy of big data mining. However, the data content obtained with related data denoising techniques still has some defects in subsequent use.
Disclosure of Invention
One of the embodiments of the present application provides a data denoising method based on big data and cloud computing, which is applied to a cloud server. The method includes: acquiring target service item content carrying noise; determining service item indication information corresponding to the target service item content based on a pre-trained noise filtering thread; and obtaining marked item content production data corresponding to the target service item content according to the service item indication information, and denoising the target service item content in combination with the noise filtering thread.
One of the embodiments of the present application provides a cloud server, including a processing engine, a network module, and a memory; the processing engine and the memory communicate through the network module, and the processing engine reads a computer program from the memory and runs it to perform the above-described method.
According to the data denoising method based on big data and cloud computing and the cloud server described above, target service item content carrying noise is acquired and input to a noise filtering thread obtained in advance, and the corresponding denoised service item content is obtained from the output of the noise filtering thread. The noise filtering thread comprises a plurality of mutually associated service environment detection modules. Item content production data of each content block category of the target service item content is obtained through the data classification strategy of a service environment detection module and is used as the input of the information identification strategy in that module. The information identification strategy produces document digital information of each content block category and identifies it to obtain service item indication information of each content block category, and the item content production data of each content block category is then marked according to the service item indication information to obtain marked item content production data. In this way, the noise filtering thread can maintain good noise data identification and filtering performance and fully identify the service requirements of different content block categories to obtain the corresponding service item indication information. Through marking, content blocks with higher heat are retained while cold or erroneous content blocks are weakened, so that noise content in the target service item content is filtered effectively. At the same time, the important content blocks of the target service item content can be inferred back from the denoised service item content, which improves the content information restoration degree and the service environment matching degree of the denoised service item content.
Drawings
FIG. 1 is a block diagram of a data denoising system based on big data and cloud computing.
FIG. 2 is a flow diagram of a method and/or process for data denoising based on big data and cloud computing.
FIG. 3 is a schematic diagram of the steps of configuring a noise filtering thread in a data denoising method based on big data and cloud computing.
FIG. 4 is a block diagram of an exemplary big data and cloud computing based data denoising apparatus, according to some embodiments of the present invention; and
FIG. 5 is a schematic diagram illustrating hardware and software components in an exemplary cloud server, according to some embodiments of the invention.
Detailed Description
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings in the following description are only examples or embodiments of the application; based on them, a person skilled in the art can apply the application to other similar scenarios without inventive effort. Unless otherwise apparent from the context, or otherwise indicated, like reference numbers in the figures refer to the same structure or operation.
This embodiment aims to clean noise from service item content that contains noise content or noise data, so that the cleaned service item content restores the original data information as faithfully as possible and fits the subsequent data application environment as closely as possible, thereby improving its matching degree with the service environment.
The technical solution described in this embodiment may include the following:
step A1, acquiring the content of the target service item with noise;
step B1, processing the target service item content according to the noise filtering thread configured based on the balanced sample configuration file to obtain the marked item content production data;
and step C1, obtaining the de-noised service item content corresponding to the target service item content through the noise filtering thread and the marked item content production data.
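Steps A1 to C1 can be pictured as a three-stage pipeline. The following is only a minimal sketch under stated assumptions: the `ContentBlock` structure, the `run_pipeline` function, and the toy marking and filtering rules are hypothetical illustrations and are not defined in the application, which leaves the noise filtering thread to a trained model.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class ContentBlock:
    """One content block of a service item (hypothetical structure)."""
    category: str
    payload: str
    label: str = ""  # indication label, filled in by the marking step


def run_pipeline(
    noisy_content: List[ContentBlock],             # step A1: acquired input
    mark: Callable[[ContentBlock], ContentBlock],  # step B1: marking
    keep: Callable[[ContentBlock], bool],          # step C1: filtering decision
) -> List[ContentBlock]:
    marked = [mark(block) for block in noisy_content]
    return [block for block in marked if keep(block)]


def toy_mark(block: ContentBlock) -> ContentBlock:
    # Assumed rule for illustration only: blank payloads count as noise.
    label = "noise" if block.payload.strip() == "" else "valid"
    return replace(block, label=label)


def toy_keep(block: ContentBlock) -> bool:
    return block.label == "valid"


content = [
    ContentBlock("title", "Quarterly report"),
    ContentBlock("body", "   "),
    ContentBlock("footer", "Page 1"),
]
denoised = run_pipeline(content, toy_mark, toy_keep)
```

In a real deployment the two callables would presumably be backed by the configured noise filtering thread rather than by hand-written rules.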
It can be understood that the noise filtering thread in the above scheme may be a neural network model, that is, the scheme may remove data noise by using a relevant machine learning algorithm, and of course, the content of the service item may also be understood as relevant business data.
Furthermore, because the noise filtering thread is configured, trained, and optimized based on the balanced sample configuration file, it can be expected to have good noise data identification and filtering performance when used, and to take full account of the business requirements of the actual cloud service environment when processing the target service item content. Different noise sources can therefore be distinguished accurately and reliably when noise is removed from the target service item content, and denoising can be carried out according to each noise source. That is, the heat information of the content blocks corresponding to the target service item content is taken into account through the item content production data, so that the removed content affects the global information expression capability of the target service item content as little as possible. As a result, the important content blocks of the target service item content can be inferred back from the denoised service item content, which improves the content information restoration degree and the service environment matching degree of the denoised service item content.
In addition, the following summary can be made with respect to the present solution:
step A2, obtaining the content of the target service item;
step B2, determining marked item content production data corresponding to the target service item content according to a pre-configured noise filtering thread;
and step C2, utilizing the noise filtering thread and the marked project content production data to remove noise of the target service project content to obtain noise-removed service project content.
Further, with regard to the present solution, the following can also be summarized:
step A3, acquiring the target service item content carrying noise;
step B3, determining service item indication information corresponding to the target service item content based on a pre-trained noise filtering thread;
and step C3, obtaining marked item content production data corresponding to the target service item content according to the service item indication information, and denoising the target service item content in combination with the noise filtering thread.
It should be understood that the technical schemes described in steps A1-C1, steps A2-C2, and steps A3-C3 are further explained by the steps below. These technical schemes can also be combined to obtain new technical schemes, which is not limited herein. The technical solution of the present embodiment is further explained below with reference to the accompanying drawings.
As shown in FIG. 1, in one embodiment, a communication architecture diagram of a big data and cloud computing based data denoising system 100 is provided, which may include a data acquisition device 110 and a cloud server 120 in communication with each other. The data acquisition device 110 may be an intelligent electronic device, including but not limited to a mobile phone, a tablet computer, a notebook computer, a laptop computer, an intelligent wearable device, and the like. There may be a plurality of data acquisition devices 110, such as a data acquisition device 110a, a data acquisition device 110b, a data acquisition device 110c, and a data acquisition device 110d, which may form a data acquisition device cluster. The data acquisition devices in the figure may be of different types, which is not limited herein. The cloud server 120 may interface with multiple data acquisition devices, that is, the cloud server 120 and the data acquisition devices have a one-to-many correspondence, and the cloud server 120 may communicate with the multiple data acquisition devices synchronously or asynchronously, so as to implement the method provided in this embodiment.
For example, with the system side as the execution subject, the technical content included in the scheme may be as follows:
the data acquisition device acquires, based on user operation behaviors, the noise-carrying target service item content corresponding to those user operation behaviors;
the cloud server acquires the content of the target service item carrying noise; processing the target service item content according to a noise filtering thread configured based on a balanced sample configuration file to obtain marked item content production data; and obtaining the de-noising service item content corresponding to the target service item content through the noise filtering thread and the marked item content production data.
It is understood that further description of the embodiments of the system may refer to the following, which is not repeated herein.
As shown in fig. 2, in one embodiment, a data denoising method based on big data and cloud computing is provided. The embodiment mainly illustrates that the method is applied to the cloud server 120 in fig. 1. Referring to fig. 2, the data denoising method based on big data and cloud computing specifically includes the following steps S21-S26.
S21, acquiring the target service item content carrying noise.
The target service item content is service item content which carries noise data/information and needs noise filtering processing. The target service item content may specifically be a noisy interactive service item content, or may be a portrait label service item content determined from a noisy interactive service item content. For example, when the service item content processing program is used to perform noise filtering processing, the acquired target service item content with noise is the interactive service item content input to the service item content processing program or the determined portrait label service item content. The interactive service item content can represent the service data interaction condition among multiple ends, and the portrait label service item content can represent the relevant data information aiming at the personalized analysis of the user.
For example, the service item content may include a series of service data, which may be used as an input of the data mining algorithm at a later stage, but in order to ensure accuracy and reliability of a mining result of the data mining algorithm, corresponding noise processing needs to be performed, and therefore, the present scheme may also be regarded as a preamble step of the data mining service.
S22, inputting the target service item content into a noise filtering thread configured based on the balanced sample configuration file; the noise filtering thread comprises a plurality of mutually associated service environment detection modules.
It should be understood that a thread in this embodiment may be a corresponding algorithm model or an artificial intelligence network. In this embodiment, the noise filtering thread is configured in advance based on the balanced sample configuration file; it may be a thread that filters noise from document-digitized interactive service item content, or a thread that filters noise from local portrait label service item content. When the noise filtering thread filters noise from document-digitized interactive service item content, the target service item content is document-digitized interactive service item content; when it filters noise from local portrait label service item content, the target service item content is local portrait label service item content. Further, the service environment detection modules may be understood as the functional units of the thread; for example, when the noise filtering thread is a neural network model, a service environment detection module may be a related functional network layer.
The balanced sample configuration file comprises a positive sample set and a negative sample set, where the positive sample set consists of service item content without noise and the negative sample set consists of service item content with noise. Further, the balanced sample configuration file may correspond to a data processing thread and a performance detection thread, both of which correspond to the noise filtering thread. The data processing thread and the performance detection thread may be subordinate to the noise filtering thread, that is, be a part of it, or they may stand in a peer business logic relationship with the noise filtering thread, which is not limited herein.
The data processing thread is used to obtain, from the relevant input, a set of cleaned data (or service item content) that is as complete as possible, and the performance detection thread is used to judge whether an input set of data is complete data or cleaned data (complete data can be understood as data that has not undergone data cleaning or noise removal). Configuring the balanced sample configuration file refers to a process in which a group of data obtained by the data processing thread is used to test the performance detection thread, and the performance detection thread then judges that group of data against the detection result of the corresponding complete data. During this configuration process the capabilities of both threads grow steadily until stable convergence is reached.
The service environment detection module is a functional structure formed by a data classification strategy, an information identification strategy, and a data marking strategy of an artificial intelligence neural network. The information identification strategy specifically comprises a digitization execution unit and an identification execution unit: the digitization execution unit processes the item content production data of each content block category to obtain document digital information of each content block category, and the identification execution unit identifies the document digital information to obtain service item indication information of each content block category.
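The alternating configuration of the data processing thread and the performance detection thread described above resembles adversarial training, although the application does not name a specific algorithm. The following toy sketch makes that reading concrete under loud assumptions: the data is one-dimensional, the data processing thread is a single learnable shift, the performance detection thread is a logistic classifier, and all names (`train`, `shift`, `w`, `c`) and distributions are hypothetical.

```python
import math
import random


def sigmoid(t: float) -> float:
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train(steps: int = 500, lr: float = 0.05, seed: int = 0):
    rng = random.Random(seed)
    shift = 0.0      # data processing thread: cleaned = noisy + shift
    w, c = 0.0, 0.0  # performance detection thread: P(reference) = sigmoid(w*x + c)
    for _ in range(steps):
        reference = rng.gauss(2.0, 0.5)        # toy reference sample
        cleaned = rng.gauss(0.0, 0.5) + shift  # toy output of the processing thread
        # Detection step: descend -[log D(reference) + log(1 - D(cleaned))].
        d_ref = sigmoid(w * reference + c)
        d_cln = sigmoid(w * cleaned + c)
        w -= lr * ((d_ref - 1.0) * reference + d_cln * cleaned)
        c -= lr * ((d_ref - 1.0) + d_cln)
        # Processing step: descend -log D(cleaned), i.e. try to pass as reference data.
        d_cln = sigmoid(w * cleaned + c)
        shift -= lr * (d_cln - 1.0) * w
    return shift, w, c


shift, w, c = train()
```

The sketch only shows the alternating update pattern; such dynamics can oscillate, so no convergence claim is made for this toy.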
In some possible embodiments, the data classification policy, the information identification policy, and the data tagging policy may be regarded as respective functional layers of an artificial intelligence neural network, and different functions are correspondingly implemented, for example, the data classification policy may be understood as a data classification layer, the information identification policy may be understood as an information identification layer, and the data tagging policy may be understood as a data tagging layer.
With continued reference to fig. 2, the business environment detection module may be configured to perform the following steps:
S23, obtaining the item content production data of each content block category of the target service item content through the data classification strategy of the service environment detection module, wherein the item content production data is used as the input of the information identification strategy in the service environment detection module.
In some possible implementation manners, the data classification policy of the service environment detection module is used for performing block identification processing on the input target service item content to obtain item content production data of each content block type of the target service item content, and the item content production data is used as the input of the information identification policy in the service environment detection module.
It can be understood that, by performing the block identification processing on the target service item content, the overall and complex target service item content can be scattered according to different content block types, and meanwhile, the relevance between the item content production data of each content block type can be ensured not to be damaged, so that when the item content production data is used as the input of the information identification strategy in the business environment detection module, the processing pressure of the information identification strategy on the target service item content can be effectively reduced, and the identification efficiency and the reliability on the different item content production data can be improved.
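One simple way to picture block identification is to group records by content block category while recording each record's original position, so that the link between blocks is not lost. This is a minimal sketch; the `(category, payload)` record shape and the function name `split_by_category` are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def split_by_category(
    records: List[Tuple[str, str]],
) -> Dict[str, List[Tuple[int, str]]]:
    """Group (category, payload) records by category, keeping original positions."""
    grouped = defaultdict(list)
    for position, (category, payload) in enumerate(records):
        grouped[category].append((position, payload))
    return dict(grouped)


records = [("age", "34"), ("comment", "great service"), ("age", "-5")]
blocks = split_by_category(records)
```

Keeping the position index is one possible way to preserve the relevance between blocks that the paragraph above mentions; the application does not say how this is done.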
S24, obtaining the document digital information of each content block category through the information identification strategy, and identifying the document digital information to obtain the service item indication information of each content block category.
The document digital information is a numerical distribution representation of the item content production data of each content block category. Before the cloud server processes the item content production data of each content block category, the data may come in diverse formats, such as text, voice, image, or video. Processing such data directly would inevitably cause confusion and would not match the default data processing mode of the cloud server, so document digitization needs to be performed on the item content production data of each content block category. Through document digitization, data in different forms can be converted into a uniform form, which facilitates subsequent global and local analysis.
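As an illustration of converting differently formatted data into one numerical form, the sketch below maps either text or raw bytes to a fixed-length normalized vector using a hashing trick. The technique, the vector length `DIM`, and the function name `digitize` are assumptions; the application does not specify how document digitization is performed.

```python
import zlib
from typing import List, Union

DIM = 8  # hypothetical vector length


def digitize(item: Union[str, bytes], dim: int = DIM) -> List[float]:
    """Map text or raw bytes to a normalized bucket-count vector of length dim."""
    vector = [0.0] * dim
    if isinstance(item, str):
        # Text: one unit per whitespace-separated token.
        units = [token.encode("utf-8") for token in item.lower().split()]
    else:
        # Binary data (e.g. audio or image bytes): one unit per 4-byte chunk.
        units = [item[i:i + 4] for i in range(0, len(item), 4)]
    for unit in units:
        vector[zlib.crc32(unit) % dim] += 1.0
    total = sum(vector)
    return [v / total for v in vector] if total else vector


text_vector = digitize("hello world hello")
binary_vector = digitize(b"\x00\x01\x02\x03\x04")
```

`zlib.crc32` is used instead of the built-in `hash` so that the result is the same across interpreter runs.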
Further, the service item indication information may be guiding information for different content block categories, for example indicating the business scenarios or service environments in which the corresponding content block category should be applied. For example, if the content block category is "user group age", the service item indication information may be "numerical calculation result is non-negative", which characterizes that this content block category should be applied in data service scenarios without negative values. Other examples of the service item indication information are not described further here.
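The "user group age" example can be expressed as a rule attached to a content block category. The rule table `INDICATION_RULES` and the function `violations` below are hypothetical names for a minimal sketch; only the non-negativity rule itself comes from the text.

```python
from typing import Callable, Dict, List

# Hypothetical rule table: content block category -> validity check.
INDICATION_RULES: Dict[str, Callable[[float], bool]] = {
    "user group age": lambda value: value >= 0,  # "result is non-negative"
}


def violations(category: str, values: List[float]) -> List[float]:
    """Return the values that break the indication rule of a category."""
    rule = INDICATION_RULES.get(category)
    if rule is None:
        return []  # no indication information known for this category
    return [value for value in values if not rule(value)]


bad_ages = violations("user group age", [34, -5, 0, 61])
```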
S25, marking the item content production data of each content block category according to the service item indication information through the data marking strategy of the service environment detection module, to obtain marked item content production data.
In some possible implementation manners, through a data tagging policy of the service environment detection module, the item content production data of each content block category is tagged with the corresponding indication tag of the service item indication information, so that the tagged item content production data can be obtained.
Based on the data marking operation, the marked item content production data can undergo subsequent processing. Because the marked item content production data is obtained according to the service item indication information of each content block category, cold or erroneous content blocks can be weakened while content blocks with higher heat are preserved. Content blocks with higher heat may be understood as content blocks that are used more frequently, and cold or erroneous content blocks as content blocks that are used less frequently and are more likely to be noise.
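Since heat is described in terms of usage frequency, a simple frequency count can illustrate the hot versus cold distinction. In this sketch the usage log, the threshold `cold_threshold`, and the function name `mark_by_heat` are assumptions; the application does not state how heat is measured or thresholded.

```python
from collections import Counter
from typing import Dict, List


def mark_by_heat(usage_log: List[str], cold_threshold: int = 2) -> Dict[str, str]:
    """Mark each content block 'hot' or 'cold' by how often it appears in the log."""
    counts = Counter(usage_log)
    return {
        block: ("hot" if count >= cold_threshold else "cold")
        for block, count in counts.items()
    }


marks = mark_by_heat(["a", "a", "b", "a", "c", "c"])
```

Blocks marked cold would be candidates for weakening, while hot blocks are kept.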
S26, the noise-removed service item content corresponding to the target service item content is obtained through the noise filtering thread and the marked item content production data.
The noise filtering thread is a configured thread, has a noise filtering effect, and obtains the de-noising service item content corresponding to the target service item content after being processed by a plurality of service environment detection modules and other functional modules in the noise filtering thread. Therefore, the noise-removing service item content can be ensured to exert the maximum efficacy in the later data mining process as far as possible.
Further, in an actual implementation, the denoising process should affect the original content information as little as possible, that is, useful content information in the target service item content should not be lost. To this end, in one possible implementation, obtaining the denoised service item content corresponding to the target service item content through the noise filtering thread and the marked item content production data, as described in step S26, may be implemented as follows: acquiring the production environment tags corresponding to the marked item content production data; analyzing the production environment tags to obtain the business service requirement information corresponding to each production environment tag; performing feature extraction on each group of business service requirement information through the associated running thread corresponding to the noise filtering thread to obtain the service requirement features corresponding to each group; identifying the target service item content according to the service requirement features to obtain a content identification result corresponding to the target service item content; and screening the target service item content according to the content distribution difference information between the content identification result and the target service item content to obtain the denoised service item content.
For example, production environment tags are used to characterize the production conditions of different item content production data. A production environment tag may be an online office tag, a remote education tag, a cross-border payment tag, a smart medical tag, a smart factory tag, a government and enterprise service tag, and the like, which is not limited herein. Further, different production environment tags represent different business service requirements, so analyzing the production environment tags yields the corresponding business service requirement information, that is, the aspects of information on which subsequent big data mining focuses. Taking the online office tag as an example, the corresponding business service requirement information may be "wish to perform segmented character replacement", "wish to perform automatic tagging of spelling errors", or "wish to perform quick meeting start". On this basis, the associated running thread may be a feature extraction thread corresponding to the noise filtering thread, such as a Convolutional Neural Network (CNN). Obtaining the service requirement features allows fine-grained identification of the target service item content, so that procedural redundant data is left out of consideration. Then, by analyzing the content distribution difference information between the content identification result and the target service item content, the target service item content can be screened accurately, so that the denoising process affects the original content information as little as possible and the loss of useful content information is avoided as far as possible.
In some embodiments, the content distribution difference information may be expressed in the form of a graph, which is not limited herein. The content distribution difference information may include a summary of the differences between the content identification result and the target service item content over different time periods and/or different data categories.
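As an illustration only, the per-category part of such a summary can be sketched as a count difference. The representation of content items as `(category, payload)` pairs and the function name `distribution_difference` are assumptions made for this sketch and do not come from the method itself.

```python
from collections import Counter
from typing import Dict, List, Tuple

Item = Tuple[str, str]  # hypothetical (data category, payload) pair


def distribution_difference(target: List[Item], identified: List[Item]) -> Dict[str, int]:
    """Per category, count how many target items are absent from the identification result."""
    target_counts = Counter(category for category, _ in target)
    identified_counts = Counter(category for category, _ in identified)
    categories = sorted(set(target_counts) | set(identified_counts))
    return {c: target_counts[c] - identified_counts[c] for c in categories}


target_content = [("chat", "a"), ("chat", "b"), ("log", "c")]
identification_result = [("chat", "a")]
difference = distribution_difference(target_content, identification_result)
```

A large positive difference for a category marks content that the identification step did not cover and that the screening step may treat as a candidate for removal.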
On the basis of the above content, in order to accurately obtain a content identification result, the step "identify the content of the target service item according to the service requirement characteristics to obtain a content identification result corresponding to the content of the target service item" may also be implemented in the following manner.
Firstly, target service interaction data to be identified is obtained from the target service item content according to the characteristic category information corresponding to the service requirement characteristics. For example, the target service interaction data is the interaction data in the target service item content other than procedural redundant data, and it carries valuable related information.
Secondly, the target service state switching frequency corresponding to the interactive data track of the target service interaction data is determined. For example, the interactive data track is a graph data track in which the service state switching frequency of each interactive data segment is less than the first global service state switching frequency of the target service interaction data. The target service state switching frequency is the service state switching frequency corresponding to the target number of interactive data segments, and the target number of interactive data segments is the largest among the numbers of interactive data segments that correspond to the respective service state switching frequencies in the interactive data track. Graph data (graphical data) is a representation in the form of a graph object, which can visually reflect the change (time-series change or content change) between different interaction states. The interactive data segments may be obtained by splitting the target service interaction data according to its data identifiers. The service state switching frequency may be understood as the switching speed between different service states: a given piece of target service interaction data may correspond to multiple service states, and the speed of switching among them is expressed by the service state switching frequency. For example, the service state switching frequency may be expressed in times/s or times/min, that is, the number of service state switches per second or per minute.
For example, 3/min means that the service state is switched 3 times per minute. Within each minute, the service state corresponding to the target service interaction data may be switched from service state 1 to service state 2 (first switch), then from service state 2 to service state 3 (second switch), and then from service state 3 to service state 1 (third switch); alternatively, it may be switched from service state 1 to service state 2 (first switch), then from service state 2 to service state 1 (second switch), and then from service state 1 to service state 3 (third switch).
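The counting in this example can be sketched as follows. This is a minimal illustration that assumes the service states are available as an ordered sequence of observations; the function name `switches_per_minute` is hypothetical.

```python
from typing import Sequence


def switches_per_minute(states: Sequence[str], duration_minutes: float) -> float:
    """Count changes between consecutive observed states and normalise by the duration."""
    if duration_minutes <= 0:
        raise ValueError("duration_minutes must be positive")
    switches = sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)
    return switches / duration_minutes


# state 1 -> state 2 -> state 3 -> state 1 within one minute: three switches
freq_a = switches_per_minute(["state1", "state2", "state3", "state1"], 1.0)
# state 1 -> state 2 -> state 1 -> state 3 within one minute: also three switches
freq_b = switches_per_minute(["state1", "state2", "state1", "state3"], 1.0)
```

Both sequences give a frequency of 3/min, which matches the two alternatives described above.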
Then, the service state switching frequency range corresponding to the target service interaction data is determined based on a comparison result between the target service state switching frequency and the interval values of a set frequency range. For example, the interval values may be the endpoint values of the set frequency range: if the set frequency range is 2 to 15, the interval values may be 2/s and 15/s, or 2/min and 15/min. In some possible examples, before the step of determining the service state switching frequency range corresponding to the target service interaction data based on the comparison result between the target service state switching frequency and the interval values of the set frequency range, the method further includes: detecting whether the target service state switching frequency satisfies a set state switching condition. On this basis, the step of determining the service state switching frequency range may include: when the target service state switching frequency is detected to satisfy the set state switching condition, determining the service state switching frequency range corresponding to the target service interaction data based on the comparison result between the target service state switching frequency and the interval values of the set frequency range.
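A minimal sketch of the endpoint comparison, using the example range of 2 to 15. The three result labels and the function name `locate_in_set_range` are assumptions; the text does not specify how the resulting range is encoded.

```python
def locate_in_set_range(target_freq: float, low: float = 2.0, high: float = 15.0) -> str:
    """Compare a switching frequency with the endpoint (interval) values of a set range."""
    if low > high:
        raise ValueError("low must not exceed high")
    if target_freq < low:
        return "below"   # hypothetical label: slower than the set range
    if target_freq > high:
        return "above"   # hypothetical label: faster than the set range
    return "within"      # endpoints are treated as inside the range (an assumption)


range_label = locate_in_set_range(3.0)
```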
Further, the step of detecting whether the target service state switching frequency satisfies the set state switching condition includes: determining a first service state switching frequency corresponding to the interactive data track of each of at least one group of service interaction data, where the at least one group of service interaction data is the group immediately preceding the target service interaction data, or the N consecutive groups preceding the target service interaction data; for any such group, the first service state switching frequency is the service state switching frequency corresponding to the first number of interactive data segments, and the first number of interactive data segments is the largest among the numbers of interactive data segments that correspond to the respective service state switching frequencies in the interactive data track of that group; comparing the target service state switching frequency with the interval values of the set frequency range to obtain a first detection result; comparing each first service state switching frequency with the interval values of the set frequency range to obtain second detection results; detecting whether the first detection result is consistent with every second detection result; and if so, determining that the target service state switching frequency satisfies the set state switching condition.
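The consistency check described here can be sketched as below. Encoding each detection result as -1, 0 or 1 (below, within or above the set range) is an assumption for illustration, as are the function names.

```python
from typing import Sequence


def compare_with_range(freq: float, low: float, high: float) -> int:
    """Return -1, 0 or 1 when freq is below, within or above the set frequency range."""
    if freq < low:
        return -1
    if freq > high:
        return 1
    return 0


def meets_state_switching_condition(
    target_freq: float, previous_freqs: Sequence[float], low: float, high: float
) -> bool:
    """True when the target's detection result matches that of every preceding group."""
    if not previous_freqs:
        raise ValueError("at least one preceding group is required")
    first_result = compare_with_range(target_freq, low, high)
    second_results = [compare_with_range(f, low, high) for f in previous_freqs]
    return all(result == first_result for result in second_results)


condition_met = meets_state_switching_condition(3.0, [4.0, 10.0], 2.0, 15.0)
```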
In addition, determining the first service state switching frequency corresponding to the interactive data track of any group of service interaction data includes: for each of the service state switching frequencies present in the interactive data track of the group, counting a first number of interactive data segments having that service state switching frequency; sorting the counted first numbers by their service state switching frequencies; after sorting, for each first number, weighting the first numbers of a run of consecutive service state switching frequencies that includes this first number, and taking the weighted result as the number of interactive data segments for the service state switching frequency corresponding to this first number; and determining the service state switching frequency with the largest resulting number of interactive data segments as the first service state switching frequency corresponding to the interactive data track of the group.
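One way to read this procedure is as a smoothed histogram mode. The sketch below assumes integer-valued switching frequencies and a three-point weighting window of (0.25, 0.5, 0.25); neither the window width nor the weights are given in the text, and the function name is hypothetical.

```python
from collections import Counter
from typing import Sequence


def dominant_switching_frequency(
    segment_freqs: Sequence[int], weights: Sequence[float] = (0.25, 0.5, 0.25)
) -> int:
    """Pick the frequency whose neighbour-weighted segment count is largest."""
    if not segment_freqs:
        raise ValueError("segment_freqs must not be empty")
    if len(weights) % 2 == 0:
        raise ValueError("weights must have odd length so the window is centred")
    counts = Counter(segment_freqs)      # first numbers: segments per frequency
    radius = len(weights) // 2
    best_freq, best_score = None, float("-inf")
    for freq in sorted(counts):          # sorted by switching frequency
        # Weight the counts of the consecutive frequencies around freq.
        score = sum(
            w * counts.get(freq + offset, 0)
            for offset, w in zip(range(-radius, radius + 1), weights)
        )
        if score > best_score:           # ties keep the lower frequency
            best_freq, best_score = freq, score
    return best_freq


smoothed_mode = dominant_switching_frequency([2, 2, 3, 3, 3, 4, 9, 9, 9, 9])
raw_mode = dominant_switching_frequency([2, 2, 3, 3, 3, 4, 9, 9, 9, 9], weights=(1.0,))
```

With weighting, frequency 3 is selected because its neighbours 2 and 4 also contain segments; without weighting, the isolated frequency 9 has the largest raw count.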
Finally, a second global service state switching frequency is determined based on the determined service state switching frequency range; the service state switching frequency of the interactive data track is adjusted until the global service state switching frequency of the target service interaction data equals the second global service state switching frequency; and the target service interaction data is identified according to the second global service state switching frequency to obtain the content identification result corresponding to the target service item content. For example, the second global service state switching frequency differs from the first global service state switching frequency in that it takes into account the loss of service data during state switching. Identifying the target service interaction data based on the second global service state switching frequency therefore helps the content identification result carry as much useful service data as possible and reduces the loss of service data when switching between different states.
Based on the above description of the step "identifying the content of the target service item according to the service requirement characteristics to obtain the content identification result corresponding to the content of the target service item", the identification of the target service interaction data can be realized from the corresponding relationship between the service data and the service state, so that the loss condition of the service data in the state switching process can be considered, the content identification result is ensured to carry useful service data as much as possible, and the loss of the service data in the switching between different states is reduced.
According to the data denoising method based on big data and cloud computing, target service item content carrying noise is obtained and input to a preconfigured noise filtering thread, and the corresponding denoising service item content is obtained from the output of the noise filtering thread. The noise filtering thread includes a plurality of mutually associated service environment detection modules. Through the data classification strategy of the service environment detection modules, item content production data of each content block category of the target service item content is obtained and used as the input of the information identification strategy in the service environment detection modules. Through the information identification strategy, document digital information of each content block category is obtained and identified to yield service item indication information of each content block category, and the item content production data of each content block category is marked according to the service item indication information to obtain marked item content production data. The noise filtering thread thus maintains good noise identification and filtering performance: it identifies the service requirements of the different content block categories to obtain the corresponding service item indication information, uses marking to emphasize popular content blocks while weakening unpopular or erroneous content blocks, and thereby effectively filters the noise content in the target service item content. This ensures that the important content blocks of the target service item content can be recovered from the denoising service item content, and improves the content information restoration degree and the service environment matching degree of the denoising service item content.
In a related embodiment, a data denoising method based on big data and cloud computing is further provided, in which the noise filtering thread is a thread that filters noise from local portrait label service item content. The method of this embodiment includes the following eight steps.
In the first step, the content of the interactive service item carrying noise is obtained.
In this embodiment, the content of the interactive service item carrying noise refers to data including time-series continuous interactive status information. The interactive service item content may be content information for different online business services, such as online payment, online office, online education, and the like, and is not limited herein.
In the second step, the portrait label service item content is determined according to the content information corresponding to the portrait label in the interactive service item content, and is taken as the target service item content carrying noise.
In some possible implementation manners, content positioning based on the portrait label is carried out on the content of the interactive service item, corresponding content information in the content of the interactive service item where the portrait label is located is determined, the content of the portrait label service item is determined based on the determined corresponding content information, and the determined content of the portrait label service item is used as the target service item content with noise.
In the third step, the target service item content is input to a noise filtering thread configured based on the balanced sample configuration file; the noise filtering thread includes a plurality of mutually associated service environment detection modules.
In the fourth step, item content production data of each content block category of the target service item content is obtained through the data classification strategy of the service environment detection modules, and the item content production data is used as the input of the information identification strategy in the service environment detection modules.
In the fifth step, document digital information of each content block category is obtained through the information identification strategy, and the document digital information is identified to obtain service item indication information of each content block category.
In the sixth step, through the data marking strategy of the service environment detection modules, the item content production data of each content block category is marked according to the service item indication information to obtain marked item content production data.
In the seventh step, the denoising service item content corresponding to the target service item content is obtained through the noise filtering thread and the marked item content production data.
In the eighth step, the interactive service item content and the denoising service item content are integrated to obtain the noise-filtered interactive service item content.
In this embodiment, the denoising service item content is the portrait label service item content after noise filtering. In some possible implementations, content positioning based on the portrait label is performed on the interactive service item content to determine the content information in which the portrait label is located, and the portrait label service item content at that location is replaced with the denoising service item content, yielding the noise-filtered interactive service item content. A noise filtering thread based on portrait label service item content strengthens the thread's processing of portrait-label-related content and thereby further improves its noise filtering performance.
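The locate, denoise and replace flow can be sketched as follows. The record layout (dictionaries with `label` and `text` keys), the per-record `denoise` callable that returns `None` for dropped content, and the function name are all assumptions made for illustration; the `denoise` callable is only a stand-in for the noise filtering thread.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, str]  # hypothetical record with "label" and "text" keys


def filter_labelled_content(
    records: List[Record],
    label: str,
    denoise: Callable[[Record], Optional[Record]],
) -> List[Record]:
    """Replace the records carrying `label` with their denoised versions, in place order."""
    result: List[Record] = []
    for record in records:
        if record.get("label") != label:
            result.append(record)       # outside the located region: kept unchanged
            continue
        cleaned = denoise(record)       # stand-in for the noise filtering thread
        if cleaned is not None:
            result.append(cleaned)      # located content replaced by denoised content
    return result


def toy_denoise(record: Record) -> Optional[Record]:
    """Toy rule: drop records whose text is empty after stripping whitespace."""
    text = record["text"].strip()
    return {"label": record["label"], "text": text} if text else None


interactive_content = [
    {"label": "payment", "text": "order 17"},
    {"label": "portrait", "text": "  prefers mobile  "},
    {"label": "portrait", "text": "   "},
]
filtered_content = filter_labelled_content(interactive_content, "portrait", toy_denoise)
```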
It should be understood that, in the above eight steps, the related technical features can be referred to the description of fig. 2, and are not described in detail here.
In a related embodiment, before the step of inputting the target service item content to the noise filtering thread configured based on the balanced sample configuration file, the method further includes: performing global adjustment processing on the target service item content. After the step of obtaining the denoising service item content corresponding to the target service item content through the noise filtering thread and the marked item content production data, the method further includes: performing item content expansion processing on the denoising service item content so as to expand it to the state of the target service item content. In this implementation, the step of inputting the target service item content to the noise filtering thread configured based on the balanced sample configuration file means inputting the target service item content after the global adjustment processing. It can be understood that most of the noise content has already been cleaned when the denoising service item content is expanded. Therefore, when the state (content scale) of the denoising service item content is equivalent to that of the target service item content, its noise rate is much smaller than that of the target service item content. The denoising service item content can thus be expanded to serve larger-scale data traffic mining scenarios while maintaining a certain signal-to-noise ratio, keeping the data content as clean as possible.
Global adjustment processing means processing the original service item content in batches based on the same mapping format and the same content element description information. Global adjustment avoids compatibility problems when service item contents are used jointly or in association, and thereby preserves the fidelity of the item content during item content expansion. The mapping format and the content element description information may each be adjusted according to actual requirements, which is not limited herein.
In a related embodiment, referring to fig. 3, a manner of configuring a noise filtering thread in a data denoising method based on big data and cloud computing is further provided, and specifically may include the following steps S31-S34.
S31, a first configuration sample file set formed by the service item content to be configured carrying noise and a second configuration sample file set formed by the service item content to be configured not carrying noise are obtained.
The first configuration sample file set is formed by a plurality of globally adjusted and noisy contents of service items to be configured (first configuration samples), correspondingly, the second configuration sample file set is formed by a plurality of globally adjusted and non-noisy contents of service items to be configured (second configuration samples), and the configuration samples in the first configuration sample file set and the configuration samples in the second configuration sample file set are in one-to-one correspondence, which only differs in whether noise is carried. In some possible implementation manners, further, the second configuration sample may be service item content to be configured that is obtained through each service item content obtaining approach and does not carry noise, or obtained by converting the obtained service item content to be configured that does not carry noise, and the first configuration sample may be obtained by performing noise processing on the second configuration sample; the first configuration sample and the second configuration sample may also be a large number of service item content samples collected by service interaction terminals corresponding to the interactive service item content, for example, the corresponding service item content samples are collected by service interaction terminals such as a mainframe computer and a PC. It can be understood that when the configured noise filtering thread is a thread for filtering noise for the content of the digitized interactive service item of the document, the sample is configured as the content of the digitized interactive service item of the document; and when the configured noise filtering thread is a thread for filtering noise aiming at the local portrait label service item content, configuring the sample as the local portrait label service item content. 
Thread configuration based on the portrait label service item content can enhance the processing of portrait label related content by threads, and further improve the noise filtering performance of the noise filtering thread.
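For the case where the first configuration samples are derived from the second configuration samples by noise processing, a paired sample set can be built as sketched below. Modelling a sample as a list of tokens and injecting noise by inserting tokens at random are assumptions for illustration; the text does not specify the noise processing.

```python
import random
from typing import List, Sequence, Tuple

Sample = List[str]  # hypothetical: a sample modelled as a list of content tokens


def add_noise(sample: Sample, noise_tokens: Sequence[str], rate: float, rng: random.Random) -> Sample:
    """Insert a random noise token after each clean token with probability `rate`."""
    noisy: Sample = []
    for token in sample:
        noisy.append(token)
        if rng.random() < rate:
            noisy.append(rng.choice(noise_tokens))
    return noisy


def build_paired_sets(
    clean_samples: Sequence[Sample], noise_tokens: Sequence[str], rate: float, seed: int = 0
) -> Tuple[List[Sample], List[Sample]]:
    """Return (first set with noise, second set without noise) in one-to-one correspondence."""
    rng = random.Random(seed)
    second_set = [list(sample) for sample in clean_samples]
    first_set = [add_noise(sample, noise_tokens, rate, rng) for sample in second_set]
    return first_set, second_set


first_set, second_set = build_paired_sets([["pay", "ok"], ["login"]], ["<junk>"], rate=1.0)
```

Because each noisy sample is generated from its clean counterpart, the two sets differ only in whether noise is carried.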
And S32, inputting the first configuration sample file set into a data processing thread which is corresponding to the balanced sample configuration file and comprises a plurality of mutually associated service environment detection modules, and obtaining a transition sample file set after noise filtering.
The transition sample file set refers to a set formed by intermediate samples corresponding to the first configuration samples. Further, the intermediate sample refers to the content of the interactive service item obtained after the data processing thread performs denoising processing on the first configuration sample.
In some possible implementation manners, the first configuration sample in the first configuration sample file set is sequentially input to a data processing thread corresponding to the balanced sample configuration file and including a plurality of mutually associated service environment detection modules, and through a data classification policy of the service environment detection modules, item content production data of each content block category of the first configuration sample is sequentially obtained and used as input of an information identification policy in the service environment detection modules. And further, according to a data marking strategy of the service environment detection module, marking the item content production data of each content block type respectively according to the service item indication information to obtain marked item content production data corresponding to the first configuration sample. And further processing the marked project content production data corresponding to the first configuration sample based on the data processing thread to finally obtain an intermediate sample corresponding to the first configuration sample, wherein all the intermediate samples form a transition sample file set.
And S33, inputting the transition sample file set and the second configuration sample file set to a performance detection thread corresponding to the balance sample configuration file, and obtaining current performance state information according to the output of the performance detection thread.
The performance state information is a local index for evaluating the denoising effect of the noise filtering thread; generally, the smaller the loss value corresponding to the performance state information, the better the denoising effect. Correspondingly, the current performance state information is a global index for evaluating the noise filtering effect of the data processing thread, and each item of thread configuration data in the data processing thread is adjusted based on the current performance state information to achieve a better noise filtering effect. In this embodiment, one item of current performance state information is generated for each intermediate sample.
As mentioned above, the balanced sample configuration file works as follows: the data processing thread produces a set of data for the performance detection thread to examine, and the performance detection thread then examines this set of data together with the corresponding complete data (i.e., service item content on which no noise removal has been performed). It can be understood that, in this embodiment, the purpose of the balanced sample configuration file is to update and optimize the threads adversarially, so that it becomes difficult for the performance detection thread to distinguish whether a sample is an intermediate sample produced by denoising or service item content that has not undergone denoising.
In some possible implementations, the transition sample file set and the second configuration sample file set are each input to the performance detection thread corresponding to the balanced sample configuration file, and the thread configuration data of the performance detection thread is adjusted according to its output to obtain an updated performance detection thread; the transition sample file set is then input to the updated performance detection thread, the current performance state information is obtained from the output of the updated performance detection thread, and the thread configuration data of the data processing thread is adjusted according to the current performance state information. The thread configuration data of the data processing thread refers to input/output transfer information or output/input association information between the service function modules in the data processing thread, for example, the input/output transfer information between functional modules 1 and 2, or the order in which certain information is used between functional modules 3 and 5, which is not limited herein.
And S34, updating the thread configuration data of the data processing thread according to the current performance state information to obtain an updated data processing thread, returning to the step S32 until an iteration termination condition is met, and taking the updated data processing thread as a noise filtering thread.
In this embodiment, the thread configuration data of the data processing thread is adjusted according to the current performance state information and a set adjustment method for the thread configuration data of the data processing thread, yielding an updated data processing thread. Whether the set iteration termination condition is met is then judged: if so, the iterative configuration ends and the updated data processing thread is taken as the noise filtering thread; otherwise, the process returns to step S32 and repeats until the set iteration termination condition is met.
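The control flow of this iterative configuration can be sketched generically. Here `run_round` stands in for one pass of steps S32 to S34 and `effect_reached` for the check of the set noise filtering effect; both callables, the iteration cap, and the toy one-parameter "thread" used in the usage lines are assumptions for illustration.

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def iterate_configuration(
    thread: T,
    run_round: Callable[[T], Tuple[T, float]],
    effect_reached: Callable[[T, float], bool],
    max_iterations: int,
) -> Tuple[T, int]:
    """Repeat update rounds until the effect check passes or the iteration cap is hit."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    iteration = 0
    for iteration in range(1, max_iterations + 1):
        thread, loss = run_round(thread)      # one pass: filter, evaluate, update
        if effect_reached(thread, loss):      # set noise filtering effect reached
            break
    return thread, iteration


def toy_round(x: float) -> Tuple[float, float]:
    """Toy gradient correction on the loss (x - 3)^2 with step size 0.25."""
    x = x - 0.25 * 2.0 * (x - 3.0)
    return x, (x - 3.0) ** 2


final_thread, rounds_used = iterate_configuration(0.0, toy_round, lambda _t, loss: loss < 0.01, 100)
capped_thread, capped_rounds = iterate_configuration(0.0, toy_round, lambda _t, loss: loss < 0.01, 3)
```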
The thread configuration data adjusting method of the data processing thread includes, but is not limited to, a gradient correction method, a feedback correction method and other loss function correction algorithms. The iteration termination condition may be that the updated iteration accumulated value reaches the iteration accumulated threshold, or that the data processing thread reaches the set noise filtering effect, which is not limited herein. And the set noise filtering effect can be designed based on two aspects. The first aspect is the ratio of the content size before and after the noise removal, and the second aspect is the service adaptation after the noise removal.
For example, if the content size of service item content R1 before denoising is 1000 MB and the content size of the corresponding service item content R2 after denoising is 400 MB, the denoising ratio is 400/1000 = 0.4; if the set denoising detection ratio is 0.6, the denoising of service item content R1 does not meet the requirement. As another example, the global content features of the denoised service item content R2 may be extracted to form a feature matrix, and the similarity between this feature matrix and the feature matrix of the current service requirement content is then calculated to determine whether the service adaptability after denoising meets the standard. If the similarity between the two feature matrices is n%, the set similarity is m%, and n% < m%, the denoising of service item content R1 is determined not to meet the requirement. It can be understood that the noise filtering effect should take both aspects into account: the data processing thread is considered to achieve the set noise filtering effect only when the condition on the set denoising detection ratio and the condition on the set similarity are both satisfied.
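The two conditions can be combined as in the sketch below. Two points are assumptions: the ratio condition is taken to pass when the denoising ratio is at least the set detection ratio (consistent with 0.4 failing against 0.6), and the similarity is computed as cosine similarity over flattened feature vectors, since the text does not name a similarity measure.

```python
import math
from typing import Sequence


def denoise_ratio(size_before: float, size_after: float) -> float:
    """Content size after denoising divided by content size before denoising."""
    if size_before <= 0:
        raise ValueError("size_before must be positive")
    return size_after / size_before


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equally long, non-zero feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("feature vectors must be non-zero")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def reaches_set_effect(
    size_before: float,
    size_after: float,
    content_features: Sequence[float],
    demand_features: Sequence[float],
    set_ratio: float,
    set_similarity: float,
) -> bool:
    """Both the ratio condition and the similarity condition must hold."""
    ratio_ok = denoise_ratio(size_before, size_after) >= set_ratio
    similarity_ok = cosine_similarity(content_features, demand_features) >= set_similarity
    return ratio_ok and similarity_ok


# R1 is 1000 MB before denoising and R2 is 400 MB after: ratio 0.4 against a set ratio of 0.6
r1_accepted = reaches_set_effect(1000, 400, [1.0, 0.0], [1.0, 0.0], 0.6, 0.8)
```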
With this way of configuring the noise filtering thread, a data processing thread that includes a plurality of mutually associated service environment detection modules is configured and checked together with a performance detection thread, so that a data processing thread capable of effectively filtering noise is obtained and used as the noise filtering thread. Meanwhile, based on the service environment detection modules, the document digital information of each content block category corresponding to the input configuration sample is identified to obtain service item indication information of each content block category, and the item content production data of each content block category is marked according to the service item indication information to obtain the corresponding marked item content production data. Marking emphasizes popular content blocks while weakening unpopular or erroneous content blocks, so the noise in each first configuration sample of the first configuration sample file set can be effectively filtered, the key content blocks of each first configuration sample can be restored from the corresponding intermediate sample, and the content information restoration degree and the service environment matching degree of the intermediate samples are improved.
In a related embodiment, the step of inputting the transition sample file set and the second configuration sample file set to the performance detection threads corresponding to the balance sample configuration files, and obtaining the current performance state information according to the output of the performance detection threads specifically includes the following steps S41-S43:
and S41, inputting the transition sample file set and the second configuration sample file set to a performance detection thread corresponding to the balance sample configuration file, and obtaining denoising performance state information according to the output of the performance detection thread.
The denoising performance state information is index evaluation information used for evaluating the classification processing performance of the performance detection threads, and various thread configuration data in the performance detection threads are adjusted based on the denoising performance state information so as to achieve more accurate classification processing performance. In this embodiment, a corresponding denoising performance state information is generated based on different intermediate samples.
In some possible implementation manners, sequentially inputting each intermediate sample in the transition sample file set and each second configuration sample in the second configuration sample file set to a performance detection thread corresponding to the balance sample configuration file, respectively obtaining outputs corresponding to each intermediate sample and each second configuration sample, and obtaining denoising performance state information according to the intermediate samples and the outputs of the corresponding second configuration samples, wherein the number of the denoising performance state information is the same as the number of the intermediate samples.
And S42, updating the thread configuration data of the performance detection thread according to the denoising performance state information to obtain an updated performance detection thread.
The thread configuration data of the performance detection thread refers to input/output transfer information or output/input association information between the service function modules in the performance detection thread. In this embodiment, the thread configuration data of the performance detection thread is adjusted according to the denoising performance state information and the set method for adjusting the configuration data of the performance detection thread, so as to obtain an updated performance detection thread. The method for adjusting the configuration data of the performance detection thread includes, but is not limited to, a gradient correction method, a feedback correction method and other correction algorithms.
S43, inputting the transition sample file set into the updated performance detection thread, and obtaining the current performance state information according to the output of the updated performance detection thread.
The updated performance detection thread has better classification processing performance than the performance detection thread before the update. Therefore, once the performance detection thread has reached this better classification processing performance, its thread configuration data is locked, and the data processing thread is then configured.
In some possible implementations, all the intermediate samples in the transition sample file set are sequentially input into the updated performance detection thread, each intermediate sample yields one output of the updated performance detection thread, and the current performance state information is obtained according to these outputs.
In this embodiment, the thread configuration data of the data processing thread is first locked while the performance detection thread is configured and updated, so that the configured performance detection thread maintains its classification processing performance. After the performance detection thread is configured, the data processing thread is configured and updated. At this point the thread configuration data of the performance detection thread is locked and does not change; only the deviation result or comparison result produced by the performance detection thread is passed to the data processing thread. That is, the current performance state information is obtained according to the output of the updated performance detection thread, and the thread configuration data of the data processing thread is updated based on the current performance state information. Through this mutual feedback training (adversarial training) between the performance detection thread and the data processing thread, the two threads finally reach a stable or converged state.
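The alternating lock-and-update scheme described above can be sketched as follows. This is a minimal illustration, not the method of the embodiment: each thread is reduced to a list of numeric configuration values, the update rule is assumed to be a plain gradient step, and the names `ConfigurableThread`, `alternate_once`, `detector_grads` and `processor_grads` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConfigurableThread:
    """Hypothetical stand-in for a thread whose configuration data can be locked."""
    config: list
    locked: bool = False

    def update(self, gradients, lr=0.1):
        # A locked thread ignores updates, so its configuration data stays fixed.
        if self.locked:
            return
        self.config = [c - lr * g for c, g in zip(self.config, gradients)]


def alternate_once(processor, detector, detector_grads, processor_grads):
    """Run one round: update the detector first, then the processor."""
    # Phase 1: lock the data processing thread, update the performance detection thread.
    processor.locked, detector.locked = True, False
    detector.update(detector_grads)

    # Phase 2: lock the performance detection thread, update the data processing thread.
    processor.locked, detector.locked = False, True
    processor.update(processor_grads)
```

In this sketch, calling `update` on a locked thread is a no-op, which mirrors the statement that the locked thread's configuration data "is not changed" while the other thread is being configured.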
In a related embodiment, the step of inputting the transition sample file set and the second configuration sample file set to the performance detection thread, and obtaining denoising performance state information according to the output of the performance detection thread includes: respectively inputting the transition sample file set and the second configuration sample file set into a performance detection thread to obtain a first performance detection result corresponding to the transition sample file set and a second performance detection result corresponding to the second configuration sample file set; and obtaining denoising performance state information according to the first performance detection result and the second performance detection result and by combining the denoising thread mapping relation.
The first performance detection result and the second performance detection result refer, respectively, to the detection results indicating whether the intermediate sample and the second configuration sample are judged to be a configuration sample rather than an intermediate sample. Assuming that the sample distinguishing label of the intermediate sample is set to x1 and the sample distinguishing label of the second configuration sample is set to x2, the output of the performance detection thread is a value between x1 and x2; that is, the evaluation index range of the first performance detection result and the second performance detection result is x1 to x2. The purpose of configuring the performance detection thread is to make the first performance detection result for each intermediate sample as close to x1 as possible, and the second performance detection result for each second configuration sample as close to x2 as possible, so as to obtain accurate classification processing performance. The performance detection result may be expressed as a probability.
The denoising thread mapping relationship is a function that computes the performance state information of the performance detection thread according to the output of the performance detection thread, for example a mapping function based on cross entropy. The related formulas are not listed here.
In some possible implementations, each intermediate sample in the transition sample file set together with its sample distinguishing label, and each second configuration sample in the second configuration sample file set together with its sample distinguishing label, are sequentially input to the performance detection thread to obtain a first performance detection result corresponding to the transition sample file set and a second performance detection result corresponding to the second configuration sample file set. Denoising performance state information is then obtained according to the first performance detection result and the second performance detection result in combination with the denoising thread mapping relationship.
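As an illustration of a cross-entropy-based denoising thread mapping relationship, the following sketch assumes x1 = 0 and x2 = 1 and assumes that each performance detection result is a probability in (0, 1). The source does not give the formula, so the binary cross entropy used here and the function names are assumptions made for illustration only.

```python
import math

X1, X2 = 0.0, 1.0  # assumed label values for intermediate / second configuration samples


def cross_entropy(p, label, eps=1e-12):
    """Binary cross entropy of one detection result p against one label."""
    p = min(max(p, eps), 1.0 - eps)  # keep log() finite
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


def denoising_performance_state(first_results, second_results):
    """One value per (intermediate sample, second configuration sample) pair."""
    return [
        cross_entropy(p1, X1) + cross_entropy(p2, X2)
        for p1, p2 in zip(first_results, second_results)
    ]
```

Under these assumptions the value is small when the first result is close to x1 and the second result is close to x2, and the returned list has one entry per intermediate sample.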
In a related embodiment, the step of inputting the transition sample file set to the updated performance detection thread and obtaining the current performance state information according to the output of the updated performance detection thread includes: inputting the transition sample file set into the updated performance detection thread to obtain a third performance detection result corresponding to the transition sample file set; and obtaining the current performance state information according to the third performance detection result and by combining the mapping relation of the balance samples.
The third performance detection result refers to the detection result indicating whether the intermediate sample is judged to be a configuration sample rather than an intermediate sample. The balanced sample mapping relationship refers to a function that computes performance state information for the data processing thread based on the output of the data processing thread, for example a mapping function based on cross entropy. It can be understood that the balanced sample mapping relationship and the denoising thread mapping relationship may be the same or different, which is not limited herein.
In some possible implementations, each intermediate sample in the transition sample file set together with its sample distinguishing label is sequentially input to the updated performance detection thread to obtain a third performance detection result corresponding to the transition sample file set, and the current performance state information is obtained according to the third performance detection result in combination with the balanced sample mapping relationship. In contrast to the configuration of the performance detection thread, in this embodiment the sample distinguishing label of the intermediate sample is set to x2 so as to act as deliberate interference, so that the intermediate samples can gradually be iterated toward complete second configuration samples.
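The effect of setting the intermediate sample's label to x2 can be shown with a small sketch. It assumes x2 = 1, a probability-valued third performance detection result, and a cross-entropy mapping, in which case the cross entropy reduces to the negative logarithm of the result. The function name is hypothetical and the formula is an assumption, since the source does not list one.

```python
import math


def current_performance_state(third_results, eps=1e-12):
    """Cross entropy of each third detection result against the label x2 = 1."""
    # With label 1, binary cross entropy reduces to -log(p).
    return [-math.log(min(max(p, eps), 1.0)) for p in third_results]
```

The value shrinks toward zero as the updated performance detection thread rates an intermediate sample closer to x2, so lowering it pushes the data processing thread toward outputs that resemble second configuration samples.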
In a related embodiment, the configuration of the noise filtering thread further involves a content block comparison thread. Specifically, before updating the thread configuration data of the data processing thread according to the current performance state information to obtain the updated data processing thread, the method further includes: respectively inputting the transition sample file set and the second configuration sample file set into the content block comparison thread to obtain a content block comparison result between the transition sample file set and the second configuration sample file set. In this case, updating the thread configuration data of the data processing thread according to the current performance state information to obtain the updated data processing thread includes: updating the thread configuration data of the data processing thread according to the current performance state information and the content block comparison result to obtain the updated data processing thread.
The content block comparison result refers to difference information of the intermediate sample and the corresponding second configuration sample in the content block semantic information level. It can be understood that the content block comparison result between the transition sample file set and the second configuration sample file set refers to difference information of each intermediate sample in the transition sample file set and the corresponding second configuration sample in the semantic information level of the content block.
In some possible implementations, each intermediate sample in the transition sample file set and its corresponding second configuration sample are sequentially input to the content block comparison thread. The content block comparison thread extracts the content blocks of the intermediate sample and of the corresponding second configuration sample, and compares and analyzes them to obtain a content block comparison result between each intermediate sample and the corresponding second configuration sample. The thread configuration data of the data processing thread is then adjusted according to the current performance state information, the content block comparison result, and the set method for adjusting the thread configuration data of the data processing thread, to obtain the updated data processing thread. For example, according to the current performance state information and the content block comparison result, the thread configuration data of the data processing thread is adjusted in stages using a gradient descent algorithm.
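A hedged sketch of how the two signals might be combined and used in a gradient descent step is given below. The source does not specify how the content block comparison result is computed or weighted; the mean squared difference, the `weight` hyperparameter, the finite-difference gradient, and the toy one-parameter data processing thread are all assumptions made for illustration.

```python
def content_block_difference(intermediate_blocks, reference_blocks):
    """Mean squared difference between two equal-length block feature vectors."""
    pairs = zip(intermediate_blocks, reference_blocks)
    return sum((a - b) ** 2 for a, b in pairs) / len(reference_blocks)


def combined_objective(performance_state, block_difference, weight=0.5):
    """Weighted sum of the two signals; the weight is an assumed hyperparameter."""
    return performance_state + weight * block_difference


def gradient_step(config, objective, lr=0.1, h=1e-6):
    """One gradient descent step, with gradients from central finite differences."""
    updated = []
    for i, value in enumerate(config):
        up = config[:i] + [value + h] + config[i + 1:]
        down = config[:i] + [value - h] + config[i + 1:]
        grad = (objective(up) - objective(down)) / (2 * h)
        updated.append(value - lr * grad)
    return updated


# Toy data processing thread: subtracts one configurable offset from each value.
noisy, clean = [3.0, 4.0], [1.0, 2.0]


def objective(config):
    output = [v - config[0] for v in noisy]
    # The performance state term is held at 0.0 here to keep the toy small.
    return combined_objective(0.0, content_block_difference(output, clean))


config = [0.0]
for _ in range(3):  # "in stages": a few successive steps
    config = gradient_step(config, objective)
```

In this toy setting each step moves the offset toward the value that makes the intermediate output match the noise-free reference, so the combined objective decreases from one stage to the next.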
Analyzing the content block comparison result between each intermediate sample and its corresponding second configuration sample encourages the denoising service item content finally output by the noise filtering thread to retain more highly discriminative information; that is, the relevant key content blocks of the target service item content are restored more accurately. This effectively improves the content information restoration degree of the denoising service item content and ensures data denoising accuracy at the interactive recognition level in interactive recognition application scenarios or service environments.
In another embodiment, before updating the thread configuration data of the data processing thread according to the current performance state information to obtain the updated data processing thread, the method further includes: analyzing the content elements of the transition sample file set and the second configuration sample file set to obtain a content element comparison result between the transition sample file set and the second configuration sample file set. In this case, updating the thread configuration data of the data processing thread according to the current performance state information to obtain the updated data processing thread includes: updating the thread configuration data of the data processing thread according to the current performance state information and the content element comparison result to obtain the updated data processing thread.
The content element comparison result refers to difference information existing in each content element set of the intermediate sample and the corresponding second configuration sample. It can be understood that the content element comparison result between the transition sample file set and the second configuration sample file set refers to difference information existing in the content element level between each intermediate sample in the transition sample file set and the corresponding second configuration sample. Further, content elements may refer to content information of different dimensions in the service content, which may be adjusted according to different differentiation criteria, such as content elements analyzed according to office dimensions, which may include content element 1 "office in place element" and content element 2 "remote office element", etc. For another example, the content elements are analyzed according to life dimensions, and may include a content element 3 "office element" and a content element 4 "amateur element". It is to be understood that examples regarding content elements are not limited to the above examples.
In some possible implementations, comparison analysis is sequentially performed on the content element sets of each intermediate sample in the transition sample file set and of the corresponding second configuration sample, so as to obtain the content element comparison result between each intermediate sample and the corresponding second configuration sample. The thread configuration data of the data processing thread is then adjusted according to the current performance state information, the content element comparison result, and the set method for adjusting the thread configuration data of the data processing thread, to obtain the updated data processing thread. For example, the thread configuration data of the data processing thread is adjusted using a feedback correction algorithm according to the current performance state information and the content element comparison result.
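The content element comparison can be pictured with the following sketch, which assumes that each sample's content elements are grouped by dimension (for example an office dimension and a life dimension) and that the comparison result records, per dimension, which elements are missing from or extra in the intermediate sample. The data layout and the function name are hypothetical.

```python
def compare_content_elements(intermediate, reference):
    """Per-dimension differences between two {dimension: set of elements} maps."""
    result = {}
    for dimension in sorted(set(intermediate) | set(reference)):
        got = set(intermediate.get(dimension, ()))
        want = set(reference.get(dimension, ()))
        missing, extra = want - got, got - want
        # Only dimensions that actually differ appear in the comparison result.
        if missing or extra:
            result[dimension] = {"missing": sorted(missing), "extra": sorted(extra)}
    return result
```

For instance, if the intermediate sample lacks the "remote office element" that the second configuration sample has in the office dimension, the result would contain that element under `"missing"` for that dimension and nothing for dimensions that match.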
In a related embodiment, before updating the thread configuration data of the data processing thread according to the current performance state information to obtain the updated data processing thread, the method further includes: analyzing the content elements of the transition sample file set and the second configuration sample file set to obtain a content element comparison result between the transition sample file set and the second configuration sample file set; and respectively inputting the transition sample file set and the second configuration sample file set into the content block comparison thread to obtain a content block comparison result between the transition sample file set and the second configuration sample file set. In this case, updating the thread configuration data of the data processing thread according to the current performance state information to obtain the updated data processing thread includes: updating the thread configuration data of the data processing thread according to the current performance state information, the content element comparison result and the content block comparison result to obtain the updated data processing thread.
By analyzing the content block comparison result and the content element comparison result of the intermediate sample and the second configuration sample corresponding to the intermediate sample, the content restoration degree of the de-noised service item content restored by the finally obtained noise filtering thread can be ensured, and partial deletion of the service item content is avoided.
In a related embodiment, step S34 further includes the following steps S341-S344.
S341, updating the thread configuration data of the data processing thread according to the current performance state information to obtain the updated data processing thread.
S342, acquiring the current update iteration accumulated value.
S343, when the update iteration accumulated value is smaller than the set iteration accumulation threshold, returning to the step of inputting the first configuration sample file set to the data processing thread that corresponds to the balanced sample configuration file and comprises a plurality of mutually associated service environment detection modules, to obtain the noise-filtered transition sample file set.
S344, when the update iteration accumulated value reaches the set iteration accumulation threshold, taking the updated data processing thread as the noise filtering thread.
In this embodiment, each time one round of configuration for the balanced sample configuration file is completed, the update iteration accumulated value is incremented by 1, the current update iteration accumulated value is obtained, and it is judged whether this value has reached the iteration accumulation threshold. If not, the relevant configuration steps continue to be executed; otherwise, the updated data processing thread is taken as the noise filtering thread and the configuration procedure exits.
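Steps S341-S344 amount to a counted loop, which can be sketched as below. The function `configure_noise_filter` and its `update_round` callback (standing in for one full round of configuration that returns the updated data processing thread) are hypothetical names introduced for illustration.

```python
def configure_noise_filter(update_round, thread, iteration_threshold):
    """Repeat configuration rounds until the accumulated count reaches the threshold."""
    iterations = 0
    while True:
        thread = update_round(thread)  # S341: obtain the updated data processing thread
        iterations += 1                # increment, then read the accumulated value (S342)
        if iterations >= iteration_threshold:
            # S344: threshold reached, use the updated thread as the noise filtering thread.
            return thread, iterations
        # S343: below the threshold, so go back and run another round.
```

The loop always runs at least one round, which matches the description in which the accumulated value is checked only after a round of configuration has completed.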
In a related embodiment, step S34 is followed by a step of testing a noise filtering thread, which specifically includes: acquiring a test sample file set formed by the content of a test service item carrying noise; and inputting the test sample file set to the configured noise filtering thread, and obtaining a denoising test result according to the output of the noise filtering thread. The test sample file set is formed by a plurality of test service item contents (test samples) which are subjected to global adjustment processing and carry noise, and the test service item contents are different from the first service item contents to be configured. And further testing the performance of the configured noise filtering thread to determine whether the currently obtained noise filtering thread meets the set noise filtering effect.
According to the data denoising method based on the big data and the cloud computing and the cloud server, the target service item content carrying noise is obtained, the target service item content is input to the noise filtering thread which is obtained in advance, and the corresponding denoising service item content is obtained according to the output of the noise filtering thread. The noise filtering thread comprises a plurality of mutually associated service environment detection modules, item content production data of each content block category of target service item content is obtained through a data classification strategy of the service environment detection modules, the item content production data is used as input of an information identification strategy in the service environment detection modules, document digital information of each content block category is obtained through the information identification strategy, the document digital information is identified to obtain service item indication information of each content block category, and the item content production data of each content block category is respectively marked according to the service item indication information to obtain marked item content production data. It can be understood that the noise filtering thread can maintain better noise data identification and filtering performance, fully identify service requirements of different content block categories to obtain corresponding service item indication information, process content blocks with higher heat degree through marking, weaken cold content blocks or wrong content blocks, further effectively filter noise content in target service item content, simultaneously ensure that denoising service item content can reversely deduce important content blocks of the target service item content, and improve content information reduction degree and service environment matching degree of the denoising service item content.
An exemplary data denoising device based on big data and cloud computing is further provided in the embodiments of the present invention, and as shown in fig. 4, the data denoising device 400 based on big data and cloud computing may include the following functional modules.
An obtaining module 410, configured to obtain the content of the target service item carrying noise.
A determining module 420, configured to determine service item indication information corresponding to the target service item content based on a pre-trained noise filtering thread.
And the denoising module 430 is configured to obtain marked project content production data corresponding to the target service project content according to the service project indication information, and implement denoising processing on the target service project content in combination with the noise filtering thread.
It can be understood that, for descriptions of the obtaining module 410, the determining module 420 and the denoising module 430, reference may be made to the method embodiments described above.
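The division into the three functional modules can be sketched roughly as follows. This is only an illustrative arrangement under stated assumptions: the content is modeled as a mapping from content block category to data, and the trained noise filtering thread is represented by two hypothetical callables, `indicate` and `keep`, whose interfaces are not defined in the source.

```python
class DataDenoisingDevice:
    """Hypothetical sketch of device 400; the noise filter interface is assumed."""

    def __init__(self, indicate, keep):
        # indicate(category, data) -> indication info for one content block category
        # keep(indication) -> True if the marked content block should be retained
        self.indicate = indicate
        self.keep = keep

    def obtain(self, source):
        """Obtaining module 410: fetch the noisy target service item content."""
        return dict(source)

    def determine(self, content):
        """Determining module 420: indication info per content block category."""
        return {cat: self.indicate(cat, data) for cat, data in content.items()}

    def denoise(self, content, indications):
        """Denoising module 430: mark each category, then drop the weakened ones."""
        marked = {cat: (data, indications[cat]) for cat, data in content.items()}
        return {cat: data for cat, (data, ind) in marked.items() if self.keep(ind)}
```

A caller would chain the three methods in order: obtain the content, determine the indication information, then denoise using both.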
Further, referring to fig. 5 in combination, the cloud server 120 may include a processing engine 1201, a network module 1202, and a memory 1203, the processing engine 1201 and the memory 1203 communicating through the network module 1202. The processing engine 1201 may process relevant information and/or data to perform one or more functions described herein. The network module 1202 may facilitate the exchange of information and/or data. The memory 1203 is used for storing a program, and the processing engine 1201 executes the program after receiving an execution instruction. It is to be understood that the configuration shown in fig. 5 is merely illustrative, and that cloud server 120 may include more or fewer components than shown in fig. 5, or have a different configuration than shown in fig. 5. The components shown in fig. 5 may be implemented in hardware, software, or a combination thereof.
It should be appreciated that the system and its modules shown above may be implemented in a variety of ways. For example, in some embodiments, the system and its modules may be implemented in hardware, software, or a combination of software and hardware. Wherein the hardware portion may be implemented using dedicated logic; the software portions may be stored in a memory for execution by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the methods and systems described above may be implemented using computer executable instructions and/or embodied in processor control code, such code being provided, for example, on a carrier medium such as a diskette, CD-or DVD-ROM, a programmable memory such as read-only memory (firmware), or a data carrier such as an optical or electronic signal carrier. The system and its modules of the present application may be implemented not only by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips, transistors, or programmable hardware devices such as field programmable gate arrays, programmable logic devices, etc., but also by software executed by various types of processors, for example, or by a combination of the above hardware circuits and software (e.g., firmware).
It is to be noted that different embodiments may produce different advantages, and in different embodiments, any one or combination of the above advantages may be produced, or any other advantages may be obtained.
Having thus described the basic concept, it will be apparent to those skilled in the art that the foregoing detailed disclosure is to be considered merely illustrative and not restrictive of the broad application. Various modifications, improvements and adaptations to the present application may occur to those skilled in the art, although not explicitly described herein. Such modifications, improvements and adaptations are proposed in the present application and thus fall within the spirit and scope of the exemplary embodiments of the present application.
Also, this application uses specific language to describe embodiments of the application. Reference throughout this specification to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic described in connection with at least one embodiment of the present application is included in at least one embodiment of the present application. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, some features, structures, or characteristics of one or more embodiments of the present application may be combined as appropriate.
Moreover, those skilled in the art will appreciate that aspects of the present application may be illustrated and described in terms of several patentable species or situations, including any new and useful combination of processes, machines, manufacture, or materials, or any new and useful improvement thereon. Accordingly, various aspects of the present application may be embodied entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc.) or in a combination of hardware and software. The above hardware or software may be referred to as a "data block," "module," "engine," "unit," "component," or "system." Furthermore, aspects of the present application may be represented as a computer product, including computer readable program code, embodied in one or more computer readable media.
The computer storage medium may comprise a propagated data signal with the computer program code embodied therewith, for example, on baseband or as part of a carrier wave. The propagated signal may take any of a variety of forms, including electromagnetic, optical, etc., or any suitable combination. A computer storage medium may be any computer-readable medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code located on a computer storage medium may be propagated over any suitable medium, including radio, cable, fiber optic cable, RF, or the like, or any combination of the preceding.
Computer program code required for the operation of various portions of the present application may be written in any one or more programming languages, including an object-oriented programming language such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, Python, and the like, a conventional programming language such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, ABAP, a dynamic programming language such as Python, Ruby, and Groovy, or other programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any form of network, such as a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet), or in a cloud computing environment, or as a service, such as software as a service (SaaS).
Additionally, the order in which elements and sequences of the processes described herein are processed, the use of alphanumeric characters, or the use of other designations, is not intended to limit the order of the processes and methods described herein, unless explicitly claimed. While various presently contemplated embodiments of the invention have been discussed in the foregoing disclosure by way of example, it is to be understood that such detail is solely for that purpose and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements that are within the spirit and scope of the embodiments herein. For example, although the system components described above may be implemented by hardware devices, they may also be implemented by software-only solutions, such as installing the described system on an existing server or mobile device.
Similarly, it should be noted that in the preceding description of embodiments of the application, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the embodiments. This method of disclosure, however, does not imply that the claimed subject matter requires more features than are expressly recited in the claims. Indeed, an embodiment may have fewer than all of the features of a single embodiment disclosed above.
Numerals describing the number of components, attributes, etc. are used in some embodiments, it being understood that such numerals used in the description of the embodiments are modified in some instances by the use of the modifier "about", "approximately" or "substantially". Unless otherwise indicated, "about", "approximately" or "substantially" indicates that the numbers allow for adaptive variation. Accordingly, in some embodiments, the numerical parameters used in the specification and claims are approximations that may vary depending upon the desired properties of the individual embodiments. In some embodiments, the numerical parameter should take into account the specified significant digits and employ a general digit preserving approach. Notwithstanding that the numerical ranges and parameters setting forth the broad scope of the range are approximations, in the specific examples, such numerical values are set forth as precisely as possible within the scope of the application.
The entire contents of each patent, patent application publication, and other material cited in this application, such as articles, books, specifications, publications, documents, and the like, are hereby incorporated by reference into this application. Excluded are application history documents that are inconsistent with or in conflict with the contents of this application, as well as documents (whether currently or later appended to this application) that limit the broadest scope of the claims of this application. It is noted that if the descriptions, definitions and/or use of terms in the material attached to this application are inconsistent with or contrary to those in this application, the descriptions, definitions and/or use of terms in this application shall control.
Finally, it should be understood that the embodiments described herein are merely illustrative of the principles of the embodiments of the present application. Other variations are also possible within the scope of the present application. Thus, by way of example, and not limitation, alternative configurations of the embodiments of the present application can be viewed as being consistent with the teachings of the present application. Accordingly, the embodiments of the present application are not limited to only those embodiments explicitly described and depicted herein.

Claims (10)

1. A data denoising method based on big data and cloud computing is applied to a cloud server, and comprises the following steps:
acquiring a first configuration sample file set formed by the contents of the service items to be configured carrying noise and a second configuration sample file set formed by the contents of the service items to be configured not carrying noise;
inputting the first configuration sample file set to a data processing thread corresponding to a balanced sample configuration file and comprising a plurality of mutually associated service environment detection modules, and obtaining a transition sample file set after noise filtration;
respectively inputting the transition sample file set and the second configuration sample file set to a performance detection thread corresponding to the balanced sample configuration file, and obtaining current performance state information according to the output of the performance detection thread;
updating thread configuration data of the data processing thread according to the current performance state information to obtain an updated data processing thread, and returning to the step of inputting the first configuration sample file set to the data processing thread, which corresponds to the balanced sample configuration file and comprises a plurality of mutually associated service environment detection modules, to obtain a noise-filtered transition sample file set, until an iteration termination condition is met, whereupon the updated data processing thread is taken as a noise filtering thread.
2. The method according to claim 1, wherein the respectively inputting the transition sample file set and the second configuration sample file set to a performance detection thread corresponding to the balanced sample configuration file, and obtaining current performance state information according to an output of the performance detection thread, comprises:
respectively inputting the transition sample file set and the second configuration sample file set to a performance detection thread corresponding to the balanced sample configuration file, and obtaining denoising performance state information according to the output of the performance detection thread;
updating the thread configuration data of the performance detection thread according to the denoising performance state information to obtain an updated performance detection thread;
and inputting the transition sample file set into the updated performance detection thread, and obtaining current performance state information according to the output of the updated performance detection thread.
3. The method according to claim 2, wherein the respectively inputting the transition sample file set and the second configuration sample file set to a performance detection thread corresponding to the balanced sample configuration file, and obtaining denoising performance state information according to an output of the performance detection thread, comprises:
respectively inputting the transition sample file set and the second configuration sample file set to a performance detection thread to obtain a first performance detection result corresponding to the transition sample file set and a second performance detection result of the second configuration sample file set;
and obtaining denoising performance state information according to the first performance detection result and the second performance detection result and by combining a denoising thread mapping relation.
4. The method of claim 3, wherein inputting the transition sample file set to the updated performance detection thread and obtaining current performance state information according to an output of the updated performance detection thread comprises:
inputting the transition sample file set to the updated performance detection thread to obtain a third performance detection result corresponding to the transition sample file set;
and obtaining the current performance state information according to the third performance detection result and by combining the mapping relation of the balance samples.
5. The method of claim 1, wherein before updating the thread configuration data of the data processing thread according to the current performance state information to obtain an updated data processing thread, further comprising: respectively inputting the transition sample file set and the second configuration sample file set into a content block comparison thread to obtain a content block comparison result between the transition sample file set and the second configuration sample file set;
updating the thread configuration data of the data processing thread according to the current performance state information to obtain an updated data processing thread, including: and updating the thread configuration data of the data processing thread according to the current performance state information and the content block comparison result to obtain an updated data processing thread.
6. The method of claim 1, wherein before updating the thread configuration data of the data processing thread according to the current performance state information to obtain an updated data processing thread, further comprising: analyzing the content elements of the transition sample file set and the second configuration sample file set to obtain a content element comparison result between the transition sample file set and the second configuration sample file set;
updating the thread configuration data of the data processing thread according to the current performance state information to obtain an updated data processing thread, including: updating thread configuration data of the data processing thread according to the current performance state information and the content element comparison result to obtain an updated data processing thread;
wherein the updating of the thread configuration data of the data processing thread according to the current performance state information to obtain an updated data processing thread, the returning to the step of inputting the first configuration sample file set to the data processing thread, which corresponds to the balanced sample configuration file and comprises the plurality of mutually associated service environment detection modules, to obtain a noise-filtered transition sample file set until an iteration termination condition is satisfied, and the taking of the updated data processing thread as a noise filtering thread, comprise:
updating the thread configuration data of the data processing thread according to the current performance state information to obtain an updated data processing thread;
acquiring a current update iteration accumulated value; when the update iteration accumulated value is smaller than a set iteration accumulation threshold, returning to the step of inputting the first configuration sample file set to the data processing thread, which corresponds to the balanced sample configuration file and comprises a plurality of mutually associated service environment detection modules, to obtain a noise-filtered transition sample file set;
and when the update iteration accumulated value reaches the set iteration accumulation threshold, taking the updated data processing thread as a noise filtering thread.
7. The method of claim 1, further comprising:
acquiring noise-carrying target service item content, wherein the service item content comprises a series of business data serving as input to a data mining algorithm;
determining service item indication information corresponding to the target service item content based on the noise filtering thread, wherein the service item indication information is guidance information for different content block categories;
and obtaining marked project content production data corresponding to the target service item content according to the service item indication information, and denoising the target service item content in combination with the noise filtering thread.
8. The method of claim 7, wherein determining service item indication information corresponding to the target service item content based on the noise filtering thread comprises:
inputting the target service item content to a noise filtering thread configured based on a balanced sample profile; the noise filtering thread comprises a plurality of mutually associated service environment detection modules;
obtaining project content production data of each content block category of the target service project content through a data classification strategy of the service environment detection module, wherein the project content production data is used as the input of an information identification strategy in the service environment detection module;
and obtaining the document digital information of each content block type through the information identification strategy, and identifying the document digital information to obtain the service item indication information of each content block type.
9. The method of claim 8, wherein obtaining tagged item content production data corresponding to the target service item content according to the service item indication information, and implementing denoising processing on the target service item content in combination with the noise filtering thread comprises:
respectively marking the project content production data of each content block category according to the service project indication information through a data marking strategy of the service environment detection module to obtain marked project content production data;
and obtaining the de-noising service item content corresponding to the target service item content through the noise filtering thread and the marked item content production data.
10. A cloud server comprising a processing engine, a network module, and a memory; the processing engine and the memory communicate through the network module, the processing engine reading a computer program from the memory and operating to perform the method of any of claims 1-9.
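The iterative procedure recited in claims 1 and 6 (filter the noisy set, compare it with the clean set, update the filter configuration, and stop once an iteration count reaches a threshold) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the application: each sample "file" is reduced to a single number, the noise is a constant bias, the function names `data_processing_thread`, `performance_detection_thread`, and `train_noise_filter` are hypothetical, the detector is a simple mean, and the update is a proportional step. The claims do not specify any of these choices.

```python
from statistics import mean


def data_processing_thread(samples, config):
    """Hypothetical noise filter: subtract a configurable offset from each sample."""
    return [x - config["offset"] for x in samples]


def performance_detection_thread(samples):
    """Hypothetical detector: summarise a sample set by its mean value."""
    return mean(samples)


def train_noise_filter(noisy_set, clean_set, iteration_threshold=50, step=0.5):
    """Repeat filter -> detect -> update until the iteration count hits the threshold."""
    config = {"offset": 0.0}   # stands in for the "thread configuration data"
    iterations = 0             # stands in for the "update iteration accumulated value"
    while iterations < iteration_threshold:
        transition_set = data_processing_thread(noisy_set, config)
        # Stand-in for the "current performance state information": the gap
        # between detector outputs on the filtered set and on the clean set.
        state = (performance_detection_thread(transition_set)
                 - performance_detection_thread(clean_set))
        config["offset"] += step * state   # assumed proportional update rule
        iterations += 1
    return config


clean = [1.0, 2.0, 3.0]
noisy = [x + 3.0 for x in clean]           # synthetic noise: a constant bias of 3.0
trained = train_noise_filter(noisy, clean)
denoised = data_processing_thread(noisy, trained)
```

In this toy setting the learned offset approaches the injected bias, so `denoised` ends up close to `clean`. A fixed iteration count is used as the termination condition because claim 6 describes one; other termination conditions would fit the wording of claim 1 equally well.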

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110905009.7A CN113609117A (en) 2021-02-06 2021-02-06 Data denoising method based on big data and cloud computing and cloud server

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110905009.7A CN113609117A (en) 2021-02-06 2021-02-06 Data denoising method based on big data and cloud computing and cloud server
CN202110165552.8A CN112860675B (en) 2021-02-06 2021-02-06 Big data processing method under online cloud service environment and cloud computing server

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN202110165552.8A Division CN112860675B (en) 2021-02-06 2021-02-06 Big data processing method under online cloud service environment and cloud computing server

Publications (1)

Publication Number Publication Date
CN113609117A (en) 2021-11-05

Family

ID=75988819

Family Applications (3)

Application Number Title Priority Date Filing Date
CN202110905010.XA Withdrawn CN113609118A (en) 2021-02-06 2021-02-06 Data optimization method applied to big data and big data server
CN202110165552.8A Active CN112860675B (en) 2021-02-06 2021-02-06 Big data processing method under online cloud service environment and cloud computing server
CN202110905009.7A Withdrawn CN113609117A (en) 2021-02-06 2021-02-06 Data denoising method based on big data and cloud computing and cloud server

Family Applications Before (2)

Application Number Title Priority Date Filing Date
CN202110905010.XA Withdrawn CN113609118A (en) 2021-02-06 2021-02-06 Data optimization method applied to big data and big data server
CN202110165552.8A Active CN112860675B (en) 2021-02-06 2021-02-06 Big data processing method under online cloud service environment and cloud computing server

Country Status (1)

Country Link
CN (3) CN113609118A (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113918985B (en) * 2021-09-10 2023-07-18 广州博依特智能信息科技有限公司 Security management policy generation method and device
CN114496299B (en) * 2022-04-14 2022-06-21 八爪鱼人工智能科技(常熟)有限公司 Epidemic prevention information processing method based on deep learning and epidemic prevention service system
CN115391810B (en) * 2022-09-23 2023-06-30 成都坐联智城科技有限公司 Data hierarchical encryption method and AI system based on big data

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9886670B2 (en) * 2014-06-30 2018-02-06 Amazon Technologies, Inc. Feature processing recipes for machine learning
CN110162556A (en) * 2018-02-11 2019-08-23 陕西爱尚物联科技有限公司 A kind of effective method for playing data value
CN108846076A (en) * 2018-06-08 2018-11-20 山大地纬软件股份有限公司 The massive multi-source ETL process method and system of supporting interface adaptation
US10990470B2 (en) * 2018-12-11 2021-04-27 Rovi Guides, Inc. Entity resolution framework for data matching
CN112511543A (en) * 2020-04-10 2021-03-16 吴萌萌 Network security analysis method and system based on big data platform and big data platform
CN111698232B (en) * 2020-06-03 2021-09-10 腾讯科技(深圳)有限公司 Data processing method, data processing device, computer equipment and storage medium
CN111984898A (en) * 2020-06-29 2020-11-24 平安国际智慧城市科技股份有限公司 Label pushing method and device based on big data, electronic equipment and storage medium
CN111967375A (en) * 2020-08-14 2020-11-20 云粒智慧科技有限公司 Service method based on figure portrait
CN112199395A (en) * 2020-10-13 2021-01-08 吴俊� Artificial intelligence analysis method and system

Also Published As

Publication number Publication date
CN112860675B (en) 2021-10-26
CN112860675A (en) 2021-05-28
CN113609118A (en) 2021-11-05

Similar Documents

Publication Publication Date Title
CN112860675B (en) Big data processing method under online cloud service environment and cloud computing server
CN109189767B (en) Data processing method and device, electronic equipment and storage medium
EP3620982A1 (en) Sample processing method and device
CN111931809A (en) Data processing method and device, storage medium and electronic equipment
CN114757432A (en) Future execution activity and time prediction method and system based on flow log and multi-task learning
CN113377909A (en) Paraphrase analysis model training method and device, terminal equipment and storage medium
CN115953123A (en) Method, device and equipment for generating robot automation flow and storage medium
CN114254146A (en) Image data classification method, device and system
CN111488939A (en) Model training method, classification method, device and equipment
CN114428860A (en) Pre-hospital emergency case text recognition method and device, terminal and storage medium
CN112256881B (en) User information classification method and device
CN111666748B (en) Construction method of automatic classifier and decision recognition method
EP3893146A1 (en) An apparatus for determining a classifier for identifying objects in an image, an apparatus for identifying objects in an image and corresponding methods
CN116842520A (en) Anomaly perception method, device, equipment and medium based on detection model
CN115204322B (en) Behavior link abnormity identification method and device
CN114495114B (en) Text sequence recognition model calibration method based on CTC decoder
CN110705631A (en) SVM-based bulk cargo ship equipment state detection method
CN116738330A (en) Semi-supervision domain self-adaptive electroencephalogram signal classification method
CN112115996B (en) Image data processing method, device, equipment and storage medium
CN115438239A (en) Abnormity detection method and device for automatic abnormal sample screening
Daza et al. An algorithm for detecting noise on supervised classification
CN114299340A (en) Model training method, image classification method, system, device and medium
CN117292404B (en) High-precision gesture data identification method, electronic equipment and storage medium
CN116431757B (en) Text relation extraction method based on active learning, electronic equipment and storage medium
CN114510715B (en) Method and device for testing functional safety of model, storage medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20211105
