CN110837595A

CN110837595A - Enterprise information data processing method, system, terminal and storage medium

Info

Publication number: CN110837595A
Application number: CN201911069207.3A
Authority: CN
Inventors: 张雄君; 邹戈阳; 乔佳; 王维婷; 金文龙; 刘兴伟; 石书强; 常旭宁; 武凤阳; 王倩微; 王勋; 朱瑞娟; 孙俊芳
Original assignee: Beijing Gas Group Co Ltd
Current assignee: Beijing Gas Group Co Ltd
Priority date: 2019-11-05
Filing date: 2019-11-05
Publication date: 2020-02-25

Abstract

The application provides an enterprise information data processing method, system, terminal and storage medium. The method comprises: acquiring webpage data updated by a target website within a preset time, screening the acquired webpage data and storing it in a database; analyzing the data in the database according to a preset analysis strategy to obtain analysis data related to the target website; and displaying the analysis data in webpage form. The application addresses the problems of the prior art, in which information search engines return highly repetitive results with limited coverage and poor timeliness, and manual downloading, storage and display of information data is time-consuming and inefficient. It enables automatic collection, typesetting, storage and display of information data.

Description

Enterprise information data processing method, system, terminal and storage medium

Technical Field

The present application relates to the field of data processing technologies, and in particular, to a method, a system, a terminal, and a storage medium for processing enterprise information data.

Background

In the gas industry, the department responsible for collecting information data provides a gas enterprise with a large amount of market information, reports on technical progress and various data services. The scientific and technical information products it issues support the enterprise's daily operations and provide valuable references for its development planning and strategic decisions.

Under the prior art, the information data of a gas enterprise is often collected by manually reading and browsing Baidu, designated websites or WeChat official accounts, manually judging which content and data are valuable, and manually downloading and storing them. When a search engine is used, the repetition rate of the search results is high, the time cost of manually reading and screening valuable content and data is too high, the websites and scope covered are limited, the search results cannot be guaranteed to be the most recently published, and valuable information on professional websites may be buried in inconspicuous columns. Storing and displaying valuable data by manual entry wastes time, increases the workload of information staff and raises the labor costs of the enterprise.

Therefore, a method, a system, a terminal and a storage medium for processing enterprise information data are needed to automatically collect, typeset, store and display the information data.

Disclosure of Invention

Aiming at the defects of the prior art, the application provides an enterprise information data processing method, an enterprise information data processing system, an enterprise information data processing terminal and a storage medium, and solves the problems that in the prior art, an information data search engine is high in search repetition rate, limited in coverage content, poor in timeliness, time-consuming in manual downloading, storing and displaying of information data, low in efficiency and the like.

To solve the above technical problem, in a first aspect, the present application provides an enterprise information data processing method, including:

acquiring webpage data updated by a target website within preset time, screening the acquired webpage data and storing the webpage data in a database;

analyzing the data of the database according to a preset analysis strategy to obtain analysis data related to the target website;

and displaying the analysis data in a webpage form.

Preferably, the acquiring webpage data updated by the target website within a preset time, and screening and storing the acquired webpage data to the database includes:

collecting webpage data updated by a target website within preset time and corresponding browsing amount;

filtering duplicate content out of the acquired webpage data, and screening the webpage data for value according to whether its browsing volume reaches a preset value;

and converting the screened webpage data to a standard format and storing it in a MySQL database.

More preferably, the collecting webpage data and corresponding browsing volume updated by the target website within a preset time includes:

using an automated testing technique, ODBC and the SMTP protocol, respectively, to acquire data by web crawler, from websites that provide a database API, and from websites that use a mail server.

Preferably, the acquiring webpage data updated by the target website within a preset time, and performing screening processing on the acquired webpage data and storing the acquired webpage data in the database further includes:

classifying the data in the database by topic, and browsing the database information by connecting an external webpage to the database.

Preferably, the analyzing the data in the database according to a preset analysis policy to obtain analysis data related to the target website includes:

and performing text abstract extraction, hot word analysis, hot news analysis, influence factor analysis and price prediction analysis on the data of the database to respectively obtain text abstract, hot words, hot news, influence factors and price prediction data related to the target website.

Preferably, the displaying the analysis data in the form of a web page includes:

and respectively displaying the text abstract, the hot words, the hot news, the influencing factors and the price prediction data in a webpage form.

In a second aspect, the present application provides an enterprise information data processing system, comprising:

the acquisition unit is configured to acquire webpage data updated by a target website within preset time, and screen and store the acquired webpage data in a database;

the analysis unit is configured to analyze and process the data of the database according to a preset analysis strategy to obtain analysis data related to the target website;

and the display unit is configured to display the analysis data in a webpage form.

Preferably, the acquiring unit includes:

the data collection unit is configured for collecting webpage data updated by a target website within preset time and corresponding browsing amount;

the data screening unit is configured to filter duplicate content out of the acquired webpage data and to screen the webpage data for value according to whether its browsing volume reaches a preset value;

and the data storage unit is configured to convert the screened webpage data to a standard format and store it in the MySQL database.

More preferably, the collecting unit is specifically configured to:

Preferably, the system further comprises:

the data browsing unit is configured to classify the data in the database by topic and to browse the database information by connecting an external webpage to the database.

Preferably, the analysis unit is specifically configured to:

Preferably, the display unit is specifically configured to:

In a third aspect, the present application provides a data processing terminal, including:

a memory for storing a computer program;

and the processor is used for realizing the enterprise information data processing method when executing the computer program.

In a fourth aspect, the present application provides a computer storage medium having instructions stored thereon, which when executed on a computer, cause the computer to perform the method of the above aspects.

Compared with the prior art, the method has the following beneficial effects:

The method organically combines data acquisition, information analysis of text and data, and webpage display. To address the high repetition rate of search engine results, a crawler program is written to collect the valuable content and data updated on professional websites that day, automatically filter out repeated content, screen for valuable content and store it in a MySQL database, thereby achieving automatic collection and storage of information and data. Meanwhile, the collected information is classified by topic, and by connecting an external webpage to the database, enterprise intranet users can browse valuable news and information under different topics. In addition, the application can generate summary information from the collected news and information, which makes it easier for scientific research personnel to analyze them. Multi-level analysis and display of the webpage data solves the prior-art problems of high search repetition rate, limited coverage and poor timeliness of information search engines, and of time-consuming, inefficient manual downloading, storage and display of information data, and improves the working efficiency of scientific research personnel.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.

Fig. 1 is a flowchart illustrating an enterprise information data processing method according to an embodiment of the present disclosure;

Fig. 2 is a flowchart of a text abstract extraction method according to an embodiment of the present application;

Fig. 3 is a flowchart of a hot word analysis method according to an embodiment of the present disclosure;

Fig. 4 is a flowchart of a hot news analysis method according to an embodiment of the present application;

Fig. 5 is a flowchart of a method for analyzing influence factors according to an embodiment of the present disclosure;

Fig. 6 is a flow chart of a price prediction analysis method provided in an embodiment of the present application;

Fig. 7 is a news display interface provided in an embodiment of the present application;

Fig. 8 is a news search interface provided in an embodiment of the present application;

Fig. 9 is a block diagram of a text summarization interface according to an embodiment of the present application;

Fig. 10 is a block diagram of a hot spot word analysis interface provided by embodiments of the present application;

Fig. 11 is a hot news analysis interface provided in an embodiment of the present application;

Fig. 12 is an influence factor analysis interface provided in an embodiment of the present application;

Fig. 13 is a price forecasting analysis interface provided in accordance with an embodiment of the present application;

Fig. 14 is a block diagram of an enterprise information data processing system according to an embodiment of the present application;

Fig. 15 is a schematic structural diagram of a data processing terminal according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Referring to fig. 1, fig. 1 is a flowchart of an enterprise information data processing method according to an embodiment of the present application, where the method 100 includes:

S101: acquiring webpage data updated by a target website within preset time, screening the acquired webpage data and storing the webpage data in a database;

S102: analyzing the data of the database according to a preset analysis strategy to obtain analysis data related to the target website;

S103: and displaying the analysis data in a webpage form.

It should be noted that the web page data includes text data and numerical data, and the enterprise information may include information of various kinds, such as today's hot topics, domestic headlines, international headlines, monthly statistical reports, and data indices.

Based on the foregoing embodiment, as a preferred embodiment, the acquiring, in step S101, web page data updated by a target website within a preset time, and performing a screening process on the acquired web page data and storing the web page data in a database includes:

Based on the foregoing embodiment, as a preferred embodiment, the collecting, in step S101, web page data and corresponding browsing volume updated by the target website within a preset time includes:

Specifically, an automated testing technique, ODBC and the SMTP protocol are used, respectively, to acquire data by web crawler, from websites that provide a database API, and from websites that use a mail server, so as to obtain the text data and numerical data updated by the target website that day together with the corresponding browsing volume. In the background, duplicate content is filtered out of the acquired webpage data, and the data is screened for value according to whether its browsing volume reaches a preset value (for example, 100 visitors per day). The valuable webpage data that passes screening is classified by topic, converted to a standard format and stored in a MySQL database.
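The filtering, screening and storage steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the record fields (`title`, `body`, `topic`, `views`), the helper names and the hash-based duplicate test are all hypothetical, and the standard-library `sqlite3` module stands in for MySQL so that the sketch runs without a database server.

```python
import hashlib
import sqlite3

MIN_VIEWS = 100  # assumed threshold, mirroring the "100 visitors per day" example


def content_key(text):
    """Hash of whitespace-normalised, lower-cased text, used to spot repeats."""
    normalised = " ".join(text.split()).lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def screen(pages, min_views=MIN_VIEWS):
    """Drop duplicate pages and pages whose view count is below min_views."""
    seen = set()
    kept = []
    for page in pages:
        key = content_key(page["body"])
        if key in seen or page["views"] < min_views:
            continue
        seen.add(key)
        kept.append(page)
    return kept


def store(pages, conn):
    """Write the screened pages into one table with a fixed column layout."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS news "
        "(title TEXT, body TEXT, topic TEXT, views INTEGER)"
    )
    conn.executemany(
        "INSERT INTO news (title, body, topic, views) VALUES (?, ?, ?, ?)",
        [(p["title"], p["body"], p["topic"], p["views"]) for p in pages],
    )
    conn.commit()


pages = [
    {"title": "A", "body": "Gas prices rise", "topic": "price", "views": 250},
    {"title": "A copy", "body": "gas  prices rise", "topic": "price", "views": 300},
    {"title": "B", "body": "Minor notice", "topic": "misc", "views": 12},
]
kept = screen(pages)
conn = sqlite3.connect(":memory:")  # stand-in for the MySQL database
store(kept, conn)
```

In this sketch the second record is dropped as a repeat of the first and the third is dropped for low browsing volume, so only one row is stored.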

Based on the above embodiment, as a preferred embodiment, the step S101 further includes:

Specifically, by connecting the external webpage to the database, enterprise intranet users can browse valuable news and information under different topics, and summary information can further be generated from the collected news and information as a preliminary analysis to assist scientific research personnel.

Based on the foregoing embodiment, as a preferred embodiment, the step S102 performs analysis processing on the data in the database according to a preset analysis policy to obtain analysis data related to the target website, and includes:

As shown in fig. 2, fig. 2 is a flowchart of a text abstract extracting method according to an embodiment of the present application, where the method includes:

extracting text data of the database to perform text clause division;

constructing a sentence directed graph;

calculating the weight of each clause according to the number of times the clause occurs;

sorting the clauses according to the weight;

and sequentially outputting the text summaries according to the sentence sorting.
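The steps above resemble graph-based sentence ranking. The following is a minimal sketch in the style of TextRank, offered as an assumption about how such a ranking might look rather than as the method of the application: the word-overlap edge weight, the damping factor, the English sentence splitter and all function names are illustrative choices. Because the overlap weight is symmetric, the graph here is effectively undirected.

```python
import re


def split_sentences(text):
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(sentence):
    """Set of lower-cased alphanumeric words in a sentence."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Score each sentence by iterating a weighted PageRank-style update."""
    n = len(sentences)
    words = [tokens(s) for s in sentences]
    # Edge weight between i and j: number of shared words (0 means no edge).
    weight = [
        [len(words[i] & words[j]) if i != j else 0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weight]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weight[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(text, top_n=2):
    """Return the top_n highest-scoring sentences, best first."""
    sentences = split_sentences(text)
    scores = rank_sentences(sentences)
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in order[:top_n]]


text = (
    "Natural gas prices rose in northern markets. "
    "Gas demand rose as prices fell in southern markets. "
    "The weather was sunny. "
    "Northern gas markets expect prices to rise."
)
summary = summarize(text)
```

A sentence that shares no words with the others receives only the base score, so it is ranked last and left out of a short summary.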

As shown in fig. 3, fig. 3 is a flowchart of a hot word analysis method provided in the embodiment of the present application, where the method includes:

extracting text data of the database to perform text clause segmentation, and performing text word segmentation on the text clause;

constructing a word directed graph;

calculating the weight of each word according to the occurrence frequency of the word;

sorting the words according to the weight;

and sequentially outputting hot words according to the word sequence.

It should be noted that text abstract extraction is similar to keyword extraction, except that abstract extraction uses sentences as nodes while keyword extraction uses words as nodes. The task of keyword extraction is to automatically extract several meaningful words or phrases from a given text. With this method, hot words are extracted from the news collected each day, and the extracted words largely reflect the hot spots of the relevant information on that day.
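With words as nodes, the same ranking idea can be sketched as below. This is an illustrative sketch only: the co-occurrence window, the small English stop-word list, the damping factor and the function name `hot_words` are assumptions, and a production system for Chinese text would need a proper word segmenter, which is not shown.

```python
import re
from collections import defaultdict

# Illustrative stop-word list; a real deployment would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "to", "for", "is", "on"}


def hot_words(text, window=3, top_n=5, damping=0.85, iterations=50):
    """Rank words by a PageRank-style score on a co-occurrence graph."""
    words = [w for w in re.findall(r"[a-z]+", text.lower())
             if w not in STOP_WORDS]
    # An edge links two different words that appear within `window` tokens;
    # its weight counts how often that happens.
    neighbours = defaultdict(lambda: defaultdict(int))
    for i, w in enumerate(words):
        for v in words[i + 1:i + window]:
            if v != w:
                neighbours[w][v] += 1
                neighbours[v][w] += 1
    scores = {w: 1.0 for w in neighbours}
    for _ in range(iterations):
        scores = {
            w: (1 - damping) + damping * sum(
                count / sum(neighbours[v].values()) * scores[v]
                for v, count in neighbours[w].items()
            )
            for w in neighbours
        }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


sample = "gas price rises. gas supply falls. gas demand grows. pipeline gas storage."
top = hot_words(sample, top_n=3)
```

Words that co-occur with many other words accumulate score from all of their neighbours, which is why a term that recurs across the day's news tends to rise to the top of the list.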

As shown in fig. 4, fig. 4 is a flowchart of a hot news analysis method provided in an embodiment of the present application, where the method includes:

extracting text data of the database for preprocessing;

constructing a corpus according to the text data, performing TF-IDF word segmentation processing on the text data of the corpus, and removing stop words;

calculating text similarity with an LSI model to obtain a more accurate result;

and comprehensively analyzing according to the text similarity and the user attention to obtain hot news sequencing.

It should be noted that hot news is measured by combining text similarity with user attention. Text similarity indicates how many similar news items appear within a one-week period: the more similar items appear, the hotter the news is in that week. User attention is the click rate and browsing volume of users within the same one-week period: the larger these values, the more the news counts as hot news for that week.

When calculating text similarity, several keywords can be taken from each news text to generate the word frequency vectors of two news articles, and the similarity of the two articles is represented by the cosine similarity of the two vectors: the larger the value, the more similar the articles. Latent semantic indexing (LSI) decomposes the word frequency matrix using singular value decomposition (SVD) from matrix theory. First, a term-document matrix is generated from the whole document set; each component of the matrix is an integer giving the number of times a specific term appears in a specific document. Then singular value decomposition is performed on the matrix and the smaller singular values are discarded. The resulting singular vectors and singular value matrix are used to map the document vectors and query vectors into a subspace in which the semantic relationships of the term-document matrix are preserved. Finally, the cosine of the angle between vectors is calculated through a normalized inner product, and the similarity between texts is compared according to the result.
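The word-frequency part of this calculation can be sketched as follows. The sketch covers only the cosine similarity of raw term-frequency vectors; the SVD projection used by LSI is deliberately left out, and the tokenizer and function names are illustrative assumptions.

```python
import math
import re
from collections import Counter


def term_frequencies(text):
    """Count how often each lower-cased alphanumeric word occurs."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


doc_a = term_frequencies("Natural gas prices rise in winter")
doc_b = term_frequencies("Winter demand lifts natural gas prices")
doc_c = term_frequencies("Football season opens tomorrow")

sim_ab = cosine_similarity(doc_a, doc_b)  # shares several terms
sim_ac = cosine_similarity(doc_a, doc_c)  # shares no terms
```

In an LSI setting the same cosine formula would be applied after both vectors have been mapped into the reduced subspace, so that texts using different but related terms can still score as similar.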

When calculating user attention, the click rate and browsing volume of each news item are crawled and used as indicators of user attention. Some websites do not publish the click rate or browsing volume of their news; for such a text, the similarity between it and all texts in the database is calculated with the LSI model, and its browsing volume and click rate are predicted from the sum of the products of each similarity and the corresponding browsing volume.
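The prediction described here can be sketched as a similarity-weighted sum. The text specifies only the sum of products; the optional division by the total similarity, which turns the sum into a weighted average, is an assumption added for illustration, as is the function name `predict_views`.

```python
def predict_views(similarities, views, normalise=False):
    """Estimate the views of an unlabelled text from similar, labelled texts.

    similarities: similarity of the unlabelled text to each stored text.
    views: known browsing volume of each stored text, in the same order.
    normalise: if True, divide by the total similarity (an assumption that
        is not stated in the source) to obtain a weighted average.
    """
    weighted = sum(s * v for s, v in zip(similarities, views))
    if not normalise:
        return weighted
    total = sum(similarities)
    return weighted / total if total > 0 else 0.0


raw = predict_views([1.0, 0.5], [100, 300])
averaged = predict_views([1.0, 0.5], [100, 300], normalise=True)
```

Without normalisation the estimate grows with the number of stored texts, so the weighted average may be easier to compare against crawled view counts.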

As shown in fig. 5, fig. 5 is a flowchart of an influence factor analysis method provided in an embodiment of the present application, where the method includes:

for analysis of influence factors in texts, a cause-and-effect event graph is constructed for crawled news texts by a knowledge graph and fuzzy ontology method, and causes of specified events or results of the specified events are found by constructing the knowledge graph.

(1) Constructing a causal knowledge base. The causal knowledge base comprises a causal connective lexicon, a causal pattern library and the like. This requires some analysis of natural language, since causal events are usually expressed through causal patterns. For example, in the sentence "scientific and technological progress enables rapid development of new energy", "scientific and technological progress" is the cause, "rapid development of new energy" is the result, and "enables" is the causal connective. Based on Chinese modes of expression, we adopt 9 causal patterns, which are shown in Table 1.

TABLE 1 causal patterns Table

(2) Preprocessing the text. This includes word segmentation, removal of stop words and so on. Causal events are then extracted, that is, causal pairs are extracted based on the causal pattern library: using the causal connectives in the causal knowledge base built in the previous step, the events at the corresponding positions around each connective are extracted to obtain the causal pairs.

(3) Representing the events. This is the core problem in constructing the whole causal graph, because the event graph is essentially a set of connections between events, so choosing a suitable unit of representation (phrase, short sentence, sentence stem and so on) is important. A suitable phrase or sentence is selected according to part of speech after word segmentation and semantic analysis to obtain the representation of the event.

(4) Fusing the events. The same event may be expressed in different ways, so events are fused through a fuzzy ontology using semantic similarity calculation, which solves the entity alignment problem found in general knowledge graphs. Once events are represented, event A points to event B and event B points to event C, indicating that event A is the cause of event B and event B is the cause of event C. Through the above steps, a knowledge graph built from the input text is obtained, and the causes of a relevant event and the results it leads to can be found by traversing and searching the knowledge graph.
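Steps (1) to (4) can be illustrated with a small sketch. It is heavily simplified and makes several assumptions: the two English patterns below are illustrative only and are not the nine patterns of Table 1, event fusion is reduced to lower-casing instead of semantic similarity over a fuzzy ontology, and all names are hypothetical.

```python
import re
from collections import defaultdict

# Illustrative connectives only; these are NOT the nine patterns of Table 1.
CAUSAL_PATTERNS = [
    re.compile(r"^because\s+(?P<cause>.+?),\s*(?P<effect>.+?)\.?$", re.I),
    re.compile(
        r"^(?P<cause>.+?)\s+(?:enables|leads to|causes)\s+(?P<effect>.+?)\.?$",
        re.I,
    ),
]


def extract_pair(sentence):
    """Return (cause, effect) if the sentence matches a causal pattern."""
    for pattern in CAUSAL_PATTERNS:
        match = pattern.match(sentence.strip())
        if match:
            # Lower-casing is a crude stand-in for event fusion.
            return match.group("cause").lower(), match.group("effect").lower()
    return None


def build_graph(sentences):
    """Directed graph in which each cause points to its effects."""
    graph = defaultdict(set)
    for sentence in sentences:
        pair = extract_pair(sentence)
        if pair:
            cause, effect = pair
            graph[cause].add(effect)
    return graph


def effects_of(graph, event):
    """All events reachable from `event` by following cause-to-effect edges."""
    found, stack = set(), [event]
    while stack:
        for nxt in graph.get(stack.pop(), ()):
            if nxt not in found:
                found.add(nxt)
                stack.append(nxt)
    return found


sentences = [
    "Scientific progress enables rapid development of new energy.",
    "Rapid development of new energy leads to lower gas demand.",
    "The report was published on Monday.",
]
graph = build_graph(sentences)
downstream = effects_of(graph, "scientific progress")
```

Finding the causes of an event works the same way on the reversed graph, with each effect pointing back to its causes.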

As shown in fig. 6, fig. 6 is a flowchart of a price prediction analysis method provided in an embodiment of the present application, where the method includes:

(1) Checking the stationarity of the sequence. A non-stationary sequence is made stationary by differencing or by taking logarithms.

(2) Determining the order of the model. The optimal model parameters are selected by grid search according to the Akaike information criterion (AIC), choosing the parameters with the minimum AIC.

(3) Parameter estimation and diagnostic testing. The model parameters are estimated, their significance is tested, and insignificant parameters are eliminated. A white noise test is performed on the model residuals: if the residuals are white noise, the model is reasonably specified; otherwise the model is underfitted and needs improvement.

(4) Model prediction. Predictions are made using the established ARIMA model.

It should be noted that there are two main methods for analyzing market price trends: fundamental analysis and technical analysis. Fundamental analysis focuses on the market supply and demand relationship that influences commodity prices and on changes in other general factors. The price trend is predicted from the supply and demand of the commodity and the factors that change them: if supply exceeds demand, the price will fall; if supply falls short of demand, the price will rise. Technical analysis analyzes historical price patterns and current market price dynamics, draws a price trend chart in chronological order, and predicts the price trend from changes in the chart. Because price analysis and prediction here means predicting the price trend along the time dimension, we adopt a time series prediction model. Moreover, the prices of natural gas and crude oil still fluctuate with a certain amplitude over a period of time; considering the non-stationarity of the price trend, we adopt the ARIMA (Autoregressive Integrated Moving Average) model.

The model is a well-known time series prediction method proposed by Box and Jenkins in the early 1970s, and is commonly used for modeling non-stationary time series. ARIMA(p, d, q) is called the autoregressive integrated moving average model, where AR is the autoregressive term, p is the autoregressive order, MA is the moving average term, q is the moving average order, and d is the number of times the non-stationary time series must be differenced to become stationary. The method predicts the future price trend from the historical price trend. Owing to the characteristics of the model, the shorter the prediction horizon, the higher the prediction accuracy, so the historical price trend is used to predict the price at the next future time point.
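To make the roles of p and d concrete, the following sketch fits the simplest non-trivial case, ARIMA(1, 1, 0) with a constant, by conditional least squares and predicts the next point. It is a teaching sketch under stated assumptions, not the procedure of the application: the order is fixed rather than selected by AIC, no stationarity or residual test is run, there is no moving average term, and the function names are hypothetical. At least three prices are needed.

```python
def difference(series):
    """First difference: the change between consecutive observations (d = 1)."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(values):
    """Least-squares estimate of c and phi in x[t] = c + phi * x[t-1] + e[t]."""
    x, y = values[:-1], values[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x)
    if var_x == 0:
        # Constant input: no slope can be estimated, so use the mean alone.
        return mean_y, 0.0
    phi = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / var_x
    return mean_y - phi * mean_x, phi


def forecast_next(prices):
    """One-step-ahead forecast from an ARIMA(1, 1, 0) model with a constant."""
    diffs = difference(prices)       # remove the trend by differencing once
    c, phi = fit_ar1(diffs)          # AR(1) on the differenced series (p = 1)
    next_diff = c + phi * diffs[-1]  # predicted next change
    return prices[-1] + next_diff    # undo the differencing


steady = forecast_next([10, 12, 14, 16, 18])
```

For a series that rises by the same amount each step, every difference is equal, the fitted slope is zero, and the forecast simply continues the trend.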

Based on the above embodiment, as a preferred embodiment, the step S103 displays the analysis data in the form of a web page, including:

Specifically, as shown in Figs. 7-13, after the text abstract extraction, hot word analysis, hot news analysis, and price analysis and prediction functions are implemented by the above methods, the Beijing Gas scientific and technical information results are displayed as a website. Figs. 7-13 show, respectively, a news display interface, a news search interface, a text abstract extraction interface, a hot word analysis interface, a hot news analysis interface, an influence factor analysis interface, and a price prediction analysis interface.
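As a rough illustration of displaying analysis data in webpage form, the sketch below renders hot words and summaries into a static HTML page. The page layout and the function name `render_page` are assumptions made for illustration and do not reproduce the interfaces of Figs. 7-13.

```python
from html import escape


def render_page(title, hot_words, summaries):
    """Build a minimal HTML page; all text is escaped before insertion."""
    words = "".join(f"<li>{escape(w)}</li>" for w in hot_words)
    items = "".join(f"<p>{escape(s)}</p>" for s in summaries)
    return (
        '<!DOCTYPE html><html><head><meta charset="utf-8">'
        f"<title>{escape(title)}</title></head><body>"
        f"<h1>{escape(title)}</h1>"
        f"<h2>Hot words</h2><ul>{words}</ul>"
        f"<h2>Summaries</h2>{items}"
        "</body></html>"
    )


page = render_page("Daily <news>", ["gas", "pipeline"], ["Prices rose."])
```

Escaping matters here because the displayed text comes from crawled pages and may contain markup.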

Using information analysis methods, text abstract extraction from text data and numerical data reduces the reading time of scientific research personnel; hot word analysis helps them keep track of research hot spots in the industry; hot news analysis shows the industry's development hot spots; influence factor analysis reveals the substance of an article simply and clearly; and price analysis and prediction can take over preliminary data analysis and processing from scientific research personnel. By organically combining data acquisition, information analysis of text and data, and page display technologies, the system and method greatly improve the efficiency of scientific and technological information work in gas enterprises.

Fig. 14 is a schematic structural diagram of an enterprise information data processing system according to an embodiment of the present application, where the system 1400 includes:

an obtaining unit 1401 configured to obtain web page data updated by a target website within a preset time, perform screening processing on the obtained web page data, and store the obtained web page data in a database;

an analysis unit 1402 configured to analyze and process data of the database according to a preset analysis policy to obtain analysis data related to the target website;

a display unit 1403 configured to display the analysis data in a form of a web page.

Based on the above embodiment, as a preferred embodiment, the obtaining unit 1401 includes:

Based on the above embodiment, as a preferred embodiment, the collecting unit is specifically configured to:

Based on the above embodiment, as a preferred embodiment, the system 1400 further includes:

Based on the above embodiment, as a preferred embodiment, the analysis unit 1402 is specifically configured to:

Based on the foregoing embodiment, as a preferred embodiment, the display unit 1403 is specifically configured to:

Referring to fig. 15, fig. 15 is a schematic structural diagram of a terminal 1500 according to an embodiment of the present disclosure, where the terminal system 1500 may be used to execute the enterprise information data processing method according to the embodiment of the present disclosure.

The terminal system 1500 may include: a processor 1501, memory 1502, and a communication unit 1503. The components communicate via one or more buses, and those skilled in the art will appreciate that the architecture of the servers shown in the figures is not intended to be limiting, and may be a bus architecture, a star architecture, a combination of more or less components than those shown, or a different arrangement of components.

The memory 1502 may be used for storing instructions executed by the processor 1501, and the memory 1502 may be implemented by any type of volatile or non-volatile storage device or combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk or optical disk. The executable instructions in the memory 1502, when executed by the processor 1501, enable the terminal 1500 to perform some or all of the steps in the method embodiments described above.

The processor 1501 is the control center of the terminal. It connects the various parts of the entire terminal using various interfaces and lines, and performs the functions of the terminal and/or processes data by running or executing software programs and/or modules stored in the memory 1502 and calling data stored in the memory. The processor may be composed of an Integrated Circuit (IC), for example, a single packaged IC, or a plurality of packaged ICs with the same or different functions connected together. For example, the processor 1501 may include only a Central Processing Unit (CPU). In the embodiment of the present invention, the CPU may be a single operation core, or may include multiple operation cores.

The communication unit 1503 is configured to establish a communication channel so that the terminal can communicate with other terminals, receiving user data sent by other terminals or sending user data to other terminals.

The present application also provides a computer storage medium, wherein the computer storage medium may store a program, and the program may include some or all of the steps in the embodiments provided by the present invention when executed. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM) or a Random Access Memory (RAM).

Scientific and technological information work in gas enterprises is not highly developed, and the traditional approach is ill-suited to information work in today's age of information explosion. By organically combining several informatization techniques, the invention partly improves the efficiency of the scientific and technological information work of gas enterprises and is an innovation in the development of that work. The combination also allows each individual technique to contribute more to optimizing the scientific and technological information work of gas enterprises.

The invention is designed specifically for the scientific and technological information work of the gas enterprise, and has more representative value in the industry.

The embodiments are described in a progressive manner in the specification, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system provided by the embodiment, the description is relatively simple because the system corresponds to the method provided by the embodiment, and the relevant points can be referred to the method part for description.

The principles and embodiments of the present application are explained herein using specific examples, which are provided only to help understand the method and the core idea of the present application. It should be noted that, for those skilled in the art, it is possible to make several improvements and modifications to the present application without departing from the principle of the present application, and such improvements and modifications also fall within the scope of the claims of the present application.

It is further noted that, in the present specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Claims

1. An enterprise information data processing method is characterized by comprising the following steps:

and displaying the analysis data in a webpage form.

2. The method for processing enterprise information data according to claim 1, wherein the steps of obtaining web page data updated by a target website within a preset time, screening the obtained web page data, and storing the screened web page data in a database include:

3. The method of claim 2, wherein collecting updated web page data and corresponding browsing volume of a target website within a predetermined time comprises:

4. The method for processing enterprise information data according to claim 1, wherein the acquiring web page data updated by the target website within a preset time, and performing a screening process on the acquired web page data and storing the web page data in a database, further comprises:

5. The method as claimed in claim 1, wherein the analyzing the data in the database according to a predetermined analysis strategy to obtain the analysis data related to the target website comprises:

6. The method of claim 1, wherein the displaying the analysis data in the form of a web page comprises:

7. An enterprise information data processing system, comprising:

8. The system of claim 7, comprising:

the acquisition unit includes:

9. A terminal, comprising:

a processor;

a memory for storing instructions for execution by the processor;

wherein the processor is configured to perform the method of any one of claims 1-6.

10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-6.