CN117648920A - Method, device, computer equipment and storage medium for processing research report data - Google Patents

Method, device, computer equipment and storage medium for processing research report data

Info

Publication number
CN117648920A
Authority
CN
China
Prior art keywords
report data
target report
target
data
analysis result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311736057.3A
Other languages
Chinese (zh)
Inventor
赵永会
吴佳杰
杨海力
邓力凡
殷莹
魏包桑
方媛
李诗凝
马子元
丁玮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Life Insurance Co ltd
Original Assignee
China Life Insurance Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Life Insurance Co ltd
Priority to CN202311736057.3A
Publication of CN117648920A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/151 Transformation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/546 Message passing systems or structures, e.g. queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/54 Indexing scheme relating to G06F9/54
    • G06F2209/548 Queue

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The present application relates to a method, apparatus, computer device, storage medium and computer program product for processing research report data. The method comprises the following steps: acquiring target report data based on a timed task; analyzing the target report data to obtain an analysis result of the target report data; querying subscription information of the target report data; and outputting the analysis result of the target report data according to the subscription information of the target report data. The method can improve the efficiency and accuracy of processing research report data.

Description

Method, device, computer equipment and storage medium for processing research report data
Technical Field
The present invention relates to the field of computer technology, and in particular, to a method, an apparatus, a computer device, a storage medium, and a computer program product for processing research report data.
Background
Currently, in the field of research report management, key information such as industry analyses and report summaries is produced by manually downloading the original reports sent by research report suppliers and then manually analyzing them.
However, this approach requires logging in to the mailbox every day, downloading the attachments one by one, sorting the mail, and then analyzing each research report to extract the key information.
As a result, the current manual processing method handles research report data inefficiently.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a method, an apparatus, a computer device, a computer-readable storage medium, and a computer program product for processing report data that can improve efficiency.
In a first aspect, the present application provides a method for processing research report data, including:
acquiring target report data based on the timing task;
analyzing the target report data to obtain an analysis result of the target report data;
inquiring subscription information of target research report data;
and outputting the analysis result of the target report data according to the subscription information of the target report data.
In one embodiment, acquiring target report data based on a timed task includes:
acquiring initial mail attachment information based on a timing task;
preprocessing the initial mail attachment information to obtain target mail attachment information;
and analyzing the target mail attachment information to obtain target report data.
In one embodiment, the analyzing the target report data to obtain the analysis result of the target report data includes:
converting the target research report data into a picture to be identified;
analyzing the picture to be identified by an optical character identification technology to obtain an initial text;
and inputting the initial text into an analysis model to obtain an analysis result of the target report data output by the analysis model.
In one embodiment, analyzing the picture to be recognized by an optical character recognition technology to obtain an initial text includes:
determining a corresponding report template according to the supplier identification of the target report data;
determining each position to be recognized in the picture to be recognized based on the report template;
and recognizing the data at each position to be recognized by an optical character recognition technology to obtain an initial text corresponding to each position to be recognized.
In one embodiment, after the analyzing process is performed on the target report data to obtain the analysis result of the target report data, the method further includes:
storing the analysis result of the target report data to a message queue;
outputting the analysis result of the target report data according to the subscription information of the target report data, including:
and reading subscription information of the target report data, and outputting an analysis result of the target report data from the message queue based on the subscription information of the target report data.
In one embodiment, before analyzing the target report data to obtain the analysis result of the target report data, the method further includes:
storing target report data into a list;
the target report data in the list is adjusted in response to an operation instruction, the operation instruction including at least one of adding, deleting, modifying, and querying.
In a second aspect, the present application further provides an apparatus for processing research report data, including:
the acquisition module is used for acquiring target report data based on the timing task;
the analysis module is used for analyzing the target report data to obtain an analysis result of the target report data;
the inquiry module is used for inquiring subscription information of the target research report data;
and the output module is used for outputting the analysis result of the target report data according to the subscription information of the target report data.
In a third aspect, the present application also provides a computer device comprising a memory storing a computer program and a processor implementing the steps of the above method when the processor executes the computer program.
In a fourth aspect, the present application also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the above-described method.
In a fifth aspect, the present application also provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the method described above.
The above method, apparatus, computer device, storage medium and computer program product for processing research report data acquire target report data based on a timed task; analyze the target report data to obtain an analysis result of the target report data; query subscription information of the target report data; and output the analysis result of the target report data according to the subscription information of the target report data. Because the target report data acquired by the timed task is analyzed automatically and the analysis result is pushed, according to the subscription information, to the devices that subscribe to the target report data, the inefficiency of manually processing the target report data is avoided and the efficiency of processing the target report data is improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the related art, the drawings that are required to be used in the embodiments or the related technical descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to the drawings without inventive effort for a person having ordinary skill in the art.
FIG. 1 is a diagram of an application environment for a method of processing data according to one embodiment;
FIG. 2 is a flowchart of a method for processing data according to an embodiment;
FIG. 3 is a flow chart of acquiring target report data based on a timing task in one embodiment;
FIG. 4 is a flow chart illustrating the analysis of target report data according to one embodiment;
FIG. 5 is a flowchart of an embodiment of analyzing a picture to be recognized by an optical character recognition technique to obtain an initial text;
FIG. 6 is a flow chart of the steps performed before the target report data is analyzed to obtain the analysis result of the target report data, according to another embodiment;
FIG. 7 is a flowchart of a method for processing data according to an embodiment;
FIG. 8 is a schematic diagram of an interaction flow of a mail acquisition and subscription distribution module in one embodiment;
FIG. 9 is a schematic diagram of an interaction flow of the research report management module in one embodiment;
FIG. 10 is a schematic diagram of an interaction flow of the research report data parsing module in one embodiment;
FIG. 11 is a block diagram illustrating an embodiment of a data processing apparatus;
FIG. 12 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
The method for processing research report data provided by the embodiments of the present application can be applied to the application environment shown in FIG. 1, in which the user device 102 communicates with the server 104 via a network. A mail server may be integrated on the server 104, or may be located on the cloud or on another network server. The server acquires target report data based on a timed task; analyzes the target report data to obtain an analysis result of the target report data; queries subscription information of the target report data; and outputs the analysis result of the target report data according to the subscription information of the target report data. The user device 102 may be, but is not limited to, a personal computer, notebook computer, smart phone, tablet computer, internet of things device, or portable wearable device, where the internet of things device may be a smart speaker, smart television, smart air conditioner, smart vehicle device, or the like, and the portable wearable device may be a smart watch, smart bracelet, headset, or the like. The server 104 may be implemented as a stand-alone server or as a server cluster composed of multiple servers.
In an exemplary embodiment, as shown in FIG. 2, a method for processing research report data is provided. Taking the application of the method to the server in FIG. 1 as an example, the method includes the following steps S202 to S208. Wherein:
Step S202, acquiring target report data based on the timed task.
The research report data may be data in Portable Document Format (PDF), or may be in XML format or Word format.
Optionally, the server sets up a timed acquisition task in the research report data processing system, for example by using the Quartz task scheduling framework. Through the Quartz task scheduling framework, the acquisition task is triggered at fixed intervals, such as every 24 hours or every week, to acquire target report data in PDF format from a mail system, an application program, web page information, an applet, and the like.
Optionally, before acquiring the target report data based on the timed task, the server configures the Quartz timed task scheduling framework in the research report data processing system in advance. The server determines, through the Quartz timed task scheduling framework, whether to initiate the acquisition task; when an acquisition task instruction is received, that is, when the acquisition task needs to be initiated, the server acquires the target report data from the mail system, the application program, the web page information, the applet, and the like at fixed time intervals.
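The fixed-interval trigger described above can be illustrated with a minimal Python sketch. This is not Quartz (a Java framework) and not the applicant's implementation; the class name `TimedTask` and the method `poll` are hypothetical, and the 24-hour interval is taken from the example in the text.

```python
from datetime import datetime, timedelta


class TimedTask:
    """Hypothetical fixed-interval trigger standing in for a scheduler job."""

    def __init__(self, interval, first_run):
        self.interval = interval
        self.next_run = first_run

    def poll(self, now, job):
        """Run job once if the trigger time has been reached; return whether it ran."""
        if now < self.next_run:
            return False
        job()
        # Advance past `now` so that a late poll does not cause a burst of catch-up runs.
        while self.next_run <= now:
            self.next_run += self.interval
        return True


# Usage: a daily acquisition task whose job only records that it fired.
fetched = []
task = TimedTask(timedelta(hours=24), datetime(2024, 1, 1, 0, 0))
ran_early = task.poll(datetime(2023, 12, 31, 23, 0), lambda: fetched.append("fetch"))
ran_on_time = task.poll(datetime(2024, 1, 1, 0, 0), lambda: fetched.append("fetch"))
```

A real deployment would let the scheduling framework call the job; polling is used here only to keep the sketch deterministic.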
Optionally, the server may store the acquired target report data in a storage database through the research report data processing system and generate list information. In response to instructions, the server may perform modification, deletion, addition, query and other operations on the target report data in the list in order to maintain and manage it.
Further, the server can upload the target report data to the database through an interface of the research report data processing system and update the list information, thereby supporting the addition, deletion, modification and querying of the target report data, and can process other related requests in the system, such as format conversion requests.
Step S204, analyzing the target report data to obtain an analysis result of the target report data.
The analysis result may be all of the information parsed from the target report data, or only part of it, that is, the key information of the target report data.
Optionally, the server analyzes the target report data through the research report data processing system to obtain all of the information of the target report data, or only its key information, such as supplier information, analyst information and report summary information. The analysis process may include format conversion, character recognition, text combination, and other processing.
Step S206, inquiring subscription information of the target report data.
Optionally, before the subscription information of the target report data is queried, the user equipment subscribes to the target report data through the research report data processing system, and the research report data processing system stores the subscription information of the user equipment in a subscription storage database. The server queries the subscription information of the target report data through the research report data processing system. The subscription information may include information about the user equipment subscribing to the target report data, the subscription time, the subscription period, and the like, so that the server can deliver the target report data to all user equipment subscribing to it.
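As a rough illustration of the subscription query, the sketch below models a subscription record with the fields the text mentions (user equipment, subscription time, subscription period). The `category` field, the class name `Subscription` and the function `query_subscribers` are assumptions made for this example and are not defined by the application.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Subscription:
    device_id: str        # user equipment information
    category: str         # assumed key linking a subscription to report data
    subscribed_on: date   # subscription time
    period_days: int      # subscription period


def query_subscribers(subscriptions, category, today):
    """Return the device ids whose subscription to `category` is active on `today`."""
    active = []
    for sub in subscriptions:
        days_elapsed = (today - sub.subscribed_on).days
        if sub.category == category and 0 <= days_elapsed < sub.period_days:
            active.append(sub.device_id)
    return active


# Usage with three hypothetical subscriptions.
subs = [
    Subscription("dev-1", "insurance", date(2024, 1, 1), 30),
    Subscription("dev-2", "insurance", date(2023, 1, 1), 30),  # expired
    Subscription("dev-3", "banking", date(2024, 1, 1), 30),    # other category
]
active_devices = query_subscribers(subs, "insurance", date(2024, 1, 15))
```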
Step S208, according to the subscription information of the target report data, the analysis result of the target report data is output.
Optionally, the server outputs the analysis result of the target report data to the user equipment subscribing to the target report data according to the subscription information of the target report data, so that the analysis result of the target report data is pushed to the user equipment.
Optionally, the server schedules the push task through the Quartz task scheduling framework according to the timed push task and the subscription information of the target report data. When the Quartz timed task scheduling framework determines that a subscription push task needs to be initiated, the analysis result is pushed to the subscribers indicated by the subscription information of the target report data at a fixed time, for example at 8:00 a.m. every day.
Optionally, before outputting the analysis result of the target report data according to the subscription information of the target report data, the server may store the analysis result in a designated database and generate a list. According to the instructions it receives, the server may perform maintenance operations such as adding, deleting, modifying and querying on the analysis result of the target report data, and update the list information accordingly.
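The daily push time in the example (8:00 a.m.) can be computed as follows. This is a minimal sketch under the assumption that the push fires once per day at a wall-clock time; the function name `next_daily_run` is hypothetical.

```python
from datetime import datetime, time, timedelta


def next_daily_run(now, at=time(8, 0)):
    """Return the next occurrence of the wall-clock time `at` strictly after `now`."""
    candidate = datetime.combine(now.date(), at)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


# Usage: one query before and one after the daily push time.
before_push = next_daily_run(datetime(2024, 1, 1, 7, 59))
after_push = next_daily_run(datetime(2024, 1, 1, 9, 30))
```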
In the embodiment of the present application, the time of the timing task, the content of the subscription information, and the analysis result of the target report data are not limited.
In the above method for processing research report data, target report data is acquired based on a timed task; the target report data is analyzed to obtain an analysis result of the target report data; subscription information of the target report data is queried; and the analysis result of the target report data is output according to the subscription information of the target report data. Because the target report data acquired by the timed task is analyzed automatically and the analysis result is pushed, according to the subscription information, to the devices that subscribe to the target report data, the inefficiency of manually processing the target report data is avoided and the efficiency of processing the target report data is improved.
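Steps S202 to S208 can be summarized in a short Python sketch. The four callables are placeholders supplied by the caller; their names and signatures are assumptions for illustration and do not come from the application.

```python
def process_reports(fetch_reports, analyze, query_subscribers, output):
    """Run steps S202 to S208 once and return the number of deliveries made."""
    deliveries = 0
    for report in fetch_reports():                 # S202: acquire target report data
        result = analyze(report)                   # S204: obtain the analysis result
        for device in query_subscribers(report):   # S206: query subscription information
            output(device, result)                 # S208: output to each subscriber
            deliveries += 1
    return deliveries


# Usage with stub callables in place of the mail, OCR and push components.
sent = []
count = process_reports(
    fetch_reports=lambda: ["report-1", "report-2"],
    analyze=lambda report: report.upper(),
    query_subscribers=lambda report: ["dev-a"] if report == "report-1" else [],
    output=lambda device, result: sent.append((device, result)),
)
```

Passing the steps in as functions keeps the flow independent of any particular mail system, OCR engine or push channel.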
In an exemplary embodiment, as shown in FIG. 3, acquiring the target report data based on the timed task includes steps S302 to S306. Wherein:
Step S302, acquiring initial mail attachment information based on the timed task.
Optionally, the server obtains the initial mail attachment information from the mail system at fixed intervals by using the Quartz timed task scheduling framework together with the mailbox account and its authorization code. The initial mail attachment information may be obtained by a single thread or by multiple threads.
Further, the server may set conditions for the timed acquisition task, for example acquiring only the mail sent by a specified supplier, where one or more sender mailboxes may be configured; conversely, by an inverse setting, the mail attachments of the suppliers other than the specified ones can be acquired.
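The sender condition and its inverse setting can be sketched with Python's standard `email` package. The sketch assumes the raw messages have already been downloaded from the mailbox; the function name `extract_pdf_attachments`, the `exclude` flag and the sample addresses are hypothetical.

```python
from email import message_from_bytes
from email.message import EmailMessage
from email.utils import parseaddr


def extract_pdf_attachments(raw_messages, senders, exclude=False):
    """Collect (sender, filename, content) for PDF attachments.

    With exclude=False only mail from `senders` is kept; with exclude=True
    (the inverse setting) only mail from all other senders is kept.
    """
    results = []
    for raw in raw_messages:
        msg = message_from_bytes(raw)
        sender = parseaddr(msg.get("From", ""))[1].lower()
        if (sender in senders) == exclude:
            continue
        for part in msg.walk():
            filename = part.get_filename()
            if filename and filename.lower().endswith(".pdf"):
                results.append((sender, filename, part.get_payload(decode=True)))
    return results


# Usage with an in-memory sample message (no network access needed).
sample = EmailMessage()
sample["From"] = "Broker A <reports@broker-a.example>"
sample["To"] = "inbox@insurer.example"
sample["Subject"] = "Daily research report"
sample.set_content("Please see the attached report.")
sample.add_attachment(b"%PDF-1.4 sample", maintype="application",
                      subtype="pdf", filename="report.pdf")

kept = extract_pdf_attachments([sample.as_bytes()], {"reports@broker-a.example"})
inverse = extract_pdf_attachments([sample.as_bytes()], {"reports@broker-a.example"},
                                  exclude=True)
```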
Step S304, preprocessing the initial mail attachment information to obtain target mail attachment information.
Optionally, the server preprocesses the acquired initial mail attachment information through the research report data processing system. The preprocessing includes removing HTML tags and illegal symbols from the mail, selecting the mail that contains Chinese text, performing sentence segmentation, word segmentation, part-of-speech tagging and dependency parsing on the mail content, and storing the extracted mail results in a database.
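Part of this preprocessing can be sketched with the Python standard library. The sketch covers tag removal, symbol filtering, the Chinese-text check and sentence segmentation only; word segmentation, part-of-speech tagging and dependency parsing need an NLP toolkit and are left out. The whitelist of permitted characters is an assumption, as are all function names, and the sketch assumes that mail containing Chinese text is the mail to keep.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")
# Assumed whitelist: word characters (including CJK), whitespace, common punctuation.
ILLEGAL_RE = re.compile(r"[^\w\s，。；：！？、（）.,;:!?%()-]")
CJK_RE = re.compile(r"[\u4e00-\u9fff]")
SENTENCE_RE = re.compile(r"[^。！？!?]+[。！？!?]?")


def clean_mail_body(body):
    """Strip HTML tags, decode entities, drop non-whitelisted symbols, squeeze spaces."""
    text = html.unescape(TAG_RE.sub(" ", body))
    text = ILLEGAL_RE.sub("", text)
    return re.sub(r"\s+", " ", text).strip()


def contains_chinese(text):
    """Return True if the text has at least one CJK unified ideograph."""
    return CJK_RE.search(text) is not None


def split_sentences(text):
    """Split on Chinese and ASCII sentence-ending punctuation, keeping the mark."""
    return [s.strip() for s in SENTENCE_RE.findall(text) if s.strip()]


# Usage on a small HTML fragment.
cleaned = clean_mail_body("<p>今日市场上涨。&nbsp;★风险提示：波动！</p>")
sentences = split_sentences(cleaned)
```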
Step S306, analyzing the target mail attachment information to obtain target report data.
Optionally, the server parses the preprocessed target mail attachment information using multiple threads. For example, mail attachment information that is represented in Hypertext Markup Language (HTML) in the mail system is parsed, via the Hypertext Transfer Protocol (HTTP), into target report data in PDF format.
In this embodiment, the target mail attachment information is obtained by preprocessing the initial mail attachment information and is then parsed to obtain the target report data. This reduces the amount of research report data to be processed and screens out illegal research report data in advance, thereby improving the processing efficiency of the research report data. In addition, parsing the target mail attachment information with multiple threads further improves the processing efficiency.
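The multi-threaded parsing of step S306 might look like the sketch below, which uses a thread pool from the Python standard library. The function name `parse_attachments`, the worker count and the stand-in parser are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_attachments(attachments, parse_one, max_workers=4):
    """Parse attachments concurrently and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves input order even though the work runs concurrently.
        return list(pool.map(parse_one, attachments))


# Usage with a trivial stand-in for the real attachment parser.
parsed = parse_attachments(["a.pdf", "b.pdf", "c.pdf"], str.upper)
```

Threads suit this step when parsing is dominated by I/O such as downloading attachments; CPU-bound parsing in Python would benefit more from processes.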
In an exemplary embodiment, as shown in FIG. 4, analyzing the target report data to obtain the analysis result of the target report data includes steps S402 to S406. Wherein:
Step S402, converting the target report data into a picture to be recognized.
Optionally, the server converts the target report data, for example a document in PDF format, into a picture format, so that the target report data document becomes a picture to be recognized.
Step S404, analyzing the picture to be recognized by an optical character recognition technology to obtain an initial text.
Optical character recognition (Optical Character Recognition, abbreviated OCR) refers to extracting text information in an image, and generally includes text detection and text recognition.
Optionally, the server detects and recognizes the text in the picture to be recognized by the OCR technology and extracts the recognized text information, thereby obtaining the initial text of the target report data. The initial text may include all character strings of the research report, or may include specified character strings in the target report data. A specified character string may be a character string at a specified position. For example, with the lower left corner of the page as the origin of coordinates, the bottom edge of the page as the horizontal axis, the left edge of the page as the vertical axis, and distances along the axes measured in millimeters, the area determined by the upper left coordinates (X1, Y1), the upper right coordinates (X2, Y2), the lower right coordinates (X3, Y3) and the lower left coordinates (X4, Y4) is taken as the specified area. A specified character string may also be a string for a specified item, such as the analyst, report title, industry review, industry performance, risk warning or view summary.
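Because the specified area is given in millimeters with the origin at the lower left corner of the page, while raster images are usually indexed in pixels from the upper left corner, a coordinate conversion is needed before cropping. The following sketch shows one way to do it; the function name, the rendering resolution (`dpi`) and the A4 page height in the usage example are assumptions.

```python
MM_PER_INCH = 25.4


def mm_region_to_pixel_box(corners_mm, page_height_mm, dpi):
    """Convert four (x, y) corners in mm (origin lower left, y up) into a
    (left, top, right, bottom) pixel box (origin upper left, y down)."""
    scale = dpi / MM_PER_INCH  # pixels per millimeter
    xs = [x for x, _ in corners_mm]
    ys = [y for _, y in corners_mm]
    left = round(min(xs) * scale)
    right = round(max(xs) * scale)
    # Flip the vertical axis: a larger y in mm means closer to the top of the image.
    top = round((page_height_mm - max(ys)) * scale)
    bottom = round((page_height_mm - min(ys)) * scale)
    return left, top, right, bottom


# Usage: an A4 page (297 mm high) rendered at 254 dpi, i.e. 10 pixels per mm.
corners = [(10, 287), (110, 287), (110, 267), (10, 267)]  # UL, UR, LR, LL
box = mm_region_to_pixel_box(corners, page_height_mm=297, dpi=254)
```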
Step S406, inputting the initial text into the analysis model to obtain the analysis result of the target report data output by the analysis model.
Optionally, the server inputs the initial text into the analysis model. The analysis model is a model trained in advance using machine learning, and may be a neural network model, a support vector machine model, or the like. The analysis model performs semantic analysis on the input initial text and outputs the analysis result accordingly; the analysis result may include the analyst, report title, industry review, industry performance, risk warning, view summary and other content of the target report data.
Further, the server can periodically export multiple records to a table file, where the analysis result of one piece of target report data is one record, and the server can display the records of the table file.
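Exporting one record per analysis result could be sketched as below, using CSV as the table file format. The column names follow the items listed in the text, but their exact spelling, the CSV format and the function name `export_records` are assumptions.

```python
import csv
import io

FIELDS = ["analyst", "report_title", "industry_review",
          "industry_performance", "risk_warning", "view_summary"]


def export_records(records, stream):
    """Write one row per analysis result; missing fields become empty cells."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for record in records:
        writer.writerow({field: record.get(field, "") for field in FIELDS})


# Usage: write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
export_records([{"analyst": "Zhang San", "report_title": "Industry Weekly"}], buffer)
lines = buffer.getvalue().splitlines()
```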
In the embodiment, through conversion, identification and model analysis of the target report data, subjectivity and personal prejudice existing in manual analysis of the target report data can be avoided, and objectivity and accuracy of analysis results of the target report data are improved.
In an exemplary embodiment, as shown in fig. 5, the picture to be recognized is parsed by the optical character recognition technology to obtain the initial text, which includes steps S502 to S506. Wherein:
Step S502, determining a corresponding report template according to the supplier identification of the target report data.
Optionally, each provider of the target report data has a respective report template, where the report template includes a location of each content, for example, a location of an "analyst" text, a location of a "report title" text, a location of a "view summary" text, and so on. The server stores the supplier identification of each target report data and the corresponding report template into the database, and the server determines the corresponding report template from the database according to the supplier identification of the current target report data.
Step S504, determining each position to be identified of the picture to be identified based on the report template.
Optionally, the server determines each position to be recognized in the picture to be recognized based on the position information in the report template, such as the position of the "analyst" text, the position of the "report title" text, and the position of the "view summary" text.
Step S506, the data of each position to be identified is identified through an optical character identification technology, and an initial text corresponding to each position to be identified is obtained.
Optionally, the server recognizes the data of each position to be recognized through an optical character recognition technology OCR to obtain an initial text corresponding to each position to be recognized.
In this embodiment, report templates are set up for the different supplier identifiers and are used to determine each position to be recognized in the picture to be recognized. The positions to be recognized can therefore be located quickly, recognition work on unnecessary areas is reduced, and recognition efficiency is improved. Because the positions are taken from the report template, recognition accuracy is also improved.
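Steps S502 to S506 amount to a lookup from supplier identifier to a set of named regions, followed by recognition of each region. A minimal sketch follows; the template contents, the supplier identifier and the `ocr_region` callable are hypothetical, and the OCR engine itself is stubbed out.

```python
# Hypothetical templates: supplier id -> field name -> (left, top, right, bottom) in pixels.
TEMPLATES = {
    "supplier_a": {
        "analyst": (100, 100, 600, 160),
        "report_title": (100, 180, 1100, 260),
    },
}


def recognize_by_template(supplier_id, page_image, ocr_region, templates=TEMPLATES):
    """Look up the supplier's template (S502), take its regions (S504) and run
    OCR on each region only (S506), returning field name -> initial text."""
    template = templates.get(supplier_id)
    if template is None:
        raise KeyError(f"no report template registered for supplier {supplier_id!r}")
    return {field: ocr_region(page_image, box) for field, box in template.items()}


# Usage with a fake OCR function that just reports which box it was given.
fake_ocr = lambda image, box: f"text@{box[0]},{box[1]}"
texts = recognize_by_template("supplier_a", page_image=None, ocr_region=fake_ocr)
```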
In an exemplary embodiment, after performing the parsing process on the target report data to obtain the parsing result of the target report data, the method further includes: storing the analysis result of the target report data to a message queue; outputting the analysis result of the target report data according to the subscription information of the target report data, including: and reading subscription information of the target report data, and outputting an analysis result of the target report data from the message queue based on the subscription information of the target report data.
Optionally, after analyzing the target report data to obtain the analysis result of the target report data, the server stores the analysis result in a message queue, generates an analysis result list from the analysis results stored in the message queue, and adds, deletes, modifies and queries the analysis results in the list according to instructions. The server reads the subscription information of the target report data and determines, through the Quartz timed task scheduling framework, whether to initiate a subscription push task. When the subscription push task needs to be initiated, the analysis result is pushed at a fixed time, such as 8:00 a.m. every day: according to the subscription information of the target report data, the server outputs the analysis result of the target report data from the message queue to the user equipment subscribing to the target report data.
In this embodiment, by storing the analysis result of the target report data in the message queue and outputting it from the message queue according to the subscription information of the target report data, the pushing of the target report data can be spread over a period of time, which avoids a system crash. In addition, the target report data is pushed only to the user equipment that subscribes to the corresponding subject, which reduces server resource overhead.
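The decoupling of analysis and push through a message queue can be illustrated with an in-process queue. The application does not name a message queue product, so `queue.Queue` is only a stand-in; the function names and the category-to-devices mapping are assumptions.

```python
import queue


def publish(result_queue, category, result):
    """Producer side: store an analysis result in the message queue."""
    result_queue.put((category, result))


def dispatch(result_queue, subscriptions, send):
    """Consumer side: drain the queue and send each result only to the devices
    subscribed to its category. Returns the number of messages sent."""
    sent = 0
    while True:
        try:
            category, result = result_queue.get_nowait()
        except queue.Empty:
            return sent
        for device_id in subscriptions.get(category, ()):
            send(device_id, result)
            sent += 1


# Usage: two results are queued, but only one category has subscribers.
results = queue.Queue()
publish(results, "insurance", "summary-1")
publish(results, "banking", "summary-2")
outbox = []
sent_count = dispatch(results, {"insurance": ["dev-1", "dev-2"]},
                      lambda device, result: outbox.append((device, result)))
```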
In an exemplary embodiment, as shown in fig. 6, before the analysis processing is performed on the target report data to obtain the analysis result of the target report data, steps S602 to S604 are included. Wherein:
Step S602, storing the target report data into the list.
Optionally, the server stores the target report data in a database, and generates a list according to the target report data stored in the database. The server may also present the list information.
In step S604, the target report data in the list is adjusted in response to an operation instruction, where the operation instruction includes at least one of adding, deleting, modifying, and querying.
Optionally, the server receives operation instructions such as adding, deleting, modifying and querying, and performs the corresponding addition, deletion, modification or query on the target report data in the list, so as to maintain and manage the target report data.
Further, the server can upload the target report data to the database through the interface and update the list information, thereby supporting the addition, deletion, modification and querying of the target report data, and can also process related requests sent by other execution bodies, such as format conversion requests.
In this embodiment, the target report data in the list is adjusted by the operation instruction, so that normalized management of the target report data can be realized, and the historical document data can be efficiently managed.
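Steps S602 and S604 describe a list that is adjusted by add, delete, modify and query instructions. The sketch below keeps the list in memory for brevity, whereas the text stores it in a database; the class name `ReportList` and the instruction format are hypothetical.

```python
class ReportList:
    """In-memory stand-in for the list of target report data."""

    def __init__(self):
        self._items = {}

    def apply(self, op, report_id, data=None):
        """Apply one operation instruction and return the query result, if any."""
        if op == "add":
            self._items[report_id] = data
        elif op == "delete":
            self._items.pop(report_id, None)
        elif op == "modify":
            if report_id not in self._items:
                raise KeyError(report_id)
            self._items[report_id] = data
        elif op == "query":
            return self._items.get(report_id)
        else:
            raise ValueError(f"unsupported operation: {op!r}")
        return None


# Usage: add, modify, query and delete one report entry.
reports = ReportList()
reports.apply("add", "r1", {"title": "Draft"})
reports.apply("modify", "r1", {"title": "Final"})
found = reports.apply("query", "r1")
reports.apply("delete", "r1")
missing = reports.apply("query", "r1")
```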
In one exemplary embodiment, the server specifically includes a research report server and an analysis source server. The research report server may further include a mail scheduling and distribution server and a research report management center; the analysis source server may further include a research report analysis platform, an OCR server, and an analysis model server.
As shown in fig. 7, this embodiment involves interactions among the user equipment, the mailbox server, and the server, where the server includes the research report server and the analysis source server. Specifically, this embodiment involves three modules: a mail acquisition and subscription distribution module, a research report management module, and a research report data analysis module.
The mail scheduling and distribution server periodically fetches and sorts mails from the mailbox server and processes the mails and the attached research reports; the research report server sends a research report data processing request to the analysis source server; and the analysis source server returns the analysis result of the research report data. The user equipment queries the research report data list and the subscription information; the research report server processes the analysis results and the subscription information and pushes the analysis results of the different categories of research report data to the mailbox server; and the mailbox server sends those analysis results to the user equipment subscribed to the corresponding research report data.
As shown in fig. 8, the mail acquisition and subscription distribution module involves the user equipment, the mailbox server, and the server, where the server includes the mail scheduling and distribution server and the research report management center.
The mail scheduling and distribution server uses the Quartz task scheduling framework to determine whether to initiate an acquisition task and, when an acquisition task needs to be initiated, acquires initial mail attachment information from the mailbox server. The mailbox server returns the mail body and the initial mail attachment information, and the mail scheduling and distribution server preprocesses the initial mail attachment information. The preprocessing includes removing web page tags and illegal symbols from the mail, screening out mails that contain Chinese text, performing sentence segmentation, word segmentation, part-of-speech tagging, and dependency parsing on the mail content, and storing the extracted result in a database to obtain valid target mail attachment information. The mail scheduling and distribution server parses the target mail attachment information to obtain the research report data, and sends the research report data in PDF format to the research report management center. The user equipment sends industry subscription information to the mail scheduling and distribution server, which uses the Quartz task scheduling framework to determine at regular intervals whether to initiate a subscription push task. When a subscription push task needs to be initiated, the mail scheduling and distribution server sends the subscription information to the research report management center; the research report management center returns the analysis result of the research report data to the mail scheduling and distribution server; the mail scheduling and distribution server sends the analysis result to the mailbox server; and the mailbox server sends the analysis result to the subscribed user equipment.
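The first preprocessing steps named above (removing web page tags and illegal symbols, checking for Chinese text, and splitting into sentences) can be sketched with the Python standard library. This is a simplified illustration, not the method of this application: the function name `preprocess_mail` is hypothetical, "illegal symbols" is assumed here to mean control characters, mails without Chinese text are assumed to be discarded, and word segmentation, part-of-speech tagging, and dependency parsing are left out because they require an NLP toolkit.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")                            # web page tags
CONTROL_RE = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")  # assumed "illegal symbols"
CJK_RE = re.compile(r"[\u4e00-\u9fff]")                    # common Chinese characters
SENTENCE_RE = re.compile(r"[^。！？!?]+[。！？!?]?")          # text up to a sentence end


def preprocess_mail(body):
    """Return the mail's sentences, or None if it contains no Chinese text."""
    text = html.unescape(TAG_RE.sub("", body))
    text = CONTROL_RE.sub("", text).strip()
    if not CJK_RE.search(text):
        return None  # assumption: only mails with Chinese text are kept
    return [s.strip() for s in SENTENCE_RE.findall(text) if s.strip()]
```

For example, `preprocess_mail("<p>研报已发布。请查收！</p>")` yields the two sentences `"研报已发布。"` and `"请查收！"`, while a mail containing only English text yields `None`.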
As shown in fig. 9, the research report management module involves the user equipment and the server, where the server includes the mail scheduling and distribution server, the research report management center, and the research report analysis platform.
The mail scheduling and distribution server uploads the research report data in PDF format to the research report management center; the research report management center maintains research report data tasks and sends the research report data in PDF format to the research report analysis platform; and the research report analysis platform returns the analysis result to the research report management center. The research report management center processes the research report data information list, and the user equipment can perform routine maintenance of the research report data in the research report management center, such as deletion and modification, by sending instructions. The research report management center pushes the analysis result to the mail scheduling and distribution server.
As shown in fig. 10, the research report data analysis module involves the server, where the server includes the research report management center, the research report analysis platform, the OCR server, and the analysis model server.
The research report management center sends the research report data in PDF format to the research report analysis platform. The research report analysis platform receives the research report data in PDF format, converts it into pictures, and sends the pictures to the OCR server for OCR processing; the OCR server returns OCR recognition information. The research report analysis platform inputs the returned OCR recognition information to the analysis model server, the analysis model server returns text summary information to the research report analysis platform, and the research report analysis platform notifies the research report management center of the analysis result.
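The flow through the analysis platform (PDF to pictures, pictures to OCR text, OCR text to summary) can be sketched as a function that composes three injected services. All names here (`parse_report`, `to_images`, `ocr`, `summarize`) are hypothetical stand-ins for the platform's PDF conversion, the OCR server, and the analysis model server; the source does not specify their interfaces.

```python
from typing import Any, Callable, List


def parse_report(
    pdf_bytes: bytes,
    to_images: Callable[[bytes], List[Any]],  # stand-in: PDF -> page pictures
    ocr: Callable[[Any], str],                # stand-in: OCR server call
    summarize: Callable[[str], str],          # stand-in: analysis model server call
) -> str:
    """Run the three stages in order and return the analysis result."""
    images = to_images(pdf_bytes)
    page_texts = [ocr(image) for image in images]
    initial_text = "\n".join(page_texts)      # OCR recognition information
    return summarize(initial_text)
```

Passing the three stages in as callables keeps the sketch independent of any particular PDF, OCR, or model library, and lets each stage be replaced by a stub when testing.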
It should be understood that, although the steps in the flowcharts of the embodiments described above are shown in sequence as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the execution order of the steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; these sub-steps or stages are not necessarily performed sequentially either, but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Based on the same inventive concept, an embodiment of the present application further provides a research report data processing device for implementing the research report data processing method described above. The solution provided by the device is implemented in a manner similar to that described for the method, so for the specific limitations in the one or more device embodiments provided below, reference may be made to the limitations of the research report data processing method above, which are not repeated here.
In an exemplary embodiment, as shown in fig. 11, a research report data processing device is provided, including an acquisition module 1101, an analysis module 1102, a query module 1103, and an output module 1104, wherein:
The acquisition module 1101 is configured to acquire target report data based on a timing task.
The analysis module 1102 is configured to analyze the target report data to obtain an analysis result of the target report data.
The query module 1103 is configured to query subscription information of the target report data.
The output module 1104 is configured to output the analysis result of the target report data according to the subscription information of the target report data.
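How the four modules cooperate can be sketched as a class that wires them together. This is an illustrative sketch only: the class name `ReportProcessingDevice`, the method `run_once`, and the callable signatures are assumptions, and each callable stands in for one of modules 1101 to 1104.

```python
class ReportProcessingDevice:
    """Hypothetical sketch wiring together stand-ins for modules 1101 to 1104."""

    def __init__(self, acquire, analyze, query_subscriptions, output):
        self.acquire = acquire                          # acquisition module 1101
        self.analyze = analyze                          # analysis module 1102
        self.query_subscriptions = query_subscriptions  # query module 1103
        self.output = output                            # output module 1104

    def run_once(self):
        """One pass of the method, as a timing task might trigger it."""
        report = self.acquire()
        result = self.analyze(report)
        subscriptions = self.query_subscriptions(report)
        return self.output(result, subscriptions)
```

A scheduler would call `run_once` at each trigger; the scheduling itself is outside this sketch.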
In one exemplary embodiment, the acquisition module 1101 includes:
an acquisition unit, configured to acquire initial mail attachment information based on the timing task;
a preprocessing unit, configured to preprocess the initial mail attachment information to obtain target mail attachment information; and
a parsing unit, configured to parse the target mail attachment information to obtain the target report data.
In one exemplary embodiment, the analysis module 1102 includes:
a conversion unit, configured to convert the target report data into a picture to be recognized;
an initial text determining unit, configured to analyze the picture to be recognized through optical character recognition technology to obtain an initial text; and
a result acquisition unit, configured to input the initial text into an analysis model and obtain the analysis result of the target report data output by the analysis model.
In an exemplary embodiment, the initial text determining unit further includes:
a report template determining subunit, configured to determine a corresponding report template according to the supplier identification of the target report data;
a position determining subunit, configured to determine each position to be recognized in the picture to be recognized based on the report template; and
an initial text determining subunit, configured to recognize the data at each position to be recognized through optical character recognition technology to obtain an initial text corresponding to each position to be recognized.
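Template-driven recognition can be sketched as a lookup from supplier identification to a list of named regions, followed by OCR on each region. Everything concrete below is hypothetical: the `Region` type, the `TEMPLATES` table and its coordinates, the supplier id `"supplier-a"`, and the `ocr_region` callable that stands in for the OCR service.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


@dataclass(frozen=True)
class Region:
    name: str
    box: Box


# Hypothetical templates: one list of positions to be recognized per supplier.
TEMPLATES: Dict[str, List[Region]] = {
    "supplier-a": [
        Region("title", (0, 0, 100, 20)),
        Region("summary", (0, 20, 100, 80)),
    ],
}


def recognize_by_template(
    supplier_id: str,
    image: Any,
    ocr_region: Callable[[Any, Box], str],  # stand-in for the OCR call
) -> Dict[str, str]:
    """Return the initial text for each position defined by the supplier's template."""
    regions = TEMPLATES.get(supplier_id)
    if regions is None:
        raise KeyError(f"no report template for supplier {supplier_id!r}")
    return {region.name: ocr_region(image, region.box) for region in regions}
```

Restricting OCR to the regions a template names avoids recognizing page furniture such as headers and watermarks, at the cost of maintaining one template per supplier layout.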
In an exemplary embodiment, the research report data processing device further includes:
a first storage module, configured to store the analysis result of the target report data in a message queue.
The output module 1104 includes:
a reading and output unit, configured to read the subscription information of the target report data and output the analysis result of the target report data from the message queue based on the subscription information of the target report data.
In an exemplary embodiment, the research report data processing device further includes:
a second storage module, configured to store the target report data in a list; and
an adjusting module, configured to adjust the target report data in the list in response to an operation instruction, where the operation instruction includes at least one of adding, deleting, modifying, and querying.
The modules in the research report data processing device described above may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call them and perform the operations corresponding to each module.
In one exemplary embodiment, a computer device is provided, which may be a server, and the internal structure thereof may be as shown in fig. 12. The computer device includes a processor, a memory, an input/output (I/O) interface, and a communication interface. The processor, the memory, and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used for storing the research report data. The input/output interface of the computer device is used to exchange information between the processor and external devices. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a research report data processing method.
It will be appreciated by those skilled in the art that the structure shown in fig. 12 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In an embodiment, there is also provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the method embodiments described above.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
Those skilled in the art will appreciate that all or part of the processes of the above methods may be implemented by a computer program instructing the relevant hardware; the computer program may be stored on a non-transitory computer-readable storage medium and, when executed, may include the steps of the method embodiments described above. Any reference to memory, database, or other medium used in the various embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM is available in a variety of forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the various embodiments provided herein may include at least one of relational databases and non-relational databases. Non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processors referred to in the embodiments provided herein may be, without limitation, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, data processing logic units based on quantum computing, and the like.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered to be within the scope of this description.
The above embodiments represent only a few implementations of the present application. They are described specifically and in detail, but are not to be construed as limiting the scope of the present application. It should be noted that those skilled in the art could make various modifications and improvements without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims (10)

1. A research report data processing method, the method comprising:
acquiring target report data based on the timing task;
analyzing the target report data to obtain an analysis result of the target report data;
inquiring subscription information of the target report data;
and outputting the analysis result of the target report data according to the subscription information of the target report data.
2. The method of claim 1, wherein the acquiring target report data based on the timing task comprises:
acquiring initial mail attachment information based on the timing task;
preprocessing the initial mail attachment information to obtain target mail attachment information;
and analyzing the target mail attachment information to obtain the target report data.
3. The method of claim 1, wherein the parsing the target report data to obtain a parsing result of the target report data comprises:
converting the target report data into a picture to be recognized;
analyzing the picture to be recognized by an optical character recognition technology to obtain an initial text;
and inputting the initial text into an analysis model to obtain an analysis result of the target report data output by the analysis model.
4. The method of claim 3, wherein the parsing the picture to be recognized by an optical character recognition technology to obtain an initial text comprises:
determining a corresponding report template according to the supplier identification of the target report data;
determining each position to be identified of the picture to be identified based on the report template;
and recognizing the data of each position to be recognized by the optical character recognition technology to obtain an initial text corresponding to each position to be recognized.
5. The method according to claim 1, wherein after the parsing of the target report data, the method further comprises:
storing the analysis result of the target report data to a message queue;
the outputting the analysis result of the target report data according to the subscription information of the target report data comprises the following steps:
reading the subscription information of the target report data, and outputting the analysis result of the target report data from the message queue based on the subscription information of the target report data.
6. The method according to any one of claims 1 to 5, wherein before the analyzing the target report data to obtain the analysis result of the target report data, the method further comprises:
storing the target report data into a list;
adjusting the target report data in the list in response to an operation instruction, the operation instruction including at least one of adding, deleting, modifying, and querying.
7. A research report data processing device, the device comprising:
the acquisition module is used for acquiring target report data based on the timing task;
the analysis module is used for carrying out analysis processing on the target report data to obtain an analysis result of the target report data;
the query module is used for querying subscription information of the target report data;
and the output module is used for outputting the analysis result of the target report data according to the subscription information of the target report data.
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 6 when the computer program is executed.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
CN202311736057.3A 2023-12-18 2023-12-18 Method, device, computer equipment and storage medium for processing research report data Pending CN117648920A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311736057.3A CN117648920A (en) 2023-12-18 2023-12-18 Method, device, computer equipment and storage medium for processing research report data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311736057.3A CN117648920A (en) 2023-12-18 2023-12-18 Method, device, computer equipment and storage medium for processing research report data

Publications (1)

Publication Number Publication Date
CN117648920A true CN117648920A (en) 2024-03-05

Family

ID=90049441

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311736057.3A Pending CN117648920A (en) 2023-12-18 2023-12-18 Method, device, computer equipment and storage medium for processing research report data

Country Status (1)

Country Link
CN (1) CN117648920A (en)

Similar Documents

Publication Publication Date Title
CN108932294B (en) Resume data processing method, device, equipment and storage medium based on index
CN110795919B (en) Form extraction method, device, equipment and medium in PDF document
US20100318492A1 (en) Data analysis system and method
WO2019196226A1 (en) System information querying method and apparatus, computer device, and storage medium
WO2021248492A1 (en) Semantic representation of text in document
US20210406981A1 (en) Method and apparatus of determining display page, electronic device, and medium
US20150278248A1 (en) Personal Information Management Service System
CN102880683A (en) Automatic network generation system for feasibility study report and generation method thereof
CN111191111A (en) Content recommendation method, device and storage medium
CN111078980A (en) Management method, device, equipment and storage medium based on credit investigation big data
CN113962597A (en) Data analysis method and device, electronic equipment and storage medium
CN117095419A (en) PDF document data processing and information extracting device and method
CN111026972A (en) Subscription data pushing method, device, equipment and storage medium in Internet of things
CN117648920A (en) Method, device, computer equipment and storage medium for processing research report data
CN115730603A (en) Information extraction method, device, equipment and storage medium based on artificial intelligence
CN116166858A (en) Information recommendation method, device, equipment and storage medium based on artificial intelligence
CN115204393A (en) Smart city knowledge ontology base construction method and device based on knowledge graph
CN114925125A (en) Data processing method, device and system, electronic equipment and storage medium
CN113821555A (en) Unstructured data collection processing method of intelligent supervision black box
CN114610769A (en) Data analysis method, device, equipment and storage medium
CN114115831A (en) Data processing method, device, equipment and storage medium
KR20220079029A (en) Method for providing automatic document-based multimedia content creation service
KR20220079026A (en) A apparatus for providing general document-based multimedia image content production service
CN112069807A (en) Text data theme extraction method and device, computer equipment and storage medium
CN113536788B (en) Information processing method, device, storage medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination