CN111666383A - Information processing method, information processing device, electronic equipment and computer readable storage medium - Google Patents

Info

Publication number
CN111666383A
Authority
CN
China
Prior art keywords
effective information
image
report
report file
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010622216.7A
Other languages
Chinese (zh)
Inventor
夏梦
曹毅
王冬冬
牛晓川
范俊豪
邹嘉伦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202010622216.7A
Publication of CN111666383A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3334 Selection or weighting of terms from queries, including natural language queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides an information processing method, an information processing device, electronic equipment and a computer-readable storage medium, and relates to the field of information processing. The method comprises the following steps: for a search keyword, searching with a preset search engine to obtain at least one corresponding effective information group; determining the report file to which each effective information group belongs, and acquiring the report file information of each report file; aggregating the effective information groups by the report file to which they belong, to obtain the effective information corresponding to each report file; generating a content box for each report file to obtain at least one content box, wherein each content box comprises the report file information of one report file and the corresponding effective information; and displaying each content box. The method and the device improve the hit rate of search keywords and reduce the screening a user has to do while browsing, thereby improving the user experience.

Description

Information processing method, information processing device, electronic equipment and computer readable storage medium
Technical Field
The present application relates to the field of information processing technologies, and in particular, to an information processing method, an information processing apparatus, an electronic device, and a computer-readable storage medium.
Background
An industry report is a form of business information. It is competitive intelligence with strong timeliness: it generally presents research, analysis and prediction of the current industry and market, based on the latest statistics and survey data from national government institutions and professional market research organizations, on the professional research models and specific analysis methods of cooperating institutions, and on the analysis and study of experienced industry practitioners.
In the prior art, in searching an industry report, a chart title in the report is hit through a keyword input by a user, related visual chart content in the report is extracted, and the visualized chart content is displayed in a search result page in a waterfall flow mode, wherein the display result is shown in fig. 1.
However, this search method has the following disadvantages:
1) hitting the titles of visual charts in a report with the keyword places high demands on how standardized the structure of the report content is; it works well for dealer-type reports with a simple content structure, but the hit rate is low for organization-type reports and other reports with diverse content formats and high complexity;
2) the content matching the search keyword is displayed in a waterfall flow, so the items on the search result page are independent of each other; when the matched items appear in a disordered sequence, the user has to screen them manually, and the user experience is poor.
Disclosure of Invention
The application provides an information processing method, an information processing device, electronic equipment and a computer-readable storage medium, which can solve the problems of low hit rate and of users having to screen results themselves when searching industry reports. The technical solution is as follows:
in a first aspect, an information processing method is provided, and the method includes:
for a search keyword, searching to obtain at least one effective information group corresponding to the search keyword;
determining the report file to which each effective information group belongs, and acquiring the report file information of each report file;
aggregating the effective information groups based on the report files to which they belong, to obtain the aggregated effective information groups corresponding to each report file;
generating a content box for each report file to obtain at least one content box; the content box comprises the report file information of a report file and the corresponding effective information groups;
displaying the at least one content box.
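The aggregation and content-box steps above can be illustrated with a minimal Python sketch. The class names `InfoGroup` and `ContentBox`, the function `build_content_boxes`, the field names and the sample data are all hypothetical and are not taken from the application; the sketch only shows search hits being grouped by the report file they belong to.

```python
from collections import OrderedDict
from dataclasses import dataclass, field


@dataclass
class InfoGroup:
    """One effective information group (hypothetical field names)."""
    report_id: str
    title: str
    keywords: list
    image_path: str


@dataclass
class ContentBox:
    """Report file information plus the groups of that report that matched."""
    report_info: dict
    groups: list = field(default_factory=list)


def build_content_boxes(hits, report_catalog):
    """Aggregate search hits by report file, keeping first-hit order."""
    boxes = OrderedDict()
    for group in hits:
        if group.report_id not in boxes:
            boxes[group.report_id] = ContentBox(report_catalog[group.report_id])
        boxes[group.report_id].groups.append(group)
    return list(boxes.values())


# Hypothetical search hits: two belong to report r1, one to report r2.
hits = [
    InfoGroup("r1", "Market size 2019", ["market", "size"], "r1_p3.png"),
    InfoGroup("r2", "User growth", ["users"], "r2_p7.png"),
    InfoGroup("r1", "Market share by vendor", ["market", "share"], "r1_p9.png"),
]
catalog = {"r1": {"name": "Report A"}, "r2": {"name": "Report B"}}

boxes = build_content_boxes(hits, catalog)
for box in boxes:
    print(box.report_info["name"], [g.title for g in box.groups])
```

Each content box here carries one report's information and every matching group from that report, which is what lets a result page show one box per report rather than one card per hit.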
Preferably, any one of the at least one effective information group includes an effective information image, an effective information title, and an effective information keyword;
the method further comprises the following steps:
when a display instruction for any one of the at least one content box is received, obtaining the effective information title in each effective information group corresponding to that content box;
and displaying, through a preset report content reader, the effective information titles and the effective information group corresponding to the currently selected effective information title.
Preferably, the report content reader is further provided with at least one interactive instruction aiming at the currently presented effective information group;
the method further comprises the following steps:
and when any one of the at least one interactive instruction is triggered, executing the interactive action corresponding to that interactive instruction on the currently displayed effective information group.
Preferably, the interactive instructions comprise excerpt instructions;
executing, when any interactive instruction is triggered, the interactive action corresponding to the interactive instruction on the currently displayed effective information group includes:
when the excerpt instruction is triggered, judging whether a generated notebook exists in the preset favorites;
if yes, displaying a notebook list of the generated notebooks, and copying the currently displayed effective information group to a notebook when a confirmation instruction for that notebook in the notebook list is received;
and if not, displaying a preset notebook creating interface, creating a new notebook based on the notebook creating interface, and copying the currently displayed effective information group to the new notebook.
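The excerpt branch above can be sketched as follows. This is an illustration under stated assumptions, not the application's implementation: the favorites are modelled as a plain dictionary, and `choose_notebook` and `create_notebook` are hypothetical callbacks standing in for the notebook list and the notebook creating interface.

```python
def excerpt(favorites, current_group, choose_notebook, create_notebook):
    """Copy the displayed group into a notebook, creating one if none exists.

    favorites: dict mapping notebook name -> list of excerpted groups.
    choose_notebook(names): hypothetical UI callback, returns the confirmed name.
    create_notebook(): hypothetical UI callback, returns the new notebook name.
    """
    if favorites:
        # At least one notebook exists: show the list and wait for confirmation.
        name = choose_notebook(sorted(favorites))
    else:
        # No notebook yet: go through the creation interface first.
        name = create_notebook()
        favorites[name] = []
    favorites[name].append(dict(current_group))  # store a copy, not a reference
    return name


favorites = {}
first = excerpt(favorites, {"title": "Market size 2019"},
                choose_notebook=lambda names: names[0],
                create_notebook=lambda: "My notes")
second = excerpt(favorites, {"title": "User growth"},
                 choose_notebook=lambda names: names[0],
                 create_notebook=lambda: "unused")
print(first, second, len(favorites["My notes"]))
```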
Preferably, the method further comprises the following steps:
and when a display instruction for any one of the generated notebooks in the preset favorites is received, displaying the effective information groups in that notebook through the report content reader.
Preferably, searching to obtain at least one effective information group corresponding to the search keyword includes:
performing query analysis on the search keyword to obtain analyzed keywords;
assembling the analyzed keywords based on the Elasticsearch Query DSL syntax to obtain a query statement for effective information groups; the query statement comprises a keyword field and a title field;
and querying with the query statement against an index in a preset search engine to obtain at least one effective information group matching the search keyword.
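As an illustration of assembling such a query statement, the sketch below builds a Query DSL request body as a plain Python dictionary with a `bool`/`should` clause over a title field and a keyword field. The field names `title` and `keyword`, the boost value, the result size and the whitespace-based `analyze_query` helper are assumptions made for this example; the application does not specify them.

```python
import json


def analyze_query(raw):
    """Hypothetical stand-in for query analysis: trim, lowercase, split on spaces."""
    return [tok for tok in raw.strip().lower().split() if tok]


def build_query(parsed_keywords, size=20):
    """Assemble a Query DSL body that matches the title or the keyword field."""
    text = " ".join(parsed_keywords)
    return {
        "size": size,
        "query": {
            "bool": {
                "should": [
                    # Assumed weighting: a title hit counts more than a keyword hit.
                    {"match": {"title": {"query": text, "boost": 2.0}}},
                    {"match": {"keyword": {"query": text}}},
                ],
                "minimum_should_match": 1,
            }
        },
    }


body = build_query(analyze_query("  Retail Market Size "))
print(json.dumps(body, indent=2))
```

The resulting dictionary is what would be sent as the body of a search request against the index; sending it is left out so that the sketch does not depend on a running cluster or a particular client version.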
Preferably, the preset search engine is generated by:
when detecting a data update to any one of the at least one effective information group stored in the preset effective information database, acquiring the effective information title and the effective information keyword of the updated effective information group; the data update comprises at least one of addition, deletion and modification of the effective information group;
generating an index based on the effective information title and the effective information keyword, and establishing a mapping relation between the effective information title, the effective information keyword and the index; wherein the index includes a title field and a keyword field.
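The idea of keeping an index in step with database updates can be sketched with a toy in-memory inverted index. This is not the search engine described in the application: `TitleKeywordIndex`, its method names and the whitespace tokenization of titles are assumptions for illustration (Chinese titles would need a word segmenter instead).

```python
from collections import defaultdict


class TitleKeywordIndex:
    """Toy in-memory index with a title field and a keyword field."""

    def __init__(self):
        self.title = defaultdict(set)    # title token -> ids of groups
        self.keyword = defaultdict(set)  # keyword -> ids of groups
        self._docs = {}                  # id -> last indexed (title, keywords)

    def _remove(self, group_id):
        old = self._docs.pop(group_id, None)
        if old is None:
            return
        title, keywords = old
        for tok in title.lower().split():
            self.title[tok].discard(group_id)
        for kw in keywords:
            self.keyword[kw.lower()].discard(group_id)

    def on_update(self, op, group_id, title="", keywords=()):
        """Apply one database change; op is 'add', 'modify' or 'delete'."""
        self._remove(group_id)           # drop stale entries first
        if op == "delete":
            return
        self._docs[group_id] = (title, tuple(keywords))
        for tok in title.lower().split():
            self.title[tok].add(group_id)
        for kw in keywords:
            self.keyword[kw.lower()].add(group_id)

    def search(self, token):
        """Return ids of groups whose title or keywords contain the token."""
        token = token.lower()
        return self.title.get(token, set()) | self.keyword.get(token, set())


index = TitleKeywordIndex()
index.on_update("add", 1, "Market size", ["retail"])
after_add = index.search("market")
index.on_update("modify", 1, "User growth", ["retail"])
after_modify = index.search("market")
index.on_update("delete", 1)
after_delete = index.search("retail")
print(after_add, after_modify, after_delete)
```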
Preferably, the preset effective information database is generated by the following method:
acquiring a report file;
splitting the report file into images page by page to obtain at least one report file image;
performing block recognition on each report file image to obtain at least one block corresponding to each report file image;
taking each report file image whose at least one block meets the preset requirement as an effective information image, to obtain at least one effective information image;
extracting an effective information title and an effective information keyword of each effective information image, and establishing an association relation between each effective information image and the effective information title and the effective information keyword which respectively correspond to each effective information image;
and storing each effective information image, its corresponding effective information title and effective information keyword, and the association relation in the effective information database.
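One way to picture the stored association between an effective information image, its title and its keywords is a small relational schema. The sketch below uses Python's built-in `sqlite3` with an in-memory database; the table names, column names and sample values are hypothetical, and the application does not state which database is used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE effective_image (
    id         INTEGER PRIMARY KEY,
    report_id  TEXT NOT NULL,
    page       INTEGER NOT NULL,
    image_path TEXT NOT NULL,
    title      TEXT NOT NULL
);
CREATE TABLE effective_keyword (
    image_id INTEGER NOT NULL REFERENCES effective_image(id),
    keyword  TEXT NOT NULL
);
""")


def store_effective_image(conn, report_id, page, image_path, title, keywords):
    """Insert one image row and one row per keyword, linked by image_id."""
    cur = conn.execute(
        "INSERT INTO effective_image (report_id, page, image_path, title) "
        "VALUES (?, ?, ?, ?)",
        (report_id, page, image_path, title),
    )
    image_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO effective_keyword (image_id, keyword) VALUES (?, ?)",
        [(image_id, kw) for kw in keywords],
    )
    conn.commit()
    return image_id


image_id = store_effective_image(
    conn, "r1", 3, "r1_p3.png", "Market size 2019", ["market", "size"]
)
rows = conn.execute(
    "SELECT keyword FROM effective_keyword WHERE image_id = ? ORDER BY keyword",
    (image_id,),
).fetchall()
print(image_id, rows)
```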
Preferably, taking the report document image in which the at least one block meets the preset requirement in each report document image as an effective information image, to obtain at least one effective information image, includes:
detecting whether the number of digital blocks in each report file image exceeds a first number threshold;
and if so, taking the report file images whose number of digital blocks exceeds the first number threshold as effective information images, to obtain at least one effective information image.
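A minimal sketch of the count-based check follows. It assumes that a "digital block" means a recognized block whose text is a number (optionally with thousands separators, a decimal part or a percent sign); that reading, the regular expression, the function names and the sample pages are assumptions for illustration only.

```python
import re

# Assumed definition of a digital block: text that is a plain number,
# e.g. "2019", "1,200", "-3.5" or "12.5%".
NUMERIC = re.compile(r"^[+-]?\d[\d,]*(\.\d+)?%?$")


def count_digital_blocks(block_texts):
    """Count the blocks on one page whose text looks numeric."""
    return sum(1 for text in block_texts if NUMERIC.match(text.strip()))


def pages_over_count(pages, first_number_threshold):
    """Return the page numbers whose digital block count exceeds the threshold."""
    return [
        page_no
        for page_no, block_texts in pages.items()
        if count_digital_blocks(block_texts) > first_number_threshold
    ]


# Hypothetical block recognition output: page number -> recognized block texts.
pages = {
    1: ["Revenue", "2019", "12.5%", "1,200"],
    2: ["Introduction", "Overview"],
}
effective_pages = pages_over_count(pages, first_number_threshold=2)
print(effective_pages)
```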
Preferably, taking the report document image in which the at least one block meets the preset requirement in each report document image as an effective information image, to obtain at least one effective information image, includes:
detecting whether the ratio of the number of digital blocks in each report file image to the number of all blocks in the corresponding report file image exceeds a ratio threshold;
and if so, taking the report file images whose ratio exceeds the ratio threshold as effective information images, to obtain at least one effective information image.
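The ratio-based variant can be sketched in the same spirit. Here each recognized block is assumed to arrive already labelled with a hypothetical `kind` field ("digital" or "text"); how blocks are labelled, and the threshold value, are not specified in the application.

```python
def digital_ratio(blocks):
    """Share of blocks on a page that are labelled as digital blocks."""
    if not blocks:
        return 0.0  # a page with no recognized blocks cannot pass the check
    digital = sum(1 for block in blocks if block["kind"] == "digital")
    return digital / len(blocks)


def is_effective_by_ratio(blocks, ratio_threshold):
    """True when the digital share strictly exceeds the ratio threshold."""
    return digital_ratio(blocks) > ratio_threshold


# Hypothetical pages: a table-heavy page and a prose page.
table_page = [{"kind": "digital"}] * 3 + [{"kind": "text"}]
prose_page = [{"kind": "text"}] * 5 + [{"kind": "digital"}]

print(digital_ratio(table_page), is_effective_by_ratio(table_page, 0.5))
print(is_effective_by_ratio(prose_page, 0.5))
```

Compared with an absolute count, a ratio is less sensitive to how many blocks a page contains overall, so a short page made up mostly of figures can still qualify.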
Preferably, taking the report document image in which the at least one block meets the preset requirement in each report document image as an effective information image, to obtain at least one effective information image, includes:
acquiring the height of at least one character block in each report file image, and determining a preset number of target character blocks with the maximum height;
detecting whether the target character blocks in each report file image contain a Chinese block;
if yes, detecting whether the number of Chinese characters in a target character block containing the Chinese block exceeds a third number threshold;
if so, taking the report file image whose target character blocks include such a Chinese block as an effective information image, to obtain at least one effective information image.
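The height-based check can be sketched as below: pick the tallest character blocks on a page and test whether any of them carries enough Chinese characters to look like a heading. The `(height, text)` tuple layout, the function names, the CJK Unified Ideographs range used to detect Chinese characters and the threshold values are assumptions for illustration.

```python
import re

# Basic CJK Unified Ideographs range, used here as an approximation of
# "Chinese character".
CJK = re.compile(r"[\u4e00-\u9fff]")


def chinese_char_count(text):
    """Number of Chinese characters in a block's text."""
    return len(CJK.findall(text))


def is_effective_by_heading(blocks, top_n, third_number_threshold):
    """blocks is a list of (height, text) pairs for one page image.

    The top_n tallest blocks are the target character blocks; the page
    qualifies when any of them has more Chinese characters than the threshold.
    """
    targets = sorted(blocks, key=lambda block: block[0], reverse=True)[:top_n]
    return any(
        chinese_char_count(text) > third_number_threshold for _, text in targets
    )


titled_page = [(40, "2019年中国零售市场规模"), (12, "source: survey"), (10, "1,200")]
untitled_page = [(40, "Q3 2019"), (12, "零售市场规模分析")]

print(is_effective_by_heading(titled_page, top_n=1, third_number_threshold=4))
print(is_effective_by_heading(untitled_page, top_n=1, third_number_threshold=4))
```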
In a second aspect, there is provided an information processing apparatus comprising:
the search module is used for searching for at least one effective information group corresponding to the search keyword according to the search keyword;
the processing module is used for determining the report file to which each effective information group belongs and acquiring the report file information of each report file;
the aggregation module is used for aggregating all the effective information groups based on the belonged report files to obtain aggregated effective information groups corresponding to each report file;
the generating module is used for generating a content box for each report file to obtain at least one content box; the content box comprises the report file information of a report file and the corresponding effective information groups;
and the display module is used for respectively displaying the at least one content box.
Preferably, any one of the at least one effective information group includes an effective information image, an effective information title, and an effective information keyword;
the device further comprises:
the receiving module is used for receiving a display instruction aiming at any one content box in the at least one content box;
the acquisition module is used for acquiring effective information titles in each effective information group corresponding to any content box;
the display module is further used for displaying, through a preset report content reader, the effective information titles and the effective information group corresponding to the currently selected effective information title.
Preferably, the report content reader is further provided with at least one interactive instruction aiming at the currently presented effective information group;
the device further comprises:
and the execution module is used for executing the interactive action corresponding to the interactive instruction aiming at the currently displayed effective information group when any interactive instruction in at least one interactive instruction is triggered.
Preferably, the interactive instructions comprise excerpt instructions;
the execution module is specifically configured to:
when the excerpt instruction is triggered, judging whether a generated notebook exists in the preset favorites;
if yes, displaying a notebook list of the generated notebook, and copying the currently displayed effective information group to the notebook when a confirmation instruction for any notebook in the notebook list is received;
and if not, displaying a preset notebook creating interface, creating a new notebook based on the notebook creating interface, and copying the currently displayed effective information group to the new notebook.
Preferably, the receiving module is further configured to receive a display instruction for any one of the generated notebooks in the preset favorites;
the display module is also used for displaying the effective information group in the notebook through a report content reader.
Preferably, the search module includes:
the analysis submodule is used for carrying out Query analysis on the search keywords to obtain analyzed keywords;
the statement assembling submodule is used for assembling the analyzed keywords based on the Elasticsearch Query DSL syntax to obtain a query statement for effective information groups; the query statement comprises a keyword field and a title field;
and the query submodule is used for querying by adopting the query statement and an index in a preset search engine to obtain at least one effective information group matched with the search keyword.
Preferably, the preset search engine is generated by:
when detecting a data update to any one of the at least one effective information group stored in the preset effective information database, acquiring the effective information title and the effective information keyword of the updated effective information group; the data update comprises at least one of addition, deletion and modification of the effective information group;
generating an index based on the effective information title and the effective information keyword, and establishing a mapping relation between the effective information title, the effective information keyword and the index; wherein the index includes a title field and a keyword field.
Preferably, the preset effective information database is generated by the following method:
acquiring a report file;
splitting the report file into images page by page to obtain at least one report file image;
carrying out block recognition on each report file image to obtain at least one block corresponding to each report file image;
taking the report file image of which at least one block meets the preset requirement in each report file image as an effective information image to obtain at least one effective information image;
extracting an effective information title and an effective information keyword of each effective information image, and establishing an association relation between each effective information image and the effective information title and the effective information keyword which are respectively corresponding to each effective information image;
and storing each effective information image, an effective information title and an effective information keyword which are respectively corresponding to each effective information image and an association relation into the effective information database.
Preferably, taking the report document image in which the at least one block meets the preset requirement in each report document image as an effective information image, to obtain at least one effective information image, includes:
detecting whether the number of digital blocks in each report file image exceeds a first number threshold;
and if so, taking the report file images whose number of digital blocks exceeds the first number threshold as effective information images, to obtain at least one effective information image.
Preferably, taking the report document image in which the at least one block meets the preset requirement in each report document image as an effective information image, to obtain at least one effective information image, includes:
detecting whether the ratio of the number of digital blocks in each report file image to the number of all blocks in the corresponding report file image exceeds a ratio threshold;
and if so, taking the report file images whose ratio exceeds the ratio threshold as effective information images, to obtain at least one effective information image.
Preferably, taking the report document image in which the at least one block meets the preset requirement in each report document image as an effective information image, to obtain at least one effective information image, includes:
acquiring the height of at least one character block in each report file image, and determining a preset number of target character blocks with the maximum height;
detecting whether the target character blocks in each report file image contain a Chinese block;
if yes, detecting whether the number of Chinese characters in a target character block containing the Chinese block exceeds a third number threshold;
if so, taking the report file image whose target character blocks include such a Chinese block as an effective information image, to obtain at least one effective information image.
In a third aspect, an electronic device is provided, which includes:
a processor, a memory, and a bus;
the bus is used for connecting the processor and the memory;
the memory is used for storing operation instructions;
the processor is configured to call the operation instructions, which cause the processor to perform operations corresponding to the information processing method shown in the first aspect of the present application.
In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored, which, when executed by a processor, implements the information processing method shown in the first aspect of the present application.
The technical solution provided by this application brings the following beneficial effects:
for a search keyword, at least one effective information group corresponding to the search keyword is obtained by searching; the report file to which each effective information group belongs is then determined and the report file information of each report file is obtained; the effective information groups are aggregated based on the report files to which they belong, giving the aggregated effective information groups corresponding to each report file; a content box is generated for each report file to obtain at least one content box, where the content box comprises the report file information of a report file and the corresponding effective information groups; and the at least one content box is displayed. In this way, the content of all reports, including but not limited to visual chart titles, can be comprehensively identified according to the search keyword. Compared with the prior art, which identifies only visual chart titles, this avoids the low hit rate of search keywords for organization-type reports and other reports with diverse content formats and high complexity. Meanwhile, the effective information groups that match the search keyword and belong to different report files are obtained through comprehensive identification and are then displayed in aggregate by report file, so that the multiple effective information groups matching the search keyword within the same report file are associated with one another; this reduces the screening the user has to do while browsing and improves the user experience.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.
FIG. 1 is a schematic diagram of a search results page for searching industry reports in the prior art;
FIG. 2 is a schematic flowchart of an information processing method according to an embodiment of the present application;
FIG. 3 is a schematic flowchart of an information processing method according to another embodiment of the present application;
FIG. 4 is a schematic view of an interface of a content box of the present application;
FIG. 5 is a schematic view of a search results page for searching industry reports in the present application;
FIGS. 6A-6B are first and second schematic views of an interface of a report content reader of the present application;
FIG. 7 is a third schematic view of an interface of the report content reader of the present application;
FIGS. 8A-8B are schematic diagrams illustrating the effect of selecting a notebook for an excerpt in the present application;
FIG. 9 is a schematic diagram illustrating the effect of creating a new notebook for an excerpt in the present application;
FIG. 10 is a flowchart of the excerpt process in the present application;
FIG. 11 is a schematic view of the favorites interface in the present application;
FIG. 12 is a schematic view of an interface for browsing excerpts using the report content reader in the present application;
FIG. 13 is a schematic view of a search process based on search keywords in the present application;
FIG. 14 is a schematic diagram of the data processing of the ES search engine in the present application;
FIG. 15 is a schematic diagram of the effect of OCR in the present application;
FIG. 16 is a schematic diagram of a process for extracting effective information images according to the present application;
FIG. 17 is a schematic structural diagram of an information processing apparatus according to yet another embodiment of the present application;
FIG. 18 is a schematic structural diagram of an electronic device for information processing according to yet another embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The terms referred to in this application will first be introduced and explained:
Cloud technology is a general term for the network technology, information technology, integration technology, management platform technology, application technology and the like applied under the cloud computing business model; it can form a resource pool that is used on demand and is flexible and convenient. Cloud computing technology will become an important support. The background services of technical network systems, such as video websites, picture websites and web portals, require a large amount of computing and storage resources. With the rapid development and application of the internet industry, each item may have its own identification mark that needs to be transmitted to a background system for logical processing; data of different levels are processed separately, and all kinds of industry data need strong system background support, which can only be realized through cloud computing.
Database (Database), which can be regarded as an electronic file cabinet in short, a place for storing electronic files, a user can add, query, update, delete, etc. to data in files. A "database" is a collection of data that is stored together in a manner that can be shared by multiple users, has as little redundancy as possible, and is independent of the application.
A Database Management System (DBMS) is a computer software system designed for managing a database, and generally has basic functions such as storage, retrieval, security assurance, and backup. A database management system may be classified according to the database model it supports, such as relational or XML (Extensible Markup Language); according to the type of computer it supports, such as server cluster or mobile phone; according to the query language it uses, such as SQL (Structured Query Language) or XQuery; according to its performance emphasis, such as maximum scale or maximum operating speed; or in other ways.
A distributed cloud storage system (hereinafter, referred to as a storage system) refers to a storage system that integrates a large number of storage devices (storage devices are also referred to as storage nodes) of different types in a network through application software or application interfaces to cooperatively work by using functions such as cluster application, grid technology, and a distributed storage file system, and provides a data storage function and a service access function to the outside.
At present, a storage method of the storage system is as follows: logical volumes are created, and each logical volume is allocated physical storage space when it is created; the physical storage space may be composed of the disks of one storage device or of several storage devices. A client stores data on a logical volume, that is, on a file system. The file system divides the data into a plurality of parts, each part being an object that contains not only the data but also additional information such as a data identification (ID, ID entry). The file system writes each object into the physical storage space of the logical volume and records the storage location information of each object, so that when the client requests access to the data, the file system can let the client access it according to the storage location information of each object.
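The object-based storage method described above can be pictured with a toy sketch: data is cut into fixed-size objects, each object gets an identifier, and the recorded locations are used to reassemble the data on read. The class `ToyObjectStore`, the fixed object size and the use of random UUIDs as object IDs are assumptions for illustration; a real storage system is far more involved.

```python
import uuid


class ToyObjectStore:
    """Toy model of a file system on one logical volume."""

    def __init__(self, object_size):
        self.object_size = object_size
        self.volume = {}     # object id -> bytes (stands in for physical space)
        self.locations = {}  # file name -> ordered list of object ids

    def write(self, name, data):
        """Split data into objects, store each one, and record its location."""
        object_ids = []
        for offset in range(0, len(data), self.object_size):
            object_id = uuid.uuid4().hex
            self.volume[object_id] = data[offset:offset + self.object_size]
            object_ids.append(object_id)
        self.locations[name] = object_ids

    def read(self, name):
        """Reassemble the data from the recorded object locations."""
        return b"".join(self.volume[oid] for oid in self.locations[name])


store = ToyObjectStore(object_size=4)
store.write("report.pdf", b"hello world")
print(len(store.locations["report.pdf"]), store.read("report.pdf"))
```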
The process by which the storage system allocates physical storage space for a logical volume is specifically as follows: the physical storage space is divided into stripes in advance according to an estimate of the capacity of the objects to be stored in the logical volume (the estimate usually leaves a large margin relative to the capacity of the objects actually stored) and the Redundant Array of Independent Disks (RAID) group; one logical volume can be understood as one stripe, and physical storage space is thereby allocated to the logical volume.
In the present application, an information processing method may be executed in a server. The server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud service, a cloud database, cloud computing, a cloud function, cloud storage, network service, cloud communication, middleware service, domain name service, security service, CDN, big data and artificial intelligence platform. The terminal may be, but is not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and the application is not limited herein.
Further, the user can interact with the server through the terminal so as to realize the service request. The terminal may have the following features:
(1) on a hardware architecture, a device has a central processing unit, a memory, an input unit and an output unit, that is, the device is often a microcomputer device having a communication function. In addition, various input modes such as a keyboard, a mouse, a touch screen, a microphone, a camera and the like can be provided, and input can be adjusted as required. Meanwhile, the equipment often has a plurality of output modes, such as a telephone receiver, a display screen and the like, and can be adjusted according to needs;
(2) in terms of the software system, the device must have an operating system, such as Windows Mobile, Symbian, Palm, Android, iOS, and the like. Meanwhile, these operating systems are increasingly open, and personalized application programs developed on these open operating system platforms emerge endlessly, such as address books, schedules, notebooks, calculators, and various games, which satisfy the needs of individual users to a great extent;
(3) in terms of communication capability, the device has flexible access modes and high-bandwidth communication performance, and can automatically adjust the selected communication mode according to the selected service and the environment, which is convenient for users. The device can support GSM (Global System for Mobile Communication), WCDMA (Wideband Code Division Multiple Access), CDMA2000 (Code Division Multiple Access), TD-SCDMA (Time Division-Synchronous Code Division Multiple Access), Wi-Fi (Wireless Fidelity), WiMAX (Worldwide Interoperability for Microwave Access), and the like, thereby adapting to networks of various standards and supporting not only voice services but also various wireless data services;
(4) in the aspect of function use, the equipment focuses more on humanization, individuation and multi-functionalization. With the development of computer technology, devices enter a human-centered mode from a device-centered mode, and the embedded computing, control technology, artificial intelligence technology, biometric authentication technology and the like are integrated, so that the human-oriented purpose is fully embodied. Due to the development of software technology, the equipment can be adjusted and set according to individual requirements, and is more personalized. Meanwhile, the device integrates a plurality of software and hardware, and the function is more and more powerful.
The application provides an information processing method, an information processing device, an electronic device and a computer readable storage medium, which aim to solve the above technical problems in the prior art.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
In one embodiment, an information processing method is provided, as shown in fig. 2, the method including:
step S201, aiming at a search keyword, searching to obtain at least one effective information group corresponding to the search keyword;
In the embodiment of the invention, the terminal may be provided with an application program for browsing reports. The application program may comprise a search interface with a search bar, in which a user can input a search keyword and start a search; the application program obtains the corresponding search results and displays them to the user. In the embodiment of the present invention, a report may be any file containing related information, for example an industry report, which may be an industry analysis report, an industry research report, an industry data report, and the like; for example, "OTA industry-national securities dealer report" is an industry report.
Further, the search result may be at least one valid information group corresponding to the search keyword. The valid information group may include at least one valid information, and the valid information may correspond to the search keyword. In some embodiments, valid information refers to the content of the report that matches the user's search intent in an industry report and that can directly provide high value reference information for the user's research efforts. According to the report writing standards and habits of various organizations and teams in the current market, high-value information in the report generally appears in the chart content of the report.
The valid information includes, but is not limited to, a valid information image, a valid information header, and a valid information keyword in the report. The effective information image refers to an image corresponding to the whole page containing high-value effective information in the report; a valid information header, which refers to a header of a valid information image; the effective information keyword refers to a keyword included in the effective information image and/or a keyword around the effective information image. In some embodiments, one valid information group may include at least one valid information corresponding to the search keyword, belonging to the same report file, the at least one valid information may be any one or more of a valid information image, a valid information title, and a valid information keyword corresponding to the search keyword.
Step S202, determining the report file to which each effective information group belongs, and acquiring the report file information of each report file;
Specifically, each valid information group may be stored in a valid information database. When a user searches with a search keyword, a preset search engine may query the preset valid information database to obtain at least one matching valid information group, and the queried valid information groups may belong to different report files. For example, three valid information groups are obtained by the query: group a, group b and group c, where group a and group b belong to report file A, and group c belongs to report file B. A report file may be an industry report; for example, the industry report "OTA industry-national securities dealer report" is a report file.
Therefore, after each effective information group is obtained through searching, the report file to which each effective information group belongs can be further determined, and the report file information of each report file can be obtained. Wherein, the report file information includes but is not limited to: ID of report, creator, tag, summary, creation time, industry type.
Step S203, aggregating the effective information groups based on the belonged report files to obtain aggregated effective information groups corresponding to each report file;
After the report file to which each valid information group belongs is determined, the valid information groups can be aggregated by report file, so that the aggregated valid information groups corresponding to each report file are determined. Continuing the previous example, groups a, b and c are aggregated, so that groups a and b are determined to correspond to report file A, and group c to report file B.
Step S204, generating a content box aiming at each report to obtain at least one content box; the content box comprises report file information of the report file and a corresponding effective information group;
specifically, content boxes are generated for each report and corresponding respective valid information groups, thereby obtaining the same number of content boxes as the number of report files, each content box including the report file information of the report file and the corresponding valid information group.
And step S205, respectively displaying at least one content box.
After the content boxes are obtained, they can be displayed in an interface of the application program. For example, content box 1 displays the report file information of report file A together with valid information groups a and b, and content box 2 displays the report file information of report file B together with valid information group c.
In the embodiment of the invention, at least one effective information group corresponding to a search keyword is obtained by searching; the report file to which each effective information group belongs is then determined, and the report file information of each report file is obtained; the effective information groups are aggregated based on the report files to which they belong, to obtain the aggregated effective information groups corresponding to each report file; a content box is generated for each report file to obtain at least one content box, the content box comprising the report file information of the report file and the corresponding effective information groups; and the at least one content box is displayed. In this way, the content of all reports, including but not limited to the titles of visual charts, can be comprehensively identified according to the search keyword. Compared with the prior art, in which only the titles of visual charts are identified, this solves the problem that the hit rate of search keywords is low for organization-type reports and other types of reports whose content formats are diverse and highly complex. Meanwhile, the effective information groups that match the search keyword and belong to different report files are obtained through comprehensive identification and are then aggregated and displayed by report file, so that the effective information groups matching the search keyword in the same report file are associated with one another, which reduces the screening the user has to do while browsing and improves the user experience.
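Steps S201 to S205 can be illustrated with a minimal Python sketch. The class names, field names and sample data below are assumptions made for illustration only; the application does not specify any data structures.

```python
from collections import OrderedDict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ValidInfoGroup:
    """One search hit; every group belongs to exactly one report file."""
    group_id: str
    report_id: str  # ID of the report file the group belongs to
    title: str = ""
    keywords: List[str] = field(default_factory=list)


@dataclass
class ContentBox:
    """Report file information plus the groups aggregated for that report."""
    report_info: dict
    groups: List[ValidInfoGroup]


def build_content_boxes(hits: List[ValidInfoGroup],
                        report_infos: Dict[str, dict]) -> List[ContentBox]:
    # S202/S203: determine the report of each hit and aggregate by report,
    # keeping the order in which reports first appear in the results.
    grouped: "OrderedDict[str, List[ValidInfoGroup]]" = OrderedDict()
    for hit in hits:
        grouped.setdefault(hit.report_id, []).append(hit)
    # S204: one content box per report file.
    return [ContentBox(report_infos[rid], groups)
            for rid, groups in grouped.items()]


# Hypothetical sample data mirroring the example of groups a, b and c.
hits = [ValidInfoGroup("a", "A"), ValidInfoGroup("b", "A"), ValidInfoGroup("c", "B")]
report_infos = {"A": {"id": "A", "creator": "team 1"},
                "B": {"id": "B", "creator": "team 2"}}
boxes = build_content_boxes(hits, report_infos)
for box in boxes:  # S205: display, here simply printed
    print(box.report_info["id"], [g.group_id for g in box.groups])
```

The number of content boxes equals the number of distinct report files among the hits, as described for step S204.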
In another embodiment, there is provided an information processing method, as shown in fig. 3, the method including:
step S301, aiming at the search keyword, searching to obtain at least one corresponding effective information group;
in the embodiment of the invention, the terminal can be provided with an application program for browsing industry reports, the application program can comprise a search interface, a search bar can be arranged in the search interface, a user can input search keywords in the search bar and search, and the application program obtains corresponding search results through searching and displays the search results to the user. The industry report may be an industry analysis report, an industry research report, an industry data report, and the like, for example, the OTA industry-securities dealer report is an industry report.
Further, the search result may be at least one valid information group corresponding to the search keyword. The effective information refers to report content which is matched with the search intention of the user in the industry report and can directly provide high-value reference information for the research work of the user. According to the report writing standards and habits of various organizations and teams in the current market, high-value information in the report generally appears in the chart content of the report.
A valid information group includes, but is not limited to, a valid information image, a valid information title, and a valid information keyword in a report. The valid information image refers to the image corresponding to the whole page of the report that contains the high-value valid information; the valid information title refers to the title of a valid information image; the valid information keyword refers to a keyword included in the valid information image and/or a keyword around the valid information image.
When the application program searches according to the search keyword, a preset effective information search interface can be called to search, and the effective information search interface comprises:
Request method: GET
Request path: /api/search/modules
Request parameters: keyword, search keyword.
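As a rough illustration, a client could assemble the request for this interface as follows. The host name is a placeholder and not part of the application; only the path and the keyword parameter come from the description above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://reports.example.com"  # hypothetical host, for illustration only


def build_search_url(keyword: str, base_url: str = BASE_URL) -> str:
    """Build the GET URL for the valid information search interface."""
    query = urlencode({"keyword": keyword})  # percent-encodes the keyword
    return f"{base_url}/api/search/modules?{query}"


url = build_search_url("OTA hotel agent commissions")
print(url)
# The keyword survives a round trip through URL encoding.
decoded = parse_qs(urlsplit(url).query)["keyword"][0]
```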
Step S302, determining the report file to which each effective information group belongs, and acquiring the report file information of each report file;
Specifically, each valid information group may be stored in a valid information database. When a user searches with a search keyword, a preset search engine may query the preset valid information database to obtain at least one matching valid information group, and the queried valid information groups may belong to different report files; one valid information group includes a valid information image, a valid information title, and a valid information keyword. For example, three valid information groups are obtained by the query: group a, group b and group c, where group a and group b belong to report file A, and group c belongs to report file B. A report file may be an industry report; for example, the industry report "OTA industry-national securities dealer report" is a report file.
Therefore, after each effective information group is obtained through searching, the report file to which each effective information group belongs can be further determined, and the report file information of each report file can be obtained. Wherein, the report file information includes but is not limited to: ID of report, creator, tag, summary, creation time, industry type.
Step S303, aggregating all the effective information groups based on the affiliated report files to obtain aggregated effective information groups corresponding to each report file;
After the report file to which each valid information group belongs is determined, the valid information groups can be aggregated by report file, so that the aggregated valid information groups corresponding to each report file are determined. Continuing the previous example, groups a, b and c are aggregated, so that groups a and b are determined to correspond to report file A, and group c to report file B.
Step S304, generating a content box aiming at each report file to obtain at least one content box; the content box comprises report file information of the report file and a corresponding effective information group;
specifically, content boxes are generated for each report and corresponding respective valid information groups, thereby obtaining the same number of content boxes as the number of report files, each content box including the report file information of the report file and the corresponding valid information group.
For example, as shown in FIG. 4, a content box may include two regions: the first region may show the report file information of a report file, and clicking it jumps to the detail page of the report file; the second region may show the valid information groups matching the search keyword, and clicking any valid information group calls up a report content reader. In this way, the valid information groups belonging to the same report file are displayed in an aggregated manner, which strengthens the association among them; a reader can obtain the valid information groups corresponding to each report file by screening the report files, which solves the prior-art problem of having to sift through disordered search results.
It should be noted that, in addition to the form of the content box shown in fig. 4, other forms of content boxes are also applicable to the present application; moreover, when the number of the valid information groups is large, a certain number of valid information groups may be displayed in the right side of the content box, and all valid information groups may be displayed in the report content reader, or a scroll bar may be provided in the right side of the content box, so that all valid information groups may be displayed. Of course, in practical application, the user may set the form of the content box and the layout of the content box according to actual requirements, which is not limited in the present application.
Step S305, respectively displaying at least one content box;
After the content boxes are obtained, each content box can be displayed in an interface of the application program. For example, the report file information of report file A and valid information groups a and b are displayed in content box 1, and the report file information of report file B and valid information group c are displayed in content box 2. As another example, the search keyword "OTA hotel agent commissions" results in the content boxes shown in fig. 5.
Step S306, when a display instruction aiming at any one of at least one content box is received, obtaining effective information titles in each effective information group corresponding to the any content box;
specifically, when the user clicks any one of the at least one content box, a display instruction for the content box is initiated, and at this time, the effective information titles in each effective information group in the content box are obtained.
Step S307, displaying each effective information title and an effective information group corresponding to the currently selected effective information title in each effective information title through a preset report content reader;
each effective information title and an effective information group corresponding to the currently selected effective information title in each effective information title are displayed through a preset report content reader, as shown in fig. 6A.
The report content reader can be used to browse and manage all effective information groups, across the different report files, that correspond to the current search keyword. As shown in fig. 6B, the report content reader may include four parts:
1) search keyword information area
Displays the search keyword input by the user on the current search result page.
2) Report and valid information title navigation area
Displays the report file to which the content currently being browsed belongs, together with the other effective information titles belonging to the same report file. The user can continue browsing by clicking another effective information title in the navigation area to switch to it, or by scrolling.
Further, after the last valid information header of a certain report file, the navigation area may automatically load the header of the next report file, as shown in fig. 7.
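The navigation behaviour described above can be sketched as a small helper that returns the next position to show. The list-of-tuples layout and the function name are assumptions for illustration, and the sketch assumes every report in the results has at least one title.

```python
from typing import List, Optional, Tuple

# Each entry: (report file name, valid information titles of that report),
# in the order of the search results. Hypothetical sample data.
Navigation = List[Tuple[str, List[str]]]


def next_position(nav: Navigation, report_idx: int,
                  title_idx: int) -> Optional[Tuple[int, int]]:
    """Return (report index, title index) of the next title, or None at the end."""
    if title_idx + 1 < len(nav[report_idx][1]):
        return report_idx, title_idx + 1  # next title in the same report
    if report_idx + 1 < len(nav):
        return report_idx + 1, 0          # first title of the next report
    return None                           # nothing left to load


nav = [("report A", ["title 1", "title 2"]), ("report B", ["title 3"])]
print(next_position(nav, 0, 1))
```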
3) Report content reading area
Displays the effective information image corresponding to the effective information title, that is, the detailed report content, which can be enlarged and reduced.
4) Report content operating region
Provides at least one interactive instruction for the currently presented effective information report content; in the embodiment of the present invention, the interactive instructions include but are not limited to: excerpt, download, and original text.
The report content reader may call a preset effective information acquisition interface to obtain the effective information. The effective information acquisition interface comprises:
Request method: GET
Request path: /api/report?modules=1
Request parameters: reporting relevant parameters of the file.
Step S308, when any interactive instruction in at least one interactive instruction is triggered, executing an interactive action corresponding to the interactive instruction aiming at the currently displayed effective information group;
Clicking "excerpt" excerpts the effective information group into a notebook; clicking "download" downloads the effective information image to the local device; clicking "original text" opens a new page window that displays the original report file to which the effective information image belongs, positioned at the page whose content is the same as the effective information image.
When any interactive instruction in at least one interactive instruction is triggered, executing an interactive action corresponding to the interactive instruction aiming at the currently displayed effective information group, wherein the interactive action comprises the following steps:
when the excerpt instruction is triggered, judging whether a generated notebook exists in a preset favorite;
if yes, displaying a notebook list of the generated notebook, and copying a currently displayed effective information group into the notebook when a confirmation instruction for any notebook in the notebook list is received;
and if not, displaying a preset notebook creating interface, creating a new notebook based on the notebook creating interface, and copying the currently displayed effective information group to the new notebook.
Specifically, when the user clicks "excerpt", it may be determined whether a generated notebook, that is, a notebook the user has already created, exists in the preset favorites. If so, a notebook list containing all generated notebooks may be displayed in a preset list window. When the user selects any notebook and confirms, the currently displayed valid information group is copied to the notebook confirmed by the user, and the "excerpt" in the report content operation area can then change to "excerpted", as shown in fig. 8A to 8B.
If no generated notebook exists in the favorites, the new-notebook window can be displayed directly. The user can set the name of the notebook in that window, and the notebook is generated after the user confirms; the "excerpt" in the report content operation area can then change to "excerpted", as shown in fig. 9.
Further, a button of "new notebook" may be further set in the list window, and after the user clicks the button, the new notebook window may still be displayed, as shown in fig. 9, the user may set a name of the notebook in the new notebook window, and after the determination, the notebook may be generated, and at this time, the "excerpt" in the report content operation area is changed to the "excerpted", as shown in fig. 8B.
The notebook may be a container for recording the valid information groups, and may be used to manage and browse the extracted valid information groups. The user can newly build, delete and modify the notebook, and can extract the effective information group into the notebook for viewing.
Referring to fig. 10, the detailed steps of the excerpt may be as follows:
1) a user initiates a request for extracting a certain effective information group, and at the moment, a notebook needs to be selected (including selecting one from generated notebooks or building a new notebook);
2) the valid information group is fully cloned, and the clone is not associated with the original. Cloning prevents the excerpt from becoming unviewable when the original valid information group is deleted. The cloned copy of the valid information group serves as the excerpt;
3) the excerpt is then associated with the selected notebook;
4) when the user needs to view the excerpt, a request for viewing the content of the notebook is initiated.
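The excerpt steps can be illustrated with a minimal in-memory sketch. The `Notebook` class, the dictionary representation of a valid information group and the sample titles are assumptions for illustration; the point shown is that a deep copy keeps the excerpt readable after the original changes or is deleted.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Notebook:
    title: str
    excerpts: List[dict] = field(default_factory=list)


def excerpt(group: dict, notebooks: Dict[str, Notebook], notebook_title: str) -> dict:
    """Clone a valid information group and attach the clone to a notebook."""
    # Step 1: pick an existing notebook or create a new one.
    notebook = notebooks.get(notebook_title)
    if notebook is None:
        notebook = notebooks[notebook_title] = Notebook(notebook_title)
    # Step 2: full clone, sharing no state with the original group.
    clone = copy.deepcopy(group)
    # Step 3: associate the excerpt with the selected notebook.
    notebook.excerpts.append(clone)
    return clone


notebooks: Dict[str, Notebook] = {}
source = {"title": "Commission rates", "keywords": ["OTA", "commission"]}
saved = excerpt(source, notebooks, "Hotel research")
source["keywords"].clear()  # simulate the original being changed or removed
# Step 4: viewing the notebook still shows the untouched excerpt.
print(notebooks["Hotel research"].excerpts[0]["keywords"])
```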
Further, the interface of the notebook may include:
1) Notebook list: GET /api/notebooks
2) Notebook details (with excerpt list): GET /api/notebook/{$notebook_id}
3) New notebook: POST /api/notebooks
Parameters: title (required), maximum length 255.
4) Update notebook: PUT/PATCH /api/notebook/{$notebook_id}
Parameters: title (required), maximum length 255.
5) Delete notebook: DELETE /api/notebook/{$notebook_id}
Parameters: force (optional), with a value of 0 or 1, indicating whether to force deletion. If 0, the notebook is not deleted; if 1, the notebook is deleted together with the excerpts in it (without prompting that excerpts exist in the notebook).
When the notebook is deleted, if there is an excerpt in the notebook, prompt and confirmation information is generated, such as "The notebook contains excerpts. Delete it anyway?", with the confirmation options "yes" and "no". If the user clicks "yes", the force value is 1; if the user clicks "no", the force value is 0.
6) Excerpt content: POST /api/notebooks/{$notebook_id}/excerpt
Parameters: report_module_id (required), the content module ID.
7) Delete excerpts: POST /api/notebooks/{$notebook_id}/unexcept
Parameters: report_module_id (may be filled in), the content module ID. Multiple excerpts in a notebook can be deleted at once by separating the IDs with ASCII commas, for example: report_module_id=1,2,3.
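The server-side handling of the force parameter and of the comma-separated report_module_id value could look roughly like the following sketch. The in-memory store and function names are hypothetical, and the sketch assumes that an empty notebook may be deleted regardless of force, which the description does not state explicitly.

```python
from typing import Dict, List


def parse_module_ids(raw: str) -> List[int]:
    """Parse a value such as '1,2,3' into a list of content module IDs."""
    return [int(part) for part in raw.split(",") if part.strip()]


def delete_notebook(store: Dict[int, List[int]], notebook_id: int,
                    force: int = 0) -> bool:
    """Delete a notebook; `store` maps notebook ID to its excerpt IDs.

    Returns False when the notebook still holds excerpts and force is not 1,
    in which case the caller would show the confirmation prompt.
    """
    if store[notebook_id] and force != 1:
        return False
    del store[notebook_id]  # the excerpts are removed with the notebook
    return True


store = {1: [10, 11], 2: []}
ids = parse_module_ids("1,2,3")
refused = delete_notebook(store, 1, force=0)   # has excerpts, not forced
forced = delete_notebook(store, 1, force=1)    # user confirmed "yes"
empty_ok = delete_notebook(store, 2)           # empty notebook
```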
Step S309, when a display instruction for any one of the generated notebooks in the preset favorites is received, displaying the effective information group in the notebooks through the report content reader.
In particular, all of the excerpted content may be uniformly viewed and managed in the favorites, as shown in FIG. 11. In the embodiment of the present invention, the report content reader is used as a control with strong universality, and can display an effective information group, and can also be multiplexed in more similar scenes, for example, extracted content in a notebook can still be displayed by using the control of the report content reader, as shown in fig. 12, when a user clicks any one of the generated notebooks, the user can call the report content reader to browse the extracts.
In a preferred embodiment of the present invention, the searching for at least one valid information group corresponding to the search keyword includes:
performing Query analysis on the search keywords to obtain analyzed keywords;
assembling the analyzed keywords based on the Elasticsearch Query DSL syntax to obtain a query statement for effective information groups; the query statement comprises a keyword field and a title field;
and querying by adopting the query statement and an index in a preset search engine to obtain at least one effective information group matched with the search keyword.
Specifically, in a search interface of an application, two search modes may be set: the method comprises the following steps of 'searching content' and 'searching report', wherein the searching report can be a search based on the name of a report file, namely a common search; searching the content is based on the content of the report.
Fig. 13 is a schematic diagram of a search process based on search keywords in the embodiment of the present invention. When a search is performed for a search keyword input by a user, it is first judged whether the search mode is "search content". If not, an ordinary search (that is, "search report") is performed and a result page of the ordinary search is obtained. If so, query analysis is performed on the search keyword, including word segmentation and near-synonym expansion, to obtain the analyzed search keywords; the analyzed keywords are then assembled using the Elasticsearch Query DSL syntax to obtain a query statement for querying effective information groups, the query statement comprising a keyword field and a title field; the preset search engine then executes the query statement against its index to obtain at least one effective information group matching the search keyword, after which step S201 to step S205, or step S302 to step S305, are executed.
Chinese word segmentation and near-synonym expansion are performed on the search keyword to obtain the analyzed search keywords. Chinese word segmentation can use the open-source Elasticsearch plug-in IK Analysis for Elasticsearch, and near-synonym expansion can use an empirically compiled near-synonym lexicon.
The analyzed search keywords are assembled based on the Elasticsearch Query DSL syntax to obtain the query statement for effective information groups. The query statement searches the titles and keywords of the content modules, and weights are set for the title and keyword fields; for example, "market research" is a query, and "research" is a near-synonym.
It should be noted that word segmentation plug-ins other than the one mentioned above may also be used and may be chosen according to actual requirements in practical applications, which is not limited in this embodiment of the present invention; likewise, the near-synonym lexicon may be obtained in ways other than the one described above and may be set according to actual requirements in practical applications, which is not limited in the embodiment of the present invention.
Further, the index in the search engine may also include the ID of the report, that is, the ID of the report file to which each valid information group belongs, so that when each valid information group is obtained by searching, the report file to which each valid information group belongs may also be determined.
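One possible shape of the assembled query statement is sketched below as a Python dictionary in Elasticsearch Query DSL form. The boost values, the lexicon contents and the function name are assumptions for illustration; the application only states that the query targets the title and keyword fields with weights. Word segmentation is assumed to have happened upstream, for example through the IK plug-in.

```python
import json
from typing import Dict, List

# Hypothetical near-synonym lexicon; a real one would be compiled empirically.
SYNONYMS: Dict[str, List[str]] = {"market research": ["research"]}


def build_query(terms: List[str], title_boost: float = 2.0,
                keyword_boost: float = 1.0) -> dict:
    """Assemble a bool/should query over the title and keyword fields."""
    expanded: List[str] = []
    for term in terms:
        for candidate in [term, *SYNONYMS.get(term, [])]:
            if candidate not in expanded:  # keep order, drop duplicates
                expanded.append(candidate)
    should = []
    for term in expanded:
        # Full-text match on the analyzed title field.
        should.append({"match": {"title": {"query": term, "boost": title_boost}}})
        # Exact match on the keyword field.
        should.append({"term": {"keyword": {"value": term, "boost": keyword_boost}}})
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


query = build_query(["market research"])
print(json.dumps(query, indent=2))
```

The resulting dictionary could be sent as the body of a search request; how the request is issued is left open here.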
In a preferred embodiment of the present invention, the index in the preset search engine is generated as follows:
when detecting that any one of at least one valid information group stored in a preset valid information database is updated, acquiring a valid information title and a valid information keyword of the valid information group with the updated data; the data updating comprises at least one of addition, deletion and modification of the effective information group;
generating an index based on the effective information title and the effective information keyword, and establishing a mapping relation between the effective information title, the effective information keyword and the index; wherein the index includes a title field and a keyword field.
The search engine may be ES (Elasticsearch), which is a distributed full-text search engine. ES is document-oriented, meaning that it can store an entire object or document. It not only stores each document but also indexes its content so that the document can be searched. In ES, a user performs operations such as indexing and searching on documents or objects rather than on rows and columns of data.
Specifically, the ES may retrieve data from the effective information database by means of an asynchronous script. As shown in fig. 14, when a valid information group in the valid information database (MySQL) is updated, a data modification event is triggered and enters an event processing queue, where it waits for the ES to perform the corresponding data processing on the valid information group; the data update comprises at least one of addition, deletion and modification of the effective information group. In this way, the ES can synchronize updated effective information groups from the effective information database in real time.
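The queue-based synchronization described above can be illustrated with a small sketch. It assumes an in-memory dictionary as a stand-in for the search index and a standard-library queue as a stand-in for the event processing queue; the names `ChangeEvent` and `drain` are hypothetical and do not appear in the embodiment.

```python
import queue
from dataclasses import dataclass


@dataclass
class ChangeEvent:
    action: str          # "add", "modify", or "delete"
    group_id: int        # hypothetical primary key of the valid information group
    title: str = ""
    keyword: str = ""


def drain(events: "queue.Queue[ChangeEvent]",
          index: dict[int, dict[str, str]]) -> None:
    """Apply every queued data modification event to the index stand-in."""
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            break
        if event.action == "delete":
            # A deleted group is removed from the index.
            index.pop(event.group_id, None)
        elif event.action in ("add", "modify"):
            # Added and modified groups are (re)indexed by title and keyword.
            index[event.group_id] = {"title": event.title, "keyword": event.keyword}
        else:
            raise ValueError(f"unknown action: {event.action}")
        events.task_done()
```

Processing the events in queue order keeps the index consistent with the database even when one group is added, modified and deleted in quick succession.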
Further, when the ES updates a valid information group, it needs to update the index based on the updated valid information group (creating, modifying or deleting index entries), where the index includes a title field and a keyword field, and to determine the mapping (mappings) between the index and the valid information group; the mapping tells the ES how to handle each newly added field. The fields of the valid information group that need to be processed are title (the valid information title) and keyword (the valid information keyword). The title is mapped to a text-type field, which is segmented into words and added to an inverted index during processing, and the IK plug-in can be used for word segmentation; the keyword is mapped to a keyword-type field and is only matched exactly.
Furthermore, when the ES performs the corresponding data processing on the valid information group, the ES may further acquire the ID of the report file to which the valid information group belongs (except when the valid information group is deleted).
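A mapping of the kind described above might look like the following sketch. It assumes the typeless mapping format of Elasticsearch 7 and later, and the `ik_max_word` and `ik_smart` analyzers supplied by the IK plug-in; the field name `report_id` and the helper `to_document` are assumptions made for the example.

```python
# Index body with the two processed fields plus the report file ID.
INDEX_BODY = {
    "mappings": {
        "properties": {
            # text type: segmented by the IK analyzer and added to the inverted index.
            "title": {
                "type": "text",
                "analyzer": "ik_max_word",
                "search_analyzer": "ik_smart",
            },
            # keyword type: stored as one token, so it only matches exactly.
            "keyword": {"type": "keyword"},
            # ID of the report file to which the group belongs (assumed field name).
            "report_id": {"type": "keyword"},
        }
    }
}


def to_document(title: str, keyword: str, report_id: str) -> dict[str, str]:
    """Build the document that would be indexed for one valid information group."""
    return {"title": title, "keyword": keyword, "report_id": report_id}
```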
In a preferred embodiment of the present invention, the preset valid information database is generated as follows:
acquiring a report file;
performing document map cutting processing on the report file according to the page number to obtain at least one report file image;
carrying out block recognition on each report file image to obtain at least one block corresponding to each report file image;
taking a report file image of which at least one block meets a preset requirement in each report file image as an effective information image to obtain at least one effective information image;
extracting an effective information title and an effective information keyword of each effective information image, and establishing an association relation between each effective information image and the effective information title and the effective information keyword which are respectively corresponding to each effective information image;
and storing each effective information image, an effective information title and an effective information keyword which are respectively corresponding to each effective information image and an association relation into an effective information database.
Specifically, any complete report file is obtained, and document map cutting is then performed on each page of the report file to obtain at least one report file image. Document map cutting converts the obtained document into images page by page, for example images in PNG format; specifically, pdftopng from the Xpdf toolkit can be used, which converts PDF pages into PNG images.
Block recognition is then performed on each report file image to obtain at least one block in each report file image. The block recognition may adopt OCR (Optical Character Recognition): each report file image is subjected to OCR, and the characters in different regions of the report file image are referred to as blocks. OCR yields information such as the content, position, confidence and paragraph of each block. The blocks obtained by OCR need to be filtered to delete blocks that are neither text nor digits. The OCR effect is shown in fig. 15.
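The page-by-page conversion can be sketched as a thin wrapper around the pdftopng command-line tool. The sketch assumes that pdftopng is installed and on the PATH, that it accepts `-r` for the resolution in DPI, and that it writes one file per page named after the given root; the function names, the `page` root and the default resolution are choices made for the example.

```python
import subprocess
from pathlib import Path


def pdftopng_command(pdf_path: str, png_root: str, dpi: int = 150) -> list[str]:
    """Build the pdftopng invocation: pdftopng -r DPI PDF-file PNG-root."""
    return ["pdftopng", "-r", str(dpi), pdf_path, png_root]


def cut_pages(pdf_path: str, out_dir: str, dpi: int = 150) -> list[Path]:
    """Convert every page of a report file into a PNG image.

    Returns the generated image paths in page order (assuming the tool
    numbers its output files with zero-padded page numbers).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    root = out / "page"
    subprocess.run(pdftopng_command(pdf_path, str(root), dpi), check=True)
    return sorted(out.glob("page-*.png"))
```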
In a preferred embodiment of the present invention, it is detected whether the number of digital blocks in each report document image exceeds a first number threshold;
and if so, taking the report file images exceeding the first quantity threshold value in each report file image as effective information images to obtain at least one effective information image.
In a preferred embodiment of the present invention, taking a report document image in which at least one block of each report document image meets a preset requirement as an effective information image, obtaining at least one effective information image, includes:
detecting whether the ratio of the number of digital blocks in each report file image to the number of all blocks in the corresponding report file image exceeds a ratio threshold;
and if so, taking the report file image exceeding the proportion threshold value in each report file image as an effective information image to obtain at least one effective information image.
In a preferred embodiment of the present invention, taking a report document image in which at least one block of each report document image meets a preset requirement as an effective information image, obtaining at least one effective information image, includes:
acquiring the height of at least one character block in each report file image, and determining a preset number of target character blocks with the maximum height;
detecting whether a target block in each report file image contains a Chinese block or not;
if yes, detecting whether the number of Chinese characters in a target block containing the Chinese block exceeds a third number threshold;
if so, taking the report file image of which the target character block in each report file image comprises the Chinese character block as an effective information image to obtain at least one effective information image.
Specifically, high-value effective information is generally presented as a chart or a table, carries a summarizing conclusion, and contains a large amount of numeric information. Any report file image can therefore be judged using the following rules:
1) whether the number of pure digital blocks (including those containing "%", "-", "+") exceeds a first number threshold, such as 30;
2) whether the ratio of the number of pure digital blocks to the number of total blocks in the report file image exceeds a ratio threshold, such as 0.2;
3) acquiring the height of at least one block in each report file image, and determining a preset number of target blocks with the maximum height; detecting whether the target blocks in each report file image contain a Chinese block; if yes, detecting whether the number of Chinese characters in a target block containing the Chinese block exceeds a third number threshold; if so, taking the report file image whose target blocks include the Chinese block as an effective information image. For example, all the blocks in the report file image are sorted in descending order of block height, the first three blocks in the sorted order are taken as target blocks, whether the three target blocks contain a Chinese block is detected, and if yes, whether the number of Chinese characters in the target block containing the Chinese block exceeds 8 is detected.
Of course, the above rules and values are summarized according to practical experiments, and in practical applications, the rules and values may be adjusted according to practical requirements, which is not limited in the embodiment of the present invention. Moreover, the at least one rule may be adopted in the detection, or other rules may be adopted in addition to the rule, and may be set according to actual requirements in practical application, which is not limited in this embodiment of the present invention.
All the blocks in each page of report file image are judged based on the above rules, and the report file images meeting the rules are taken as effective information images, so that at least one effective information image is obtained.
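The three rules above can be sketched as follows, using the example thresholds from the text (30, 0.2, the three tallest blocks, and 8 Chinese characters). Several points are assumptions of this sketch rather than statements of the embodiment: the `Block` structure, treating "." and "," as permitted in a numeric block, using the CJK Unified Ideographs range to detect Chinese characters, and accepting a page when any one rule is satisfied.

```python
import re
from dataclasses import dataclass

# Digits plus "%", "-", "+"; "." and "," are an added assumption.
NUMERIC_RE = re.compile(r"^(?=.*\d)[\d%+\-.,]+$")
# CJK Unified Ideographs, used here as a simple test for Chinese characters.
CHINESE_RE = re.compile(r"[\u4e00-\u9fff]")


@dataclass
class Block:
    text: str      # recognized content of the block
    height: float  # block height taken from the OCR position information


def is_numeric_block(block: Block) -> bool:
    return bool(NUMERIC_RE.match(block.text.strip()))


def rule_numeric_count(blocks: list[Block], threshold: int = 30) -> bool:
    """Rule 1: the number of numeric blocks exceeds the first number threshold."""
    return sum(is_numeric_block(b) for b in blocks) > threshold


def rule_numeric_ratio(blocks: list[Block], threshold: float = 0.2) -> bool:
    """Rule 2: numeric blocks / all blocks exceeds the ratio threshold."""
    if not blocks:
        return False
    return sum(is_numeric_block(b) for b in blocks) / len(blocks) > threshold


def rule_tall_chinese_title(blocks: list[Block], top_n: int = 3,
                            min_chars: int = 8) -> bool:
    """Rule 3: one of the tallest blocks has more than min_chars Chinese characters."""
    tallest = sorted(blocks, key=lambda b: b.height, reverse=True)[:top_n]
    return any(len(CHINESE_RE.findall(b.text)) > min_chars for b in tallest)


def is_effective_page(blocks: list[Block]) -> bool:
    """Combine the rules with a logical OR (an assumption of this sketch)."""
    return (rule_numeric_count(blocks)
            or rule_numeric_ratio(blocks)
            or rule_tall_chinese_title(blocks))
```

Keeping each rule in its own function makes it straightforward to adjust a threshold, drop a rule, or add a further rule, as the text allows.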
The extraction process of an effective information image can be abstracted to a Job class, the processing of the Job class is performed in a queue, and the execution flow is shown in fig. 16.
An effective information title and an effective information keyword are then extracted for each effective information image. The extraction can use the Natural Language Processing (NLP) service of Tencent Cloud. The NLP service deeply integrates Tencent's internal NLP technology and, relying on billion-level Chinese corpus accumulation, provides 18 intelligent text processing capabilities, including intelligent word segmentation, entity recognition, text error correction, sentiment analysis, text classification, sensitive-content review, word vectors, keyword extraction, automatic summarization, intelligent chat, encyclopedic knowledge graph query and the like.
Then, an association relationship is established between each effective information image and its corresponding effective information title and effective information keyword, and the effective information image, the effective information title, the effective information keyword and the association relationship are stored in the preset effective information database. Besides the effective information, other data can also be stored, and a data table is generated to establish the association between each effective information group and the other data. The generated data table is shown in Table 1:
TABLE 1 (reproduced only as images in the original publication: Figure BDA0002563416120000261 and Figure BDA0002563416120000271)
In the embodiment of the invention, any of the databases can adopt Cloud Object Storage (COS). COS is a distributed storage service provided by Tencent Cloud that has no directory hierarchy and no data format limitation, can hold massive amounts of data, and supports access over the HTTP/HTTPS protocols.
It should be noted that the complete report file may be stored in a preset complete report database, and the complete report database and the valid information database may be two independent databases, or may be two independent parts in one database, and may be set according to actual requirements in actual applications, which is not limited in this embodiment of the present invention.
In the embodiment of the invention, for a search keyword, at least one effective information group corresponding to the search keyword is obtained through searching; the report file to which each effective information group belongs is then determined, and the report file information of each report file is obtained; the effective information groups are aggregated based on the report files to which they belong to obtain an aggregated effective information group corresponding to each report file; a content box is generated for each report file to obtain at least one content box, where the content box comprises the report file information of the report file and the corresponding effective information group; and the at least one content box is displayed. In this way, the content of all reports, including but not limited to visual chart titles, can be comprehensively identified according to the search keyword. Compared with the prior art, in which only visual chart titles are identified, this avoids the low hit rate of search keywords on organization-type reports and other types of reports with diverse content formats and high complexity. Meanwhile, the effective information groups that match the search keyword and belong to different report files are obtained through comprehensive identification and are then aggregated and displayed by report file, so that the multiple effective information groups in the same report file that match the search keyword are associated with one another, which reduces the effort the user spends distinguishing results while browsing and improves the user experience.
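The aggregation of search hits into one content box per report file can be sketched as follows. The class names `InfoGroup` and `ContentBox`, their fields, and the `report_infos` lookup table are assumptions made for the example; the embodiment does not prescribe a data structure.

```python
from dataclasses import dataclass, field


@dataclass
class InfoGroup:
    title: str
    keyword: str
    image_url: str
    report_id: str  # ID of the report file to which the group belongs


@dataclass
class ContentBox:
    report_id: str
    report_info: dict                      # e.g. report name, publisher, date
    groups: list[InfoGroup] = field(default_factory=list)


def build_content_boxes(hits: list[InfoGroup],
                        report_infos: dict[str, dict]) -> list[ContentBox]:
    """Group the hits by report file, keeping the order of first appearance."""
    boxes: dict[str, ContentBox] = {}
    for hit in hits:
        box = boxes.get(hit.report_id)
        if box is None:
            # First hit from this report file: create its content box.
            box = ContentBox(hit.report_id, report_infos.get(hit.report_id, {}))
            boxes[hit.report_id] = box
        box.groups.append(hit)
    return list(boxes.values())
```

Because the boxes follow the order in which each report file first appears among the hits, the report containing the best-ranked hit is displayed first.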
Fig. 17 is a schematic structural diagram of an information processing apparatus according to yet another embodiment of the present application, and as shown in fig. 17, the apparatus according to this embodiment may include:
a search module 1701 for searching for at least one valid information group corresponding to the search keyword;
a processing module 1702, configured to determine report files to which each valid information group belongs, and obtain report file information of each report file;
an aggregation module 1703, configured to aggregate the effective information groups based on the report files to which the effective information groups belong, to obtain aggregated effective information groups corresponding to each report file;
a generating module 1704, configured to generate a content box for each report file, so as to obtain at least one content box; the content box comprises report file information of the report file and a corresponding effective information group;
and a display module 1705 for displaying at least one content box respectively.
In a preferred embodiment of the present invention, any one of the at least one valid information group includes a valid information image, a valid information title, and a valid information keyword;
the device also includes:
the receiving module is used for receiving a display instruction aiming at any one content box in at least one content box;
the acquisition module is used for acquiring the effective information title in each effective information group corresponding to the content box;
and the display module is also used for displaying, through a preset report content reader, each effective information title and the effective information group corresponding to the currently selected effective information title.
In a preferred embodiment of the invention, the report content reader is further provided with at least one interactive instruction aiming at the currently presented effective information group;
the device also includes:
and the execution module is used for executing the interactive action corresponding to the interactive instruction aiming at the currently displayed effective information group when any interactive instruction in the at least one interactive instruction is triggered.
In a preferred embodiment of the present invention, the interactive instructions include excerpt instructions;
the execution module is specifically configured to:
when the excerpt instruction is triggered, judging whether a generated notebook exists in a preset favorite;
if yes, displaying a notebook list of the generated notebook, and copying a currently displayed effective information group into the notebook when a confirmation instruction for any notebook in the notebook list is received;
and if not, displaying a preset notebook creating interface, creating a new notebook based on the notebook creating interface, and copying the currently displayed effective information group to the new notebook.
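The branching performed by the execution module on an excerpt instruction can be sketched as follows. The favorites are modeled as a dictionary from notebook name to its list of excerpts, and the two callbacks stand in for the notebook list and the notebook creating interface; all of these names are hypothetical.

```python
from typing import Any, Callable


def excerpt(favorites: dict[str, list[Any]], current_group: Any,
            choose_notebook: Callable[[list[str]], str],
            create_notebook: Callable[[], str]) -> str:
    """Copy the currently displayed group into a notebook and return its name."""
    if favorites:
        # At least one notebook exists: show the list and wait for confirmation.
        name = choose_notebook(list(favorites))
        if name not in favorites:
            raise KeyError(f"unknown notebook: {name}")
    else:
        # No notebook yet: show the creating interface and create a new one.
        name = create_notebook()
        favorites[name] = []
    favorites[name].append(current_group)
    return name
```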
In a preferred embodiment of the present invention, the receiving module is further configured to receive a display instruction for any one of the generated notebooks in the preset favorites;
and the display module is also used for displaying the effective information group in the notebook through the report content reader.
In a preferred embodiment of the present invention, the search module includes:
the analysis submodule is used for carrying out Query analysis on the search keywords to obtain analyzed keywords;
the statement assembling submodule is used for assembling the analyzed keywords based on the Elasticsearch Query DSL syntax to obtain the query statement of the effective information groups; the query statement comprises a keyword field and a title field;
and the query submodule is used for querying by adopting a query statement and an index in a preset search engine to obtain at least one effective information group matched with the search keyword.
In a preferred embodiment of the present invention, the index in the preset search engine is generated as follows:
when detecting that any one of the at least one valid information group stored in the preset valid information database is updated, acquiring a valid information title and a valid information keyword of the valid information group with the updated data; the data updating comprises at least one of addition, deletion and modification of the effective information group;
generating an index based on the effective information title and the effective information keyword, and establishing a mapping relation between the effective information title, the effective information keyword and the index; wherein the index includes a title field and a keyword field.
In a preferred embodiment of the present invention, the preset valid information database is generated as follows:
acquiring a report file;
performing document map cutting processing on the report file according to the page number to obtain at least one report file image;
carrying out block recognition on each report file image to obtain at least one block corresponding to each report file image;
taking a report file image of which at least one block meets a preset requirement in each report file image as an effective information image to obtain at least one effective information image;
extracting an effective information title and an effective information keyword of each effective information image, and establishing an association relation between each effective information image and the effective information title and the effective information keyword which are respectively corresponding to each effective information image;
and storing each effective information image, an effective information title and an effective information keyword which are respectively corresponding to each effective information image and an association relation into an effective information database.
Preferably, taking the report document image in which the at least one block meets the preset requirement in each report document image as an effective information image, to obtain at least one effective information image, includes:
detecting whether the number of digital blocks in each report file image exceeds a first number threshold;
and if so, taking the report file images exceeding the first quantity threshold value in each report file image as effective information images to obtain at least one effective information image.
Preferably, taking the report document image in which the at least one block meets the preset requirement in each report document image as an effective information image, to obtain at least one effective information image, includes:
detecting whether the ratio of the number of digital blocks in each report file image to the number of all blocks in the corresponding report file image exceeds a ratio threshold;
and if so, taking the report file image exceeding the proportion threshold value in each report file image as an effective information image to obtain at least one effective information image.
Preferably, taking the report document image in which the at least one block meets the preset requirement in each report document image as an effective information image, to obtain at least one effective information image, includes:
acquiring the height of at least one character block in each report file image, and determining a preset number of target character blocks with the maximum height;
detecting whether a target block in each report file image contains a Chinese block or not;
if yes, detecting whether the number of Chinese characters in a target block containing the Chinese block exceeds a third number threshold;
if so, taking the report file image of which the target character block in each report file image comprises the Chinese character block as an effective information image to obtain at least one effective information image.
The information processing apparatus of this embodiment can execute the information processing methods shown in the first embodiment and the second embodiment of this application, and the implementation principles thereof are similar and will not be described herein again.
In the embodiment of the invention, for a search keyword, at least one effective information group corresponding to the search keyword is obtained through searching; the report file to which each effective information group belongs is then determined, and the report file information of each report file is obtained; the effective information groups are aggregated based on the report files to which they belong to obtain an aggregated effective information group corresponding to each report file; a content box is generated for each report file to obtain at least one content box, where the content box comprises the report file information of the report file and the corresponding effective information group; and the at least one content box is displayed. In this way, the content of all reports, including but not limited to visual chart titles, can be comprehensively identified according to the search keyword. Compared with the prior art, in which only visual chart titles are identified, this avoids the low hit rate of search keywords on organization-type reports and other types of reports with diverse content formats and high complexity. Meanwhile, the effective information groups that match the search keyword and belong to different report files are obtained through comprehensive identification and are then aggregated and displayed by report file, so that the multiple effective information groups in the same report file that match the search keyword are associated with one another, which reduces the effort the user spends distinguishing results while browsing and improves the user experience.
In another embodiment of the present application, there is provided an electronic device including: a memory and a processor; at least one program is stored in the memory for execution by the processor, and when executed by the processor, the program implements the following: for a search keyword, at least one effective information group corresponding to the search keyword is obtained through searching; the report file to which each effective information group belongs is then determined, and the report file information of each report file is obtained; the effective information groups are aggregated based on the report files to which they belong to obtain an aggregated effective information group corresponding to each report file; a content box is generated for each report file to obtain at least one content box, where the content box comprises the report file information of the report file and the corresponding effective information group; and the at least one content box is displayed. In this way, the content of all reports, including but not limited to visual chart titles, can be comprehensively identified according to the search keyword. Compared with the prior art, in which only visual chart titles are identified, this avoids the low hit rate of search keywords on organization-type reports and other types of reports with diverse content formats and high complexity. Meanwhile, the effective information groups that match the search keyword and belong to different report files are obtained through comprehensive identification and are then aggregated and displayed by report file, so that the multiple effective information groups in the same report file that match the search keyword are associated with one another, which reduces the effort the user spends distinguishing results while browsing and improves the user experience.
In an alternative embodiment, an electronic device is provided. As shown in fig. 18, the electronic device 18000 comprises a processor 18001 and a memory 18003. The processor 18001 is coupled to the memory 18003, for example via a bus 18002. Optionally, the electronic device 18000 can also include a transceiver 18004. It should be noted that, in practical applications, the transceiver 18004 is not limited to one, and the structure of the electronic device 18000 does not constitute a limitation on the embodiments of the present application.
The processor 18001 may be a CPU, a general purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure. The processor 18001 may also be a combination of computing devices, e.g., a combination of one or more microprocessors, a combination of a DSP and a microprocessor, or the like.
Bus 18002 may include a path to transfer information between the above components. The bus 18002 may be a PCI bus or an EISA bus, etc. The bus 18002 may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in FIG. 18, but this does not mean only one bus or one type of bus.
The memory 18003 may be, but is not limited to, a ROM or other type of static storage device that may store static information and instructions, a RAM or other type of dynamic storage device that may store information and instructions, an EEPROM, a CD-ROM or other optical disk storage, optical disk storage (including compact disk, laser disk, optical disk, digital versatile disk, blu-ray disk, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
The memory 18003 is used for storing application program code for implementing the aspects of the present application, and is controlled by the processor 18001 for execution. The processor 18001 is configured to execute application program code stored in the memory 18003 to implement any of the method embodiments described above.
The electronic devices include but are not limited to: mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), and the like, and fixed terminals such as digital TVs, desktop computers, and the like.
Yet another embodiment of the present application provides a computer-readable storage medium on which a computer program is stored; when run on a computer, the program enables the computer to perform the corresponding content in the aforementioned method embodiments. Compared with the prior art, for a search keyword, at least one effective information group corresponding to the search keyword is obtained through searching; the report file to which each effective information group belongs is then determined, and the report file information of each report file is obtained; the effective information groups are aggregated based on the report files to which they belong to obtain an aggregated effective information group corresponding to each report file; a content box is generated for each report file to obtain at least one content box, where the content box comprises the report file information of the report file and the corresponding effective information group; and the at least one content box is displayed. In this way, the content of all reports, including but not limited to visual chart titles, can be comprehensively identified according to the search keyword. Compared with the prior art, in which only visual chart titles are identified, this avoids the low hit rate of search keywords on organization-type reports and other types of reports with diverse content formats and high complexity. Meanwhile, the effective information groups that match the search keyword and belong to different report files are obtained through comprehensive identification and are then aggregated and displayed by report file, so that the multiple effective information groups in the same report file that match the search keyword are associated with one another, which reduces the effort the user spends distinguishing results while browsing and improves the user experience.
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least a portion of the steps in the flowcharts may include multiple sub-steps or multiple stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
The foregoing is only a part of the embodiments of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements should also be regarded as falling within the protection scope of the present invention.

Claims (14)

1. An information processing method characterized by comprising:
aiming at a search keyword, searching to obtain at least one effective information group corresponding to the search keyword;
determining the report file to which each effective information group belongs, and acquiring the report file information of each report file;
aggregating all the effective information groups based on the affiliated report files to obtain aggregated effective information groups corresponding to each report file;
generating a content box aiming at each report file to obtain at least one content box; the content box comprises the report file information of a report file and a corresponding valid information group;
respectively displaying the at least one content box.
2. The information processing method according to claim 1, wherein any one of the at least one valid information group includes a valid information image, a valid information header, and a valid information keyword;
the method further comprises the following steps:
when a display instruction for any one of the at least one content box is received, obtaining effective information titles in each effective information group corresponding to the any content box;
and displaying each effective information title and an effective information group corresponding to the currently selected effective information title in each effective information title through a preset report content reader.
3. The information processing method according to claim 2, wherein the report content reader is further provided with at least one interactive instruction for currently presented valid information;
the method further comprises the following steps:
and when any interactive instruction in at least one interactive instruction is triggered, executing the interactive action corresponding to the interactive instruction according to the currently displayed effective information.
4. The information processing method according to claim 3, wherein the interactive instruction includes an excerpt instruction;
executing, when any interactive instruction of the at least one interactive instruction is triggered, the interactive action corresponding to that interactive instruction on the currently displayed effective information group comprises:
when the excerpt instruction is triggered, determining whether a generated notebook exists in preset favorites;
if so, displaying a notebook list of the generated notebooks, and copying the currently displayed effective information group to a notebook when a confirmation instruction for that notebook in the notebook list is received;
and if not, displaying a preset notebook creation interface, creating a new notebook based on the notebook creation interface, and copying the currently displayed effective information group to the new notebook.
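The branch in claim 4 can be sketched as follows. This is only an illustration under stated assumptions: the favorites are modelled as a plain dictionary from notebook name to a list of copied groups, and the two callbacks `choose_notebook` and `create_notebook` are hypothetical stand-ins for the notebook list and the notebook creation interface.

```python
def excerpt(favorites, group, choose_notebook, create_notebook):
    """Copy the currently displayed group into a notebook in the favorites.

    favorites: dict mapping notebook name -> list of copied groups (assumed model).
    choose_notebook: callback that receives the notebook list and returns the
        name the user confirmed (stand-in for the notebook list UI).
    create_notebook: callback that returns the name of a newly created notebook
        (stand-in for the notebook creation interface).
    """
    if favorites:
        # At least one notebook exists: show the list and wait for confirmation.
        name = choose_notebook(sorted(favorites))
    else:
        # No notebook yet: create one first.
        name = create_notebook()
        favorites[name] = []
    favorites[name].append(group)
    return name
```

Passing the user interaction in as callbacks keeps the decision logic separate from any particular interface.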
5. The information processing method according to any one of claims 1 to 4, further comprising:
when a display instruction for any one of the generated notebooks in the preset favorites is received, displaying the effective information in that notebook through a report content reader.
6. The information processing method according to claim 1, wherein the searching to obtain at least one effective information group corresponding to the search keyword includes:
performing query analysis on the search keyword to obtain analyzed keywords;
assembling the analyzed keywords based on the Elasticsearch DSL grammar to obtain a query statement for effective information groups, wherein the query statement comprises a keyword field and a title field;
and querying with the query statement against an index in a preset search engine to obtain at least one effective information group matching the search keyword.
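As a rough illustration of claim 6, the sketch below assembles an Elasticsearch Query DSL body from analyzed keywords. The field names `title` and `keyword`, the whitespace-based `analyze` function, and the choice of a `multi_match` query are assumptions made for the example; the application does not specify the analyzer or the exact query type.

```python
import json


def analyze(raw_query):
    """Very simple stand-in for query analysis: lowercase, split, de-duplicate."""
    seen, terms = set(), []
    for term in raw_query.lower().split():
        if term not in seen:
            seen.add(term)
            terms.append(term)
    return terms


def build_query(terms, size=20):
    """Assemble a Query DSL body that searches the title and keyword fields."""
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": " ".join(terms),
                "fields": ["title", "keyword"],  # assumed field names
            }
        },
    }


if __name__ == "__main__":
    body = build_query(analyze("Cloud gaming  cloud"))
    print(json.dumps(body, indent=2))
```

The resulting dictionary is what would be sent as the request body of a search call against the index; the client call itself is omitted here.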
7. The information processing method according to claim 1 or 6, wherein the index in the preset search engine is generated by:
when a data update is detected in any one of the at least one effective information group stored in a preset effective information database, acquiring the effective information title and the effective information keyword of the updated effective information group, wherein the data update comprises at least one of addition, deletion and modification of the effective information group;
generating an index based on the effective information title and the effective information keyword, and establishing a mapping relation between the effective information title, the effective information keyword and the index, wherein the index includes a title field and a keyword field.
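The update-driven index maintenance of claim 7 might look like the following in-memory sketch. The event format (`op`, `id`, `title`, `keywords`) and the dictionary-based index are hypothetical; a real deployment would forward the same information to the search engine rather than to a Python dictionary.

```python
def apply_update(index, event):
    """Keep a title/keyword index in step with one database change event.

    index: dict mapping group id -> {"title": ..., "keyword": [...]}.
    event: dict with "op" in {"add", "modify", "delete"} and an "id";
        add/modify events also carry "title" and "keywords" (assumed format).
    """
    op, group_id = event["op"], event["id"]
    if op == "delete":
        index.pop(group_id, None)
    elif op in ("add", "modify"):
        index[group_id] = {
            "title": event["title"],
            "keyword": list(event["keywords"]),
        }
    else:
        raise ValueError(f"unknown operation: {op}")
    return index
```

Each index entry carries only the title field and the keyword field, and the group id serves as the mapping back to the stored effective information group.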
8. The information processing method according to claim 7, wherein the preset effective information database is generated by:
acquiring a report file;
splitting the report file into images page by page to obtain at least one report file image;
performing block recognition on each report file image to obtain at least one block corresponding to each report file image;
taking each report file image whose at least one block satisfies a preset requirement as an effective information image, to obtain at least one effective information image;
extracting an effective information title and an effective information keyword from each effective information image, and establishing an association relation between each effective information image and its corresponding effective information title and effective information keyword;
and storing each effective information image, its corresponding effective information title and effective information keyword, and the association relation in the effective information database.
9. The information processing method according to claim 8, wherein taking each report file image whose at least one block satisfies a preset requirement as an effective information image, to obtain at least one effective information image, comprises:
detecting whether the number of numeric blocks in each report file image exceeds a first number threshold;
and if so, taking each report file image whose number of numeric blocks exceeds the first number threshold as an effective information image, to obtain at least one effective information image.
10. The information processing method according to claim 8, wherein taking each report file image whose at least one block satisfies a preset requirement as an effective information image, to obtain at least one effective information image, comprises:
detecting whether the ratio of the number of numeric blocks in each report file image to the number of all blocks in that report file image exceeds a ratio threshold;
and if so, taking each report file image whose ratio exceeds the ratio threshold as an effective information image, to obtain at least one effective information image.
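The two checks in claims 9 and 10 can be sketched together. The example assumes that block recognition has already produced the text of each block, and it treats a block as consisting of digits when it matches a simple number pattern; that pattern and the function names are illustrative assumptions, and the thresholds are left as parameters because the claims do not fix their values.

```python
import re

# Assumed notion of a block made of digits: optional sign, digits with
# thousands separators, optional decimal part, optional percent sign.
NUMERIC = re.compile(r"^[+-]?\d[\d,]*(\.\d+)?%?$")


def count_numeric(blocks):
    """Count the blocks whose text looks like a number."""
    return sum(1 for text in blocks if NUMERIC.match(text.strip()))


def passes_count(blocks, min_count):
    """Claim 9 style check: the number of such blocks exceeds a threshold."""
    return count_numeric(blocks) > min_count


def passes_ratio(blocks, min_ratio):
    """Claim 10 style check: their share of all blocks exceeds a threshold."""
    if not blocks:
        return False
    return count_numeric(blocks) / len(blocks) > min_ratio
```

For a page with the blocks `["Revenue", "12,300", "45.6%", "2020", "growth"]`, three of the five blocks count as numbers, so the page passes a count threshold of 2 and a ratio threshold of 0.5.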
11. The information processing method according to claim 8, wherein taking each report file image whose at least one block satisfies a preset requirement as an effective information image, to obtain at least one effective information image, comprises:
acquiring the height of at least one character block in each report file image, and determining a preset number of target character blocks with the greatest heights;
detecting whether the target character blocks in each report file image contain a Chinese character block;
if so, detecting whether the number of Chinese characters in the target character block containing the Chinese character block exceeds a third number threshold;
if so, taking each report file image whose target character blocks include such a Chinese character block as an effective information image, to obtain at least one effective information image.
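Claim 11 selects the tallest blocks on a page and checks them for enough Chinese text. A minimal sketch follows, assuming each block is available as a `(text, height)` pair and that Chinese characters can be approximated by the CJK Unified Ideographs range U+4E00 to U+9FFF. The function name and the default values of `top_n` and `min_chars` are hypothetical.

```python
import re

# Approximation: characters in the basic CJK Unified Ideographs block.
CJK = re.compile(r"[\u4e00-\u9fff]")


def is_effective_page(blocks, top_n=3, min_chars=4):
    """Return True if one of the tallest blocks has more than min_chars Chinese characters.

    blocks: list of (text, height) pairs for one report file image.
    """
    # Target blocks: the top_n blocks with the greatest height.
    tallest = sorted(blocks, key=lambda block: block[1], reverse=True)[:top_n]
    # A block without Chinese characters has a count of 0 and fails the
    # threshold, so the "contains Chinese" test is folded into the count.
    return any(len(CJK.findall(text)) > min_chars for text, _ in tallest)
```

The idea behind the check is that the tallest text on a page is usually its heading, so a sufficiently long Chinese heading suggests the page carries a titled chart or table.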
12. An information processing apparatus, characterized by comprising:
a search module, configured to search, for a search keyword, to obtain at least one effective information group corresponding to the search keyword;
a processing module, configured to determine the report file to which each effective information group belongs and acquire the report file information of each report file;
an aggregation module, configured to aggregate the effective information groups by the report file to which they belong, to obtain an aggregated effective information group corresponding to each report file;
a generating module, configured to generate a content box for each report file to obtain at least one content box, wherein each content box comprises the report file information of one report file and the corresponding effective information group;
and a display module, configured to display the at least one content box.
13. An electronic device, comprising:
a processor, a memory, and a bus;
the bus is used for connecting the processor and the memory;
the memory is used for storing operation instructions;
the processor is configured to execute the information processing method according to any one of claims 1 to 11 by calling the operation instructions.
14. A computer-readable storage medium for storing computer instructions which, when executed on a computer, cause the computer to perform the information processing method of any one of claims 1 to 11.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010622216.7A CN111666383A (en) 2020-06-30 2020-06-30 Information processing method, information processing device, electronic equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010622216.7A CN111666383A (en) 2020-06-30 2020-06-30 Information processing method, information processing device, electronic equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN111666383A true CN111666383A (en) 2020-09-15

Family

ID=72391184

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010622216.7A Pending CN111666383A (en) 2020-06-30 2020-06-30 Information processing method, information processing device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111666383A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112612944A (en) * 2020-12-07 2021-04-06 深圳价值在线信息科技股份有限公司 Case information management method, terminal equipment and system
CN113239650A (en) * 2021-07-09 2021-08-10 成都爱旗科技有限公司 Report generation method and device and electronic equipment
CN113297345A (en) * 2021-05-21 2021-08-24 深圳市智尊宝数据开发有限公司 Analysis report generation method, electronic equipment and related product
CN113535892A (en) * 2021-06-08 2021-10-22 北京易创新科信息技术有限公司 Industry research report searching method and device and electronic equipment

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112612944A (en) * 2020-12-07 2021-04-06 深圳价值在线信息科技股份有限公司 Case information management method, terminal equipment and system
CN112612944B (en) * 2020-12-07 2024-05-31 深圳价值在线信息科技股份有限公司 Case information management method, terminal equipment and system
CN113297345A (en) * 2021-05-21 2021-08-24 深圳市智尊宝数据开发有限公司 Analysis report generation method, electronic equipment and related product
CN113297345B (en) * 2021-05-21 2021-12-03 深圳市智尊宝数据开发有限公司 Analysis report generation method, electronic equipment and related product
CN113535892A (en) * 2021-06-08 2021-10-22 北京易创新科信息技术有限公司 Industry research report searching method and device and electronic equipment
CN113535892B (en) * 2021-06-08 2023-12-01 北京易创新科信息技术有限公司 Search method and device for industry research report and electronic equipment
CN113239650A (en) * 2021-07-09 2021-08-10 成都爱旗科技有限公司 Report generation method and device and electronic equipment
CN113239650B (en) * 2021-07-09 2021-10-15 成都爱旗科技有限公司 Report generation method and device and electronic equipment

Similar Documents

Publication Publication Date Title
KR100462292B1 (en) A method for providing search results list based on importance information and a system thereof
US10878044B2 (en) System and method for providing content recommendation service
CN111666383A (en) Information processing method, information processing device, electronic equipment and computer readable storage medium
CN110489558B (en) Article aggregation method and device, medium and computing equipment
CN107085583B (en) Electronic document management method and device based on content
US20170212899A1 (en) Method for searching related entities through entity co-occurrence
US9129009B2 (en) Related links
US8631097B1 (en) Methods and systems for finding a mobile and non-mobile page pair
CN103136228A (en) Image search method and image search device
US20130339840A1 (en) System and method for logical chunking and restructuring websites
CN105493075A (en) Retrieval of attribute values based upon identified entities
CN102214208A (en) Method and equipment for generating structured information entity based on non-structured text
WO2023241332A1 (en) Snippet information generation method and apparatus, search result display method and apparatus, device, and medium
US11745093B2 (en) Developing implicit metadata for data stores
RU2693193C1 (en) Automated extraction of information
CN110674087A (en) File query method and device and computer readable storage medium
US11250084B2 (en) Method and system for generating content from search results rendered by a search engine
CN116186198A (en) Information retrieval method, information retrieval device, computer equipment and storage medium
KR101662215B1 (en) Search system and method for providing expansion search information
US8892596B1 (en) Identifying related documents based on links in documents
CN112989011B (en) Data query method, data query device and electronic equipment
US8195458B2 (en) Open class noun classification
KR101757755B1 (en) Method for distributed processing research of prior art and server and system implementing the same
KR101647596B1 (en) Method and server for providing contents service
US20160150038A1 (en) Efficiently Discovering and Surfacing Content Attributes

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination