CN111782916B - Method and device for generating business information report - Google Patents


Info

Publication number
CN111782916B
Authority
CN
China
Prior art keywords
information
business
report
service
business information
Legal status
Active
Application number
CN202010842237.XA
Other languages
Chinese (zh)
Other versions
CN111782916A (en)
Inventor
苏豫陇
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202010842237.XA
Publication of CN111782916A
Application granted
Publication of CN111782916B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G06F16/9538 Presentation of query results
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of the present specification provide a method and a device for generating a business information report. In the method, a generation request for the business information report is received, the generation request including a report topic of the business information report; report configuration information of the business information report, comprising a report template and an information source address list, is determined according to the generation request; a web crawler is used to crawl business information according to the information source address list; target business information is determined from the crawled business information; and the business information report is generated according to the target business information and the report template.

Description

Method and device for generating business information report
Technical Field
The embodiments of the present specification relate to the technical field of computer networks, and in particular to a method and a device for generating a business information report.
Background
The internet is a source of all kinds of information and news, and practitioners in every industry can search it for information of interest. Some practitioners need to follow information within their industry daily to keep up with industry developments. To make it easier for them to obtain business information, business information reports are generated. A business information report collects related information publicly disclosed through internet channels and screens and summarizes it by business type. The information it presents is curated, focused, trending, and up-to-date business information, so readers can obtain the information they care about directly from the report instead of searching through massive amounts of internet information.
Disclosure of Invention
In view of the foregoing, the embodiments of the present specification provide a method and an apparatus for generating a business information report. In the method, in response to a generation request for a business information report, report configuration information of the report is determined according to its report topic; business information is then crawled with a web crawler according to an information source address list; target business information is determined from the crawled business information; and the business information report is generated according to the target business information and a report template. With this method, the corresponding business information report can be generated directly, which improves report generation efficiency. Moreover, because the target business information is acquired according to the report's own configuration information, it is more closely associated with the report, so the generated report presents higher-quality information content.
According to one aspect of the embodiments of the present specification, a method for generating a business information report is provided, comprising: receiving a generation request for the business information report, the generation request comprising a report topic of the business information report; determining report configuration information of the business information report according to the generation request, the report configuration information comprising a report template and an information source address list; crawling business information with a web crawler according to the information source address list; determining target business information from the crawled business information; and generating the business information report according to the target business information and the report template.
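The five claimed steps can be sketched as a single function. This is a minimal illustration that assumes hypothetical names (`ReportConfig`, `generate_report`) and uses a Python format string as the report template; the crawler and the target-selection logic are injected as callables, so the example runs on stubbed page content with no network access.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ReportConfig:
    template: str                # hypothetical: format string with {topic} and {items}
    source_addresses: List[str]  # information source address list


def generate_report(
    topic: str,
    config_base: Dict[str, ReportConfig],
    crawl: Callable[[str], List[str]],
    select_targets: Callable[[List[str]], List[str]],
) -> str:
    # The generation request carries the report topic; use it to look up the configuration.
    config = config_base[topic]
    # Crawl business information from every address in the source list.
    crawled: List[str] = []
    for address in config.source_addresses:
        crawled.extend(crawl(address))
    # Determine the target business information from what was crawled.
    targets = select_targets(crawled)
    # Fill the report template with the target business information.
    return config.template.format(topic=topic, items="\n".join(targets))


# Usage with stubbed crawler output (no network access).
fake_pages = {
    "https://a.example": ["stall economy rebounds", "weather update"],
    "https://b.example": ["new stall permits issued"],
}
base = {"stall economy": ReportConfig("Report: {topic}\n{items}", list(fake_pages))}
report = generate_report(
    "stall economy",
    base,
    crawl=lambda url: fake_pages[url],
    select_targets=lambda items: [i for i in items if "stall" in i],
)
print(report)
```

Injecting `crawl` and `select_targets` keeps the control flow of the method separate from any particular crawler or ranking strategy.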
Optionally, in one example of the above aspect, determining the target business information from the crawled business information includes: sorting the crawled business information; and determining the target business information according to the sorting result of the business information.
Optionally, in one example of the above aspect, before the crawled business information is sorted, the method further comprises: performing information screening or information de-duplication on the crawled business information.
Optionally, in one example of the above aspect, the report configuration information further comprises keywords and/or logical combinations of the keywords, and performing information screening on the crawled business information comprises: screening the crawled business information using the keywords and/or the logical combinations of the keywords.
Optionally, in one example of the above aspect, determining the target business information from the crawled business information comprises: determining the target business information from the crawled business information based at least in part on the degree of association among the business information.
Optionally, in one example of the above aspect, the report template comprises at least two business sections, each for a different business topic under the report topic, and determining the target business information based at least in part on the degree of association among the business information comprises: determining, from the crawled business information, the target business information of each business section based at least in part on the degree of association between that business information and the business information of the other business sections.
Optionally, in one example of the above aspect, the information source address list of each business section is determined according to the business topic of that section.
Optionally, in one example of the above aspect, the method further comprises: determining the presentation order of the business sections in the report template according to the target business information of each business section; and generating the business information report according to the target business information and the report template comprises: generating the business information report according to the target business information of each business section, the report template, and the presentation order of the business sections in the report template.
Optionally, in one example of the above aspect, determining the presentation order of the business sections in the report template according to the target business information of each business section comprises: determining, according to the target business information of each business section, a first association degree between the section and the report topic and a second association degree between the section and the other business sections; and determining the presentation order of the business sections in the report template according to the first and second association degrees of each section.
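The specification does not say how the first and second association degrees are combined. One plausible rule, shown here purely as an assumption, is a weighted score: the first association degree plus `beta` times the mean second association degree to the other sections, with higher scores presented first. `order_sections` and `beta` are hypothetical.

```python
from typing import Dict, List


def order_sections(
    first: Dict[str, float],
    second: Dict[str, Dict[str, float]],
    beta: float = 0.5,
) -> List[str]:
    """Order business sections by a weighted score (hypothetical combination rule).

    first[s]: association degree between section s and the report topic.
    second[s][t]: association degree between section s and another section t.
    """
    def score(section: str) -> float:
        others = [v for t, v in second.get(section, {}).items() if t != section]
        mean_second = sum(others) / len(others) if others else 0.0
        return first[section] + beta * mean_second

    # Higher combined score is presented earlier; Python's sort is stable,
    # so sections with equal scores keep their input order.
    return sorted(first, key=score, reverse=True)
```

With `beta=0` the order depends only on relevance to the report topic; raising `beta` promotes sections that are strongly connected to the rest of the report.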
According to another aspect of the embodiments of the present specification, an apparatus for generating a business information report is further provided, comprising: a request receiving unit that receives a generation request for the business information report, the generation request comprising a report topic of the business information report; a configuration information determining unit that determines report configuration information of the business information report according to the generation request, the report configuration information comprising a report template and an information source address list; an information crawling unit that crawls business information with a web crawler according to the information source address list; a target business information determining unit that determines target business information from the crawled business information; and a report generation unit that generates the business information report according to the target business information and the report template.
Optionally, in one example of the above aspect, the target business information determining unit sorts the crawled business information and determines the target business information according to the sorting result.
Optionally, in one example of the above aspect, the apparatus further comprises: an information processing unit that performs information screening or information de-duplication on the crawled business information.
Optionally, in one example of the above aspect, the target business information determining unit determines the target business information from the crawled business information based at least in part on the degree of association among the business information.
Optionally, in one example of the above aspect, the report template comprises at least two business sections, each for a different business topic under the report topic, and the target business information determining unit determines, from the crawled business information, the target business information of each business section based at least in part on the degree of association between that business information and the business information of the other business sections.
Optionally, in one example of the above aspect, the apparatus further comprises: a section order determining unit that determines the presentation order of the business sections in the report template according to the target business information of each business section; and the report generation unit generates the business information report according to the target business information of each business section, the report template, and the presentation order of the business sections in the report template.
Optionally, in one example of the above aspect, the section order determining unit determines, according to the target business information of each business section, a first association degree between the section and the report topic and a second association degree between the section and the other business sections, and determines the presentation order of the business sections in the report template according to the first and second association degrees of each section.
According to another aspect of the embodiments of the present specification, there is also provided an electronic device including: at least one processor; and a memory storing instructions that, when executed by the at least one processor, cause the at least one processor to perform the method for generating a business information report as described above.
According to another aspect of embodiments of the present specification, there is also provided a machine-readable storage medium storing executable instructions that, when executed, cause the machine to perform a method for generating a business information report as described above.
Drawings
A further understanding of the nature and advantages of the present description may be realized by reference to the following drawings. In the drawings, similar components or features may have the same reference numerals.
Fig. 1 shows a flowchart of one example of a method for generating a business information report according to an embodiment of the present specification.
Fig. 2 is a flowchart showing an example of a processing procedure for business information according to the embodiment of the present specification.
FIG. 3 illustrates a schematic diagram of one example of generating a business information report based on a report template in accordance with an embodiment of the present disclosure.
Fig. 4 is a block diagram showing an example of the business information report generating apparatus of the embodiment of the present specification.
Fig. 5 shows a block diagram of an electronic device implementing a method for generating a business information report according to an embodiment of the present description.
Detailed Description
The subject matter described herein will be discussed below with reference to example embodiments. It should be appreciated that these embodiments are discussed only to enable a person skilled in the art to better understand and thereby practice the subject matter described herein, and are not limiting of the scope, applicability, or examples set forth in the claims. Changes may be made in the function and arrangement of elements discussed without departing from the scope of the disclosure as set forth in the specification. Various examples may omit, replace, or add various procedures or components as desired. In addition, features described with respect to some examples may be combined in other examples as well.
As used herein, the term "comprising" and its variants are open-ended terms meaning "including, but not limited to". The term "based on" means "based at least in part on". The terms "one embodiment" and "an embodiment" mean "at least one embodiment". The term "another embodiment" means "at least one other embodiment". The terms "first," "second," and the like may refer to different or the same objects. Other definitions, explicit or implicit, may be included below. Unless the context clearly indicates otherwise, the definition of a term is consistent throughout this specification.
Fig. 1 illustrates a flow chart of one example of a method 100 for generating a business information report in accordance with an embodiment of the present disclosure.
As shown in Fig. 1, at 110, a generation request for a business information report may be received.
A business information report is a report on a report topic that aggregates various information related to that topic; the information is arranged and presented in the report according to rules such as business type, popularity, and authority. Individual items in the report may be derived from information publicly disclosed on the internet.
The generation request may include the report topic of the business information report. The report topic determines the content direction and the business scope of the report, and the report presents information relevant to that topic. For example, if the business information report to be generated concerns the stall economy, the report topic may be determined to be the stall economy, and the business scope involved is likewise the stall economy.
A report topic may cover an entire industry, e.g., the financial industry, or a specialized area or branch within an industry, e.g., the stall economy within the financial industry.
In one example, the generation request may be generated automatically at specified times by the apparatus implementing the method 100. The specified times may recur at a fixed interval or fall at specified points in time. Because the apparatus needs no user intervention while generating the business information report by the method 100, users can obtain reports more conveniently.
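The two kinds of specified time (a recurring interval or a fixed point in time) can be turned into a next-trigger calculation for automatically issued requests. This is a minimal sketch; `next_trigger` and its parameters are hypothetical, and a production scheduler would also need to handle time zones and missed runs.

```python
from datetime import datetime, time, timedelta
from typing import Optional


def next_trigger(
    now: datetime,
    last_run: Optional[datetime] = None,
    interval: Optional[timedelta] = None,
    at: Optional[time] = None,
) -> datetime:
    """Return when the next generation request should be issued (hypothetical helper)."""
    if interval is not None:
        # Interval mode: one interval after the previous run, or after `now` if none yet.
        base = last_run if last_run is not None else now
        return base + interval
    if at is not None:
        # Time-point mode: today at `at` if that is still ahead, otherwise tomorrow.
        candidate = datetime.combine(now.date(), at)
        return candidate if candidate > now else candidate + timedelta(days=1)
    raise ValueError("specify either interval or at")
```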
In another example, the generation request may be sent by a user, after which the apparatus implementing the method 100 generates the business information report according to that request. In this example, the user can request a report according to their own needs (e.g., at any point in time), which gives a better user experience.
Then, at 120, report configuration information for the business information report may be determined based on the request for generation of the business information report.
Different business information reports may correspond to the same or different report configuration information. The report configuration information guides the generation of the corresponding business information report, and different report configuration information produces different business information reports.
The report configuration information of each business information report may include a report template and an information source address list. The report template defines the overall framework of the corresponding report, and different reports may use different templates.
For example, a business information report whose topic is the stall economy may cover business information on two business topics, law and marketing, so its report template includes two sections, one for each. By contrast, a report whose topic is deposit business violations covers only one business topic, so its report template includes only one section.
The information source address list may include multiple information source addresses, such as web addresses and database addresses. The list is where the business information for the report comes from, and different business information reports may have the same or different lists.
The information source address list may be determined based on the report topic, so that the business information originating from each address in the list is associated with that topic. For example, if the report topic is securities finance, the list may include multiple securities finance portals.
Besides the report topic, the information source addresses may also be determined based on the authority of the media that publish the business information, the volume of user visits, and the like. For example, websites that are more relevant to the report topic, that publish information on more authoritative media, and that receive more user visits are more likely to be selected as information source addresses in the list.
In one example, the report configuration information of each business information report may be stored in advance in a report configuration information base. When a generation request is received, the corresponding report configuration information can be retrieved from this base, which may store the correspondence between report topics and report configuration information, or between report identifiers and report configuration information. The generation request therefore only needs to contain the report topic or the report identifier to retrieve the corresponding configuration; no complex configuration of the request is needed, and the apparatus can generate requests automatically.
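A minimal in-memory sketch of the report configuration information base, storing both correspondences described above. `ReportConfigBase`, `register`, `lookup`, and the request keys `"topic"` and `"report_id"` are hypothetical; a real deployment would presumably back this with a database.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ReportConfig:
    template: str
    source_addresses: List[str]
    keywords: List[str] = field(default_factory=list)


class ReportConfigBase:
    """In-memory stand-in for the report configuration information base (hypothetical)."""

    def __init__(self) -> None:
        self._by_topic: Dict[str, ReportConfig] = {}
        self._by_report_id: Dict[str, ReportConfig] = {}

    def register(self, config: ReportConfig, topic: str, report_id: Optional[str] = None) -> None:
        # Store both correspondences: topic -> config and report identifier -> config.
        self._by_topic[topic] = config
        if report_id is not None:
            self._by_report_id[report_id] = config

    def lookup(self, request: Dict[str, str]) -> ReportConfig:
        # The request only needs a report identifier or a report topic.
        if "report_id" in request:
            return self._by_report_id[request["report_id"]]
        return self._by_topic[request["topic"]]


base = ReportConfigBase()
cfg = ReportConfig("{topic}: {items}", ["https://a.example"], ["stall", "economy"])
base.register(cfg, topic="stall economy", report_id="R001")
```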
Next, at 130, the web crawler is utilized to crawl business information according to the list of information source addresses.
The web crawler in the embodiments of the present specification may be a general-purpose web crawler, a focused web crawler, or the like. The web crawler crawls business information from each information source address in the list. In one example, it may crawl all of the information at each address, including both business information and other information.
In another example, the web crawler may crawl only the business information at each information source address, e.g., by selecting it with regular expressions. This reduces the amount crawled from each address and improves crawling efficiency.
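Selective crawling with a regular expression might look like the sketch below. It assumes a hypothetical page layout in which business items are links with `class="biz-news"`, and it works on HTML that has already been fetched, so it needs no network access. `extract_business_items` is a hypothetical name.

```python
import re
from typing import List, Tuple

# Hypothetical page structure: business items are links with class "biz-news".
BUSINESS_LINK = re.compile(r'<a class="biz-news" href="([^"]+)">([^<]+)</a>')


def extract_business_items(html: str) -> List[Tuple[str, str]]:
    """Return (url, title) pairs for business items only, skipping the rest of the page."""
    return [(url, title.strip()) for url, title in BUSINESS_LINK.findall(html)]
```

Regular expressions are brittle against markup changes; an HTML parser would be a sturdier choice for real pages, but the regex keeps the sketch close to the text.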
In one example, a web crawler may crawl business information at specified times, which may be specified points in time or specified time intervals.
When an information source address contains business information published at different points in time, the web crawler may, on each crawl, determine the time of the previous crawl and crawl only the business information published between that time and the current time. This avoids re-crawling business information that was already crawled and improves crawling efficiency.
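Incremental crawling can be sketched by remembering, per source address, the time of the last crawl and keeping only items published after it and no later than the current crawl. `IncrementalCrawler` and `Item` are hypothetical, and the `available` list stands in for what a real crawler would fetch from the address.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class Item:
    published: datetime
    title: str


class IncrementalCrawler:
    """Remembers the last crawl time per source address (hypothetical helper)."""

    def __init__(self) -> None:
        self._last_crawl: Dict[str, datetime] = {}

    def crawl(self, source: str, available: List[Item], now: datetime) -> List[Item]:
        # `available` stands in for the items the source currently lists.
        since = self._last_crawl.get(source)
        fresh = [
            item for item in available
            if (since is None or item.published > since) and item.published <= now
        ]
        # Record this crawl so the next one starts from here.
        self._last_crawl[source] = now
        return fresh
```

Using `> since` for the lower bound and `<= now` for the upper bound means an item published exactly at a crawl time is collected once, neither duplicated nor missed.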
According to the embodiments of the present specification, the web crawler crawls business information only from the information source addresses in the list, in a targeted manner, rather than crawling the whole web. This reduces the amount of crawled business information and therefore the subsequent processing load. Moreover, because the information source address list is determined according to the report topic, the business information crawled from it is more closely associated with the business information report, so the generated report presents more accurate information.
After the business information is crawled, at 140, target business information is determined from the crawled business information.
In one example, information screening may be performed on the business information: from a large volume of crawled business information, it selects target business information that is closely associated with the business information report, and the selected target business information is used to generate the report, making the report's content more concise and accurate.
In one mode of information screening, keywords may be used to screen the crawled business information. The keywords may include forward screening keywords and/or reverse screening keywords. Forward screening keywords are words highly relevant to the report topic of the business information report, such as high-frequency or technical vocabulary in the business scope of that topic. For example, where the report topic is the stall economy, the forward screening keywords may include "stall" and "economy".
Reverse screening keywords are keywords that run counter to the report topic, that are irrelevant to it but easily confused with it, or that the business information report specifies must be excluded.
Forward screening keywords are used to match target business information directly: business information matching a forward screening keyword may be determined to be target business information. Reverse screening keywords are used to exclude non-target business information from the crawled business information: business information matching a reverse screening keyword may be determined to be non-target business information and eliminated directly.
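Forward and reverse screening can be sketched as below. Matching is assumed to be case-insensitive substring matching, and reverse keywords are checked first so that an item matching both lists is excluded; items that match neither list are also dropped, which is an assumption the text leaves open. `screen` is a hypothetical name.

```python
from typing import Iterable, List


def screen(items: Iterable[str], forward: List[str], reverse: List[str]) -> List[str]:
    """Keyword screening sketch using case-insensitive substring matching (assumption)."""
    kept: List[str] = []
    for text in items:
        lowered = text.lower()
        # Reverse keyword hit: non-target information, eliminated directly.
        if any(word.lower() in lowered for word in reverse):
            continue
        # Forward keyword hit: target information.
        if any(word.lower() in lowered for word in forward):
            kept.append(text)
        # Items matching neither list are dropped in this sketch (an assumption).
    return kept
```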
In this example, the report configuration information may also include keywords and/or logical combinations of keywords, which are used to perform the information screening.
Keywords may be used independently of one another, each in its own screening pass. For example, if the forward screening keywords include "stall" and "economy", one screening pass may use "stall" and another may use "economy".
A logical combination of keywords joins keywords through logical relationships such as AND, OR, and exclusion.
For example, for a business information report whose topic is deposit business violations, the forward screening keywords may include: deposit, savings, deposit receipt, account, identity, real name, management, alias account opening, fictitious account opening, and the like; the reverse screening keywords may include postal savings and the like. The resulting logical combination of keywords may be: ((deposit/savings/deposit receipt/account) - (postal savings/postal savings bank) + (identity/real name/management))/alias account opening/fictitious account opening, where "/" denotes OR, "+" denotes AND, and "-" denotes exclusion.
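A logical combination such as the one above can be compiled into a predicate with a small recursive-descent parser. This sketch assumes that "/" binds more loosely than "+" and "-", that "+" and "-" apply left to right, and that keywords are case-insensitive substrings containing none of the operator characters; the specification does not state precedence rules. `compile_rule` is hypothetical, and the usage below applies a shortened version of the rule.

```python
import re
from typing import Callable, List

Predicate = Callable[[str], bool]


def _or(a: Predicate, b: Predicate) -> Predicate:
    return lambda text: a(text) or b(text)


def _and(a: Predicate, b: Predicate) -> Predicate:
    return lambda text: a(text) and b(text)


def _and_not(a: Predicate, b: Predicate) -> Predicate:
    return lambda text: a(text) and not b(text)


def compile_rule(rule: str) -> Predicate:
    """Compile a rule like "((a/b) - (c) + (d))/e" into a predicate over text.

    "/" is OR (lowest precedence); "+" is AND and "-" is exclusion, applied left to right.
    """
    # Split on operators and parentheses, keeping them as tokens; drop blanks.
    tokens: List[str] = [t.strip() for t in re.split(r"([()/+\-])", rule) if t.strip()]
    pos = 0

    def peek() -> str:
        return tokens[pos] if pos < len(tokens) else ""

    def take() -> str:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of rule")
        pos += 1
        return tokens[pos - 1]

    def parse_or() -> Predicate:
        left = parse_and()
        while peek() == "/":
            take()
            left = _or(left, parse_and())
        return left

    def parse_and() -> Predicate:
        left = parse_atom()
        while peek() in ("+", "-"):
            op = take()
            right = parse_atom()
            left = _and(left, right) if op == "+" else _and_not(left, right)
        return left

    def parse_atom() -> Predicate:
        tok = take()
        if tok == "(":
            inner = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return inner
        if tok in ("/", "+", "-", ")"):
            raise ValueError(f"unexpected {tok!r}")
        keyword = tok.lower()
        return lambda text: keyword in text.lower()

    predicate = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected {tokens[pos]!r}")
    return predicate


rule = compile_rule(
    "((deposit/savings/account) - (postal savings) + (identity/real name))/alias account opening"
)
```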
In another example, information de-duplication may also be performed on the business information: business information with the same or similar semantics is removed, eliminating redundancy and streamlining the business information.
One way to perform information de-duplication is to run semantic analysis on each piece of business information to obtain its summary, and then de-duplicate the business information whose summaries express the same or similar semantics.
De-duplication may be performed only within the currently crawled business information, by semantically comparing the crawled items with one another and removing those with the same or similar semantics. It may also take historical business information into account: each newly acquired item is then compared not only with the other new items but also with each historical item, and if it is the same as or similar to a historical item, it is removed.
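The specification compares business information semantically via summaries; as a stand-in, the sketch below uses word-set Jaccard similarity with a hypothetical threshold of 0.6, which only catches near-verbatim overlap. It covers both modes described above: comparing within the current batch, and additionally against historical business information. `deduplicate` and `jaccard` are hypothetical names.

```python
import re
from typing import Iterable, List, Set


def _tokens(text: str) -> Set[str]:
    # Lowercased word set; a crude stand-in for a semantic summary.
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(current: Iterable[str], history: Iterable[str] = (), threshold: float = 0.6) -> List[str]:
    """Drop items too similar to a historical item or to an item already kept."""
    seen = [_tokens(h) for h in history]
    kept: List[str] = []
    for text in current:
        toks = _tokens(text)
        if any(jaccard(toks, s) >= threshold for s in seen):
            continue
        kept.append(text)
        seen.append(toks)
    return kept
```

Swapping `jaccard` for cosine similarity between sentence embeddings of the summaries would be closer to the semantic comparison described.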
In another example, the traffic information may also be information-ranked. The ranking rule of the information ranking process may specify that the ranking rule is determined based on at least one of the dimensions of association with the report topic, information dissemination, authority of the information distribution medium, and information popularity. For example, the higher the correlation of business information with the reporting topic, the higher the information dissemination, the more authoritative the information distribution medium and the higher the information popularity, the more the ordering of the business information is. Wherein, the information propagation degree can include the forwarding times, the reference times and the like of the service information, and the information heat degree can include the praise number, the comment number and the like of the service information.
After the ranking result is obtained, the target business information can be determined from it. For example, the top N pieces of business information in the ranking may be determined as the target business information, where N is a specified integer.
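The ranking and top-N selection can be sketched as a weighted score, assuming each dimension has already been normalised to [0, 1]. The weighted sum, the dimension keys and the values in `WEIGHTS` are hypothetical; the description only states that a higher value in each dimension moves an item up the ranking.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical weights for the four ranking dimensions named in the text.
WEIGHTS: Dict[str, float] = {
    "topic_relevance": 0.4,  # relevance to the report topic
    "dissemination": 0.2,    # e.g. normalised forward/citation counts
    "authority": 0.2,        # authority of the publishing medium
    "popularity": 0.2,       # e.g. normalised like/comment counts
}


@dataclass
class Item:
    title: str
    scores: Dict[str, float]  # dimension name -> normalised value in [0, 1]


def rank(items: List[Item]) -> List[Item]:
    """Sort items by weighted score, highest first (missing dimensions count as 0)."""
    def score(it: Item) -> float:
        return sum(w * it.scores.get(dim, 0.0) for dim, w in WEIGHTS.items())
    return sorted(items, key=score, reverse=True)


def top_n(items: List[Item], n: int) -> List[Item]:
    """Select the first N ranked items as the target business information."""
    return rank(items)[:n]
```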
The ranking result reflects the information value of each piece of business information for the business information report: the higher an item is ranked, the higher its information value and the more likely it is to be selected as target business information. Consequently, the generated report contains more high-value business information, and the report as a whole conveys higher information value.
It should be noted that any one or more of the information screening process, information deduplication process, and information ranking process described above may be performed on the crawled business information. FIG. 2 shows a flowchart of an example 200 of a processing procedure for business information according to an embodiment of this specification. As shown in FIG. 2, after the web crawler crawls the business information (210), the crawled business information is subjected to the information screening process (220), then to the information deduplication process (230), and then to the information ranking process (240), after which the target business information is determined based on the ranking result (250).
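The FIG. 2 flow can be sketched as a chain of stages. In this sketch each stage is a plain function supplied by the caller, and `process` is a hypothetical name; the toy screening rule (keep items containing a topic keyword), exact-match deduplication and length-based ranking in the usage example are placeholders, not the criteria of the patent.

```python
from typing import Callable, List

Stage = Callable[[List[str]], List[str]]


def process(crawled: List[str], screen: Stage, dedup: Stage,
            rank: Stage, n: int) -> List[str]:
    """Screening (220) -> deduplication (230) -> ranking (240) -> top N (250)."""
    items = screen(crawled)
    items = dedup(items)
    items = rank(items)
    return items[:n]


if __name__ == "__main__":
    docs = ["stall economy grows", "stall economy grows", "weather today",
            "new stall economy rules published"]
    result = process(
        docs,
        screen=lambda xs: [x for x in xs if "stall" in x],  # placeholder screening rule
        dedup=lambda xs: list(dict.fromkeys(xs)),           # order-preserving exact dedup
        rank=lambda xs: sorted(xs, key=len, reverse=True),  # placeholder ranking
        n=2,
    )
    print(result)
```

Because any stage can be an identity function, the same skeleton covers the case where only some of the three processes are applied.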
In one example, in addition to dimensions such as relevance to the report topic, degree of dissemination, publishing medium, and popularity, the target business information may also be determined based at least in part on the degree of association between pieces of business information. That is, the degree of association between pieces of business information serves as one of several reference dimensions for determining the target business information.
The degree of association between pieces of business information can be determined using natural language processing (NLP), for example based on matched keywords. Specifically, it can be determined which of a set of specified keywords each piece of business information matches; the more specified keywords two pieces of business information match in common, the higher the degree of association between them.
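A sketch of the keyword-based degree of association, assuming a hypothetical list of specified keywords and plain case-insensitive substring matching in place of a real NLP keyword extractor. The degree is the number of specified keywords that both texts match; `matched_keywords` and `association` are illustrative names.

```python
from typing import Iterable, Set


def matched_keywords(text: str, keywords: Iterable[str]) -> Set[str]:
    """Return the specified keywords that occur in the text (case-insensitive)."""
    lowered = text.lower()
    return {k for k in keywords if k.lower() in lowered}


def association(a: str, b: str, keywords: Iterable[str]) -> int:
    """Degree of association = number of specified keywords matched by both texts."""
    kws = list(keywords)  # materialise so any iterable can be scanned twice
    return len(matched_keywords(a, kws) & matched_keywords(b, kws))
```

For example, with the keywords `["stall", "license", "tax", "market"]`, the texts "Stall license rules eased" and "New tax on stall license fees" share two matched keywords, so their degree of association is 2.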
In this example, the business information may first be ranked according to dimensions such as relevance to the report topic, degree of dissemination, publishing medium, and popularity to obtain a ranking result. The business information is then reordered based on the top N items of that ranking result and the degree of association between those items and the other items in the ranking result. For example, when N is 1, the reordering is based on the top-ranked item and its degree of association with each of the other items.
In one example, for each piece of business information other than the top N, its degree of association with the top N items is calculated, and these other items are then reordered in descending order of that degree of association.
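The one-pass reordering can be sketched as follows, assuming any pairwise association function `assoc` and, as the description defines for the overall degree of association, summing the pairwise degrees over the top N items. `reorder_by_anchor` is a hypothetical name.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def reorder_by_anchor(ranked: Sequence[T], n: int,
                      assoc: Callable[[T, T], float]) -> List[T]:
    """Keep the first n items fixed and sort the rest by their summed
    association with those n items, highest first."""
    anchors = list(ranked[:n])
    rest = list(ranked[n:])
    # sort() is stable (also with reverse=True), so ties keep their
    # original ranking order.
    rest.sort(key=lambda x: sum(assoc(x, a) for a in anchors), reverse=True)
    return anchors + rest
```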
In another example, the degree of association between each remaining item and the top N items is calculated first, and the item with the highest degree of association is placed at position N+1. The degree of association between each still-unplaced item and the first N+1 items is then calculated, and the item with the highest degree of association is placed at position N+2, and so on until all items have been ranked.
In the above examples, the degree of association between one piece of business information and a group of pieces of business information is an overall degree of association. It may be obtained by first calculating the individual degree of association between that item and each item in the group, and then taking the sum of these individual degrees as the overall degree of association.
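The iterative variant can be sketched as a greedy loop, assuming any pairwise association function `assoc` and the summed overall degree of association described above. `greedy_reorder` is a hypothetical name, and breaking ties in favour of the earlier-ranked item is an assumption of this sketch.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def greedy_reorder(ranked: Sequence[T], n: int,
                   assoc: Callable[[T, T], float]) -> List[T]:
    """Fix the first n items, then repeatedly append the pending item with the
    highest overall association (sum of individual associations) with every
    item already placed."""
    ordered = list(ranked[:n])
    pending = list(ranked[n:])
    while pending:
        # max() returns the first maximal element, so ties favour the item
        # that was ranked earlier in the original ranking.
        best = max(pending, key=lambda x: sum(assoc(x, o) for o in ordered))
        ordered.append(best)
        pending.remove(best)
    return ordered
```

Unlike the one-pass variant, each newly placed item joins the reference group, so an item that is only weakly associated with the top N but strongly associated with a later pick can move up.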
With the above examples, items ranked earlier in the ranking result carry higher information value, so the top N items can be fixed as target business information first. The degree of association between each remaining item and these previously determined reference items is then calculated, and the remaining items are ranked by that degree. The reordered result therefore also reflects the association between items: the target business information selected from it is more closely associated, consecutive items follow on from one another, the presented information is more logically connected, and the business information report generated from it has higher information value.
Returning to FIG. 1, after the target business information is determined, a business information report may be generated at block 150 based on the target business information and the report template.
In one example, each piece of target business information may be filled directly into the report template to generate the business information report. In another example, the title, summary, and web page link of each piece of target business information may be extracted and then filled into the report template. The report then presents the title, summary, and link of each item. The title and summary describe the item briefly, so a user reading the report can scan them and save time; when the user wants more detail, the full content can be viewed through the web page link. This both shortens the report and makes it more efficient to read.
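A sketch of the second option, in which only the title, summary and link of each target item are written into the template. The patent does not specify a template format; the `string.Template` layout, the `ENTRY`/`REPORT` strings and the `title`/`summary`/`url` keys are hypothetical.

```python
from string import Template
from typing import Dict, List

# Hypothetical report template: one entry block per target item.
ENTRY = Template("- $title\n  $summary\n  Link: $url")
REPORT = Template("Business information report: $topic\n\n$entries\n")


def render_report(topic: str, items: List[Dict[str, str]]) -> str:
    """Fill the title, summary and web page link of each target item into the template."""
    entries = "\n".join(ENTRY.substitute(item) for item in items)
    return REPORT.substitute(topic=topic, entries=entries)
```

`Template.substitute` raises `KeyError` if an item lacks one of the three fields, which surfaces incomplete extraction early rather than producing a report with blank entries.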
In one example, the report template may include at least two business sections, each of which is directed to a different business topic of the report topic.
A report topic applies to the business information report as a whole, whereas a business topic applies to a business section within the report. One report topic may include several different business topics, and the business scope of the report topic covers the business scope of each of its business topics.
The business topics of a report topic may be sub-topics of the report topic, or each business topic may elaborate on the report topic from a different perspective. For example, if the report topic is the stall economy, it may include two business topics: a legislative topic, which describes the stall economy from the perspective of legislation, and a market topic, which describes it from the perspective of the market.
The information source address lists of the business sections may be the same or different. When they differ, the list for each business section can be determined according to its business topic. For example, if the business topic of a business section is the legislative topic, its information source address list is determined according to that topic, and each address in the list may be the address of a law-related website.
For each business section, in addition to the degree of association between the pieces of business information within the section, the degree of association between each piece of business information in the section and the other business sections can also be calculated.
Specifically, for a piece of business information in a business section, its degree of association with each piece of business information in another business section may be calculated, and the sum of these degrees is taken as its degree of association with that other section. In this way, the degree of association between each piece of business information in a section and each other section can be calculated. When there are at least two other sections, the sum of the degrees of association with each of them can be taken as the degree of association between the business information and all other business sections.
The target business information of each business section may then be determined from the crawled business information based at least in part on the degree of association between the business information and the business information of the other business sections.
For each piece of business information, the sum of its degree of association with the other business information in the same business section and its degree of association with the other business sections can be taken as its total degree of association. The business information is then ranked by total degree of association, and the target business information of each business section is determined from that ranking.
In one example, the degree of association with other business information in the same section carries a first weight and the degree of association with other business sections carries a second weight, and the two weights may differ. The total degree of association of a piece of business information is then calculated by multiplying its intra-section degree of association by the first weight, multiplying its degree of association with the other sections by the second weight, and summing the two products.
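A sketch of the per-section selection, assuming any pairwise association function `assoc`, unique item identifiers, and keeping the top `k` items of each section. `total_association`, `select_targets`, `w_intra` (first weight) and `w_inter` (second weight) are hypothetical names; with both weights equal to 1 the total reduces to the unweighted sum described earlier.

```python
from typing import Callable, Dict, List

Assoc = Callable[[str, str], float]


def total_association(item: str, section: str,
                      sections: Dict[str, List[str]], assoc: Assoc,
                      w_intra: float = 1.0, w_inter: float = 1.0) -> float:
    """Weighted total degree of association of one item in `section`.

    intra: sum of associations with the other items of the same section.
    inter: sum over every other section of the item's summed association
           with that section's items.
    """
    # Assumes item identifiers are unique across the candidate lists.
    intra = sum(assoc(item, o) for o in sections[section] if o != item)
    inter = sum(sum(assoc(item, o) for o in members)
                for name, members in sections.items() if name != section)
    return w_intra * intra + w_inter * inter


def select_targets(sections: Dict[str, List[str]], assoc: Assoc, k: int,
                   w_intra: float = 1.0,
                   w_inter: float = 1.0) -> Dict[str, List[str]]:
    """Rank each section's candidates by total association and keep the top k."""
    return {
        name: sorted(members,
                     key=lambda m: total_association(m, name, sections, assoc,
                                                     w_intra, w_inter),
                     reverse=True)[:k]
        for name, members in sections.items()
    }
```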
In one example, after the target business information of each business section is determined, the presentation order of the business sections in the report template may be determined based on that target business information.
The degree of association between any two business sections can be calculated from the target business information in them. Specifically, for each piece of target business information in one section, its degree of association with each piece of target business information in the other section is calculated, and the sum of these degrees is taken as its degree of association with the other section. After this has been done for every piece of target business information in the first section, the sum of their degrees of association with the other section is taken as the degree of association between the two business sections.
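The section-to-section degree of association is a double sum over the two sections' target items. A minimal sketch, assuming any pairwise association function `assoc`; `section_association` is a hypothetical name.

```python
from typing import Callable, List


def section_association(targets_a: List[str], targets_b: List[str],
                        assoc: Callable[[str, str], float]) -> float:
    """Degree of association between two business sections.

    For each target item x in section A, sum its association with every target
    item in section B; then sum those per-item totals over section A.
    """
    return sum(sum(assoc(x, y) for y in targets_b) for x in targets_a)
```

If `assoc` is symmetric, the result does not depend on which section is treated as the first one.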
In this example, the business section presented first in the report template may be determined first. It may be specified in advance, determined from the logical relationship between the business topics of the sections, or determined from a first degree of association between each section and the report topic. In the last case, for each business section, the degree of association between each of its pieces of target business information and the report topic is calculated, and the sum of these degrees is taken as the section's first degree of association. The business section with the largest first degree of association is presented first.
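Choosing the first-presented section by its first degree of association can be sketched as follows, assuming a hypothetical item-to-topic scoring function `topic_assoc`. `first_section` is an illustrative name, and breaking ties by dictionary order is an assumption of this sketch.

```python
from typing import Callable, Dict, List


def first_section(sections: Dict[str, List[str]], topic: str,
                  topic_assoc: Callable[[str, str], float]) -> str:
    """Return the section whose target items have the largest summed
    association with the report topic (its first degree of association)."""
    def first_degree(name: str) -> float:
        return sum(topic_assoc(item, topic) for item in sections[name])
    # max() returns the first maximal key, so ties follow dictionary order.
    return max(sections, key=first_degree)
```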
Next, a second degree of association between each business section and the other business sections is calculated. In one example, for each business section, a second degree of association with the first-presented section is calculated: the degree of association between each piece of target business information in the section and the first-presented section is calculated, and the sum of these degrees is taken as the section's second degree of association. The remaining sections are then ordered by their second degree of association: the more closely a section is associated with the first-presented section, the earlier it appears, and the less closely, the later.
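A sketch of ordering the remaining sections by their second degree of association with the first-presented section, assuming any pairwise association function `assoc` over target items. `order_after_first` is a hypothetical name.

```python
from typing import Callable, Dict, List


def order_after_first(sections: Dict[str, List[str]], first: str,
                      assoc: Callable[[str, str], float]) -> List[str]:
    """Place `first` at the front, then sort the other sections by their second
    degree of association with it (descending)."""
    def second_degree(name: str) -> float:
        # Sum over the section's target items of each item's summed
        # association with the first section's target items.
        return sum(assoc(x, y) for x in sections[name] for y in sections[first])

    others = [name for name in sections if name != first]
    others.sort(key=second_degree, reverse=True)
    return [first] + others
```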
In another example, the second degree of association between each business section and the first-presented section may be calculated first, and the section with the highest second degree of association is placed second. Then the second degree of association between each remaining section and the sections already placed (currently the first and second) is calculated: for each remaining section, the sum of its second degrees of association with each placed section is taken as its current second degree of association, and the section with the highest value is placed third. This continues until all business sections have been ordered.
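The iterative section ordering is the same greedy pattern as the item reordering, applied to sections. A minimal sketch, assuming any pairwise association function `assoc` over target items; `greedy_section_order` and `pair_degree` are hypothetical names, and ties go to the section listed earlier.

```python
from typing import Callable, Dict, List


def greedy_section_order(sections: Dict[str, List[str]], first: str,
                         assoc: Callable[[str, str], float]) -> List[str]:
    """Repeatedly append the unplaced section whose summed second degree of
    association with all already-placed sections is highest."""
    def pair_degree(s: str, t: str) -> float:
        # Second degree of association between sections s and t.
        return sum(assoc(x, y) for x in sections[s] for y in sections[t])

    order = [first]
    pending = [name for name in sections if name != first]
    while pending:
        best = max(pending, key=lambda s: sum(pair_degree(s, t) for t in order))
        order.append(best)
        pending.remove(best)
    return order
```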
After the presentation order of the business sections in the report template is determined, the business information report can be generated from the target business information of each section, the report template, and that presentation order.
Specifically, the position of each business section in the report template is adjusted according to the presentation order, the order of the target business information within each section is determined from that section's target business information, and the ordered target business information of each section is filled into the corresponding section of the report template to generate the business information report.
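A sketch of the final assembly step, assuming the section order and the within-section item order have already been computed. The plain-text layout and the name `assemble_report` are hypothetical; the patent does not define the template format.

```python
from typing import Dict, List


def assemble_report(topic: str, section_order: List[str],
                    targets: Dict[str, List[str]]) -> str:
    """Lay out sections in the given presentation order and fill each with its
    already-ordered target items."""
    lines = [f"Report: {topic}"]
    for name in section_order:
        lines.append(f"[{name}]")
        lines.extend(f"  {i}. {item}" for i, item in enumerate(targets[name], start=1))
    return "\n".join(lines)
```

Using the orders stated for FIG. 3 (section B before A; A-2 before A-1; B-1, B-3, B-2), `assemble_report("Stall economy", ["B", "A"], {"A": ["A-2", "A-1"], "B": ["B-1", "B-3", "B-2"]})` lists section B with its three items first, followed by section A.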
Through the above example, the sections of the report template can be adjusted according to the target business information, based on both the degree of association between pieces of target business information and the association between business sections, so that the content presented by the resulting business information report is more coherent and more logically structured.
FIG. 3 illustrates a schematic diagram of an example 300 of generating a business information report based on a report template according to an embodiment of this specification.
As shown in FIG. 3, the report template includes business sections A and B. Business section A includes target business information A-1 and A-2, and business section B includes target business information B-1, B-2, and B-3. For A-1 and A-2, based on the degree of association of each item with the business topic of section A and with section B, it can be determined that A-2 has a higher degree of association than A-1.
For B-1, B-2, and B-3 in business section B, based on the degree of association of each item with the business topic of section B and with section A, it can be determined that B-1 has the highest degree of association, B-3 the second highest, and B-2 the lowest.
Next, the sum of the first degrees of association between the target business information B-1, B-2, and B-3 and the report topic is determined, as is the sum of the first degrees of association between A-1 and A-2 and the report topic, together with the second degree of association between sections B and A. If the sum for section B is larger than the sum for section A, and the second degree of association is the same in both directions, section B is presented first and section A second. The business information report generated from the target business information of each section, the report template, and this presentation order is shown in the right-hand diagram of FIG. 3.
FIG. 4 shows a block diagram of an example of an apparatus 400 for generating a business information report according to an embodiment of this specification.
As shown in FIG. 4, the business information report generating apparatus 400 may include a request receiving unit 410, a configuration information determining unit 420, an information crawling unit 430, a target business information determining unit 440, and a report generating unit 450.
The request receiving unit 410 is configured to receive a generation request for a business information report, the generation request including the report topic of the business information report. For the operations of the request receiving unit 410, refer to the operations of block 110 described above with reference to FIG. 1.
The configuration information determining unit 420 is configured to determine report configuration information of the business information report according to the generation request, the report configuration information including a report template and an information source address list. For the operations of the configuration information determining unit 420, refer to the operations of block 120 described above with reference to FIG. 1.
The information crawling unit 430 is configured to use a web crawler to crawl business information according to the information source address list. For the operations of the information crawling unit 430, refer to the operations of block 130 described above with reference to FIG. 1.
The target business information determining unit 440 is configured to determine target business information from the crawled business information. For the operations of the target business information determining unit 440, refer to the operations of block 140 described above with reference to FIG. 1. In one example, the target business information determining unit 440 may be further configured to rank the crawled business information and to determine the target business information according to the ranking result.
The report generating unit 450 is configured to generate the business information report according to the target business information and the report template. For the operations of the report generating unit 450, refer to the operations of block 150 described above with reference to FIG. 1.
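The five units of apparatus 400 can be pictured as methods of one class. This is an illustrative sketch only: the crawler and ranking function are injected stand-ins (no real network access), the configuration base is a plain dictionary keyed by report topic, and `ReportConfig`, `ReportGenerator`, `run` and the `str.format` template are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ReportConfig:
    template: str           # hypothetical: str.format template with {topic} and {items}
    source_urls: List[str]  # information source address list


@dataclass
class ReportGenerator:
    """Units 410-450 of apparatus 400 sketched as methods (illustrative only)."""
    config_base: Dict[str, ReportConfig]    # report topic -> report configuration
    crawl: Callable[[str], List[str]]       # hypothetical crawler stand-in: URL -> items
    rank: Callable[[List[str]], List[str]]  # hypothetical ranking function
    n: int = 3                              # number of target items to keep

    def receive_request(self, request: Dict[str, str]) -> str:        # unit 410
        return request["topic"]

    def determine_config(self, topic: str) -> ReportConfig:            # unit 420
        return self.config_base[topic]

    def crawl_info(self, config: ReportConfig) -> List[str]:          # unit 430
        return [item for url in config.source_urls for item in self.crawl(url)]

    def determine_targets(self, items: List[str]) -> List[str]:       # unit 440
        return self.rank(items)[: self.n]

    def generate_report(self, config: ReportConfig, topic: str,
                        targets: List[str]) -> str:                    # unit 450
        return config.template.format(topic=topic, items="; ".join(targets))

    def run(self, request: Dict[str, str]) -> str:
        """Chain the units in the order of blocks 110-150 of FIG. 1."""
        topic = self.receive_request(request)
        config = self.determine_config(topic)
        targets = self.determine_targets(self.crawl_info(config))
        return self.generate_report(config, topic, targets)
```

Injecting `crawl` and `rank` keeps each unit independently replaceable, which mirrors the description's point that the units may be realised in hardware, software, or a combination.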
In one example, the business information report generating apparatus 400 may further include an information processing unit configured to perform an information screening process or an information deduplication process on the crawled business information.
In one example, the target business information determining unit 440 may be further configured to determine the target business information from the crawled business information based at least in part on the degree of association between pieces of business information.
In one example, the report template includes at least two business sections, each directed to a different business topic of the report topic, and the target business information determining unit 440 may be further configured to determine the target business information of each business section from the crawled business information based at least in part on the degree of association between the business information and the business information of the other business sections.
In one example, the business information report generating apparatus 400 may further include a section order determining unit configured to determine the presentation order of the business sections in the report template according to the target business information of each business section, and the report generating unit 450 may be further configured to generate the business information report according to the target business information of each business section, the report template, and the presentation order of each business section in the report template.
In one example, the section order determining unit may be further configured to determine, according to the target business information of each business section, a first degree of association between each business section and the report topic and a second degree of association between each business section and the other business sections, and to determine the presentation order of the business sections in the report template according to their first and second degrees of association.
Embodiments of the method and apparatus for generating a business information report according to embodiments of this specification have been described above with reference to FIGS. 1 to 4.
The apparatus for generating a business information report according to embodiments of this specification may be implemented in hardware, in software, or in a combination of hardware and software. Taking a software implementation as an example, the apparatus in the logical sense is formed by the processor of the device on which it resides reading the corresponding computer program instructions from non-volatile storage into memory and running them. In embodiments of this specification, the apparatus for generating a business information report may, for example, be implemented using an electronic device.
FIG. 5 shows a block diagram of an electronic device 500 that implements the method for generating a business information report according to an embodiment of this specification.
As shown in FIG. 5, the electronic device 500 may include at least one processor 510, a storage 520 (e.g., a non-volatile storage), a memory 530, and a communication interface 540, which are connected together via a bus 550. The at least one processor 510 executes at least one computer-readable instruction (i.e., the elements described above as being implemented in software) stored or encoded in the storage.
In one embodiment, computer-executable instructions are stored in the storage that, when executed, cause the at least one processor 510 to: receive a generation request for a business information report, the generation request including the report topic of the business information report; determine report configuration information of the business information report according to the generation request, the report configuration information including a report template and an information source address list; use a web crawler to crawl business information according to the information source address list; determine target business information from the crawled business information; and generate the business information report according to the target business information and the report template.
It should be appreciated that the computer-executable instructions stored in the storage, when executed, cause the at least one processor 510 to perform the various operations and functions described above in connection with FIGS. 1 to 4 in the various embodiments of this specification.
According to one embodiment, a program product, such as a machine-readable medium, is provided. The machine-readable medium may carry instructions (i.e., the elements described above as being implemented in software) that, when executed by a machine, cause the machine to perform the various operations and functions described above in connection with FIGS. 1 to 4 in the various embodiments of this specification.
Specifically, a system or apparatus equipped with a readable storage medium may be provided, the readable storage medium storing software program code that implements the functions of any of the above embodiments, and a computer or processor of the system or apparatus reads out and executes the instructions stored in the readable storage medium.
In this case, the program code read from the readable medium can itself implement the functions of any of the above embodiments, so the machine-readable code and the readable storage medium storing it form part of the embodiments of this specification.
Computer program code required for the operation of parts of this specification may be written in any one or more programming languages, including object-oriented programming languages such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, and Python; conventional procedural programming languages such as C, Visual Basic 2003, Perl, COBOL 2002, PHP, and ABAP; dynamic programming languages such as Python, Ruby, and Groovy; or other programming languages. The program code may execute entirely on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter scenarios, the remote computer may be connected to the user's computer through any form of network, such as a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet), or in a cloud computing environment, or the code may be provided as a service, such as software as a service (SaaS).
Examples of readable storage media include floppy disks, hard disks, magneto-optical disks, optical disks (e.g., CD-ROMs, CD-Rs, CD-RWs, DVD-ROMs, DVD-RAMs, DVD-RWs), magnetic tapes, non-volatile memory cards, and ROMs. Alternatively, the program code may be downloaded from a server computer or from the cloud over a communications network.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
Not all steps or units in the above-mentioned flowcharts and system configuration diagrams are necessary, and some steps or units may be omitted according to actual needs. The order of execution of the steps is not fixed and may be determined as desired. The apparatus structures described in the above embodiments may be physical structures or logical structures, that is, some units may be implemented by the same physical entity, or some units may be implemented by multiple physical entities, or may be implemented jointly by some components in multiple independent devices.
The term "exemplary" used throughout this specification means "serving as an example, instance, or illustration" and does not mean "preferred" or "advantageous over other embodiments." The detailed description includes specific details for the purpose of providing an understanding of the described technology. However, the techniques may be practiced without these specific details. In some instances, well-known structures and devices are shown in block diagram form in order to avoid obscuring the concepts of the described embodiments.
Optional implementations of the embodiments of this specification have been described in detail above with reference to the accompanying drawings; however, the embodiments are not limited to the specific details of those implementations. Within the scope of the technical concept of the embodiments, various simple modifications may be made to their technical solutions, and all such simple modifications fall within the protection scope of the embodiments of this specification.
The previous description of the disclosure is provided to enable any person skilled in the art to make or use the disclosure. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the scope of the disclosure. Thus, the disclosure is not intended to be limited to the examples and designs described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for generating a business information report, comprising:
receiving a generation request for the business information report, wherein the generation request comprises a report topic of the business information report;
determining report configuration information of the business information report according to the generation request, wherein the report configuration information comprises a report template and an information source address list, the report configuration information corresponding to each business information report is stored in advance in a report configuration information base, and when the generation request for the business information report is received, the corresponding report configuration information is obtained from the report configuration information base, wherein the report configuration information base stores a correspondence between report topics and report configuration information, or a correspondence between report identifiers of business information reports and report configuration information;
using a web crawler to crawl business information according to the information source address list;
determining target business information from the crawled business information; and
generating the business information report according to the target business information and the report template,
wherein determining target business information from the crawled business information comprises:
determining target business information from the crawled business information based at least in part on the degree of association between the business information;
wherein the report template comprises at least two business sections, each of which targets a different business topic of the report subject, and
determining target business information from the crawled business information based at least in part on the degree of association between the business information comprises:
determining target business information of each business section from the crawled business information based at least in part on the degree of association between the business information and the business information of the other business sections;
wherein the method further comprises:
determining a presentation order of the business sections in the report template according to the target business information of each business section; and
generating the business information report according to the target business information and the report template comprises:
generating the business information report according to the target business information of each business section, the report template, and the presentation order of the business sections in the report template;
wherein determining the presentation order of the business sections in the report template according to the target business information of each business section comprises:
determining, according to the target business information of each business section, a first association degree between each business section and the report subject and a second association degree between each business section and the other business sections; and
determining the presentation order of the business sections in the report template according to the first association degree and the second association degree of each business section.
2. The method of claim 1, wherein determining target business information from the crawled business information comprises:
sorting the crawled business information; and
determining the target business information according to the result of sorting the business information.
3. The method of claim 1, wherein, prior to sorting the crawled business information, the method further comprises:
performing information screening or information de-duplication on the crawled business information.
4. The method of claim 3, wherein the report configuration information further comprises keywords and/or logical combinations of the keywords, and
performing information screening on the crawled business information comprises:
screening the crawled business information using the keywords and/or the logical combinations of the keywords.
5. The method of claim 1, wherein the information source address list of each business section is determined based on the business topic of that business section.
6. An apparatus for generating a business information report, comprising:
a request receiving unit configured to receive a generation request for the business information report, the generation request comprising a report subject of the business information report;
a configuration information determining unit configured to determine report configuration information of the business information report according to the generation request, wherein the report configuration information comprises a report template and an information source address list; the report configuration information corresponding to each business information report is stored in advance in a report configuration information base, and when the generation request for the business information report is received, the corresponding report configuration information is obtained from the report configuration information base, the report configuration information base storing either a correspondence between report subjects and report configuration information or a correspondence between report identifiers of business information reports and report configuration information;
an information crawling unit configured to crawl business information according to the information source address list by using a web crawler;
a target business information determining unit configured to determine target business information from the crawled business information; and
a report generating unit configured to generate the business information report based on the target business information and the report template,
wherein the target business information determining unit is configured for:
determining target business information from the crawled business information based at least in part on the degree of association between the business information;
wherein the report template comprises at least two business sections, each of which targets a different business topic of the report subject, and
the target business information determining unit is configured for:
determining target business information of each business section from the crawled business information based at least in part on the degree of association between the business information and the business information of the other business sections;
wherein the apparatus further comprises:
a section order determining unit configured to determine a presentation order of the business sections in the report template according to the target business information of each business section; and
the report generating unit is configured for:
generating the business information report according to the target business information of each business section, the report template, and the presentation order of the business sections in the report template;
wherein the section order determining unit is configured for:
determining, according to the target business information of each business section, a first association degree between each business section and the report subject and a second association degree between each business section and the other business sections; and
determining the presentation order of the business sections in the report template according to the first association degree and the second association degree of each business section.
7. The apparatus of claim 6, wherein the target business information determining unit is configured for:
sorting the crawled business information; and
determining the target business information according to the result of sorting the business information.
8. The apparatus of claim 6, further comprising:
an information processing unit configured to perform information screening or information de-duplication on the crawled business information.
9. An electronic device, comprising:
at least one processor, and
a memory coupled to the at least one processor, the memory storing instructions that, when executed by the at least one processor, cause the at least one processor to perform the method of any of claims 1-5.
10. A machine-readable storage medium storing executable instructions that, when executed, cause the machine to perform the method of any one of claims 1 to 5.
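Claims 2 to 4 screen the crawled business information with keywords and/or logical combinations of keywords, de-duplicate it, sort it, and keep the top results, but they do not fix a rule format, a duplicate criterion or a sorting key. The following is a minimal Python sketch under stated assumptions, not the patented implementation: `KeywordRule` (an AND of keyword OR-groups plus excluded keywords), whitespace-normalised exact-match de-duplication, and a keyword-hit-count score are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class KeywordRule:
    """Hypothetical screening rule: an AND of OR-groups plus excluded keywords."""
    all_of: list[list[str]]                           # each group must match at least one keyword
    none_of: list[str] = field(default_factory=list)  # any match here rejects the item


def matches(text: str, rule: KeywordRule) -> bool:
    """Case-insensitive substring test of one crawled item against the rule."""
    lowered = text.lower()
    if any(k.lower() in lowered for k in rule.none_of):
        return False
    return all(any(k.lower() in lowered for k in group) for group in rule.all_of)


def deduplicate(items: list[str]) -> list[str]:
    """Keep the first of any items that are equal after lower-casing and collapsing whitespace."""
    seen: set[str] = set()
    unique = []
    for item in items:
        key = " ".join(item.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique


def select_target_information(items: list[str], rule: KeywordRule, top_k: int) -> list[str]:
    """Screen, de-duplicate, sort by an assumed score, and keep the top_k items."""
    candidates = deduplicate([t for t in items if matches(t, rule)])
    keywords = {k.lower() for group in rule.all_of for k in group}

    def score(text: str) -> int:
        # Assumed ranking signal: how many distinct rule keywords the item mentions.
        lowered = text.lower()
        return sum(1 for k in keywords if k in lowered)

    # sorted() is stable, so items with equal scores keep their crawl order.
    return sorted(candidates, key=score, reverse=True)[:top_k]


crawled = [
    "Mobile payment volume grew in Q2",
    "mobile  payment volume grew in q2",  # duplicate after normalisation
    "Weather forecast for Hangzhou",
    "Payment regulation update: new rules for QR codes",
    "Sponsored: payment app advertisement",
]
rule = KeywordRule(all_of=[["payment"], ["volume", "regulation", "rules"]], none_of=["sponsored"])
print(select_target_information(crawled, rule, top_k=2))
# → ['Payment regulation update: new rules for QR codes', 'Mobile payment volume grew in Q2']
```

The unrelated, duplicated and excluded (sponsored) items are dropped before ranking, so only screened, unique items compete for the top positions.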
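Claim 1 orders the business sections by a first association degree (section versus report subject) and a second association degree (section versus the other sections), but it does not say how either degree is measured or how the two are combined. A minimal sketch, assuming bag-of-words cosine similarity for both degrees and a hypothetical weighted sum `alpha * first + (1 - alpha) * second` as the ranking key; the function names, section names and `alpha` value are illustrative only.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Assumed text representation: counts of lower-cased whitespace tokens."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def order_sections(subject: str, sections: dict[str, list[str]], alpha: float = 0.7) -> list[str]:
    """Return section names, highest combined association first.

    sections maps each (hypothetical) section name to its target business information.
    alpha is an assumed weight between the first and second association degrees.
    """
    vectors = {name: bag_of_words(" ".join(items)) for name, items in sections.items()}
    subject_vector = bag_of_words(subject)
    scores = {}
    for name, vector in vectors.items():
        # First association degree: this section versus the report subject.
        first = cosine(vector, subject_vector)
        # Second association degree: mean similarity to every other section.
        others = [cosine(vector, v) for other, v in vectors.items() if other != name]
        second = sum(others) / len(others) if others else 0.0
        scores[name] = alpha * first + (1 - alpha) * second
    # Stable sort: sections with equal scores keep their template order.
    return sorted(scores, key=scores.get, reverse=True)


sections = {
    "competitors": ["competitor launched wallet app"],
    "regulation": ["new payment regulation rules"],
    "market overview": ["mobile payment market size grew"],
}
print(order_sections("mobile payment market", sections))
# → ['market overview', 'regulation', 'competitors']
```

The "market overview" section moves to the front even though it is listed last, because its content is closest to the report subject. A larger `alpha` lets relevance to the report subject dominate; a smaller one favours sections that connect to the rest of the report.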

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010842237.XA CN111782916B (en) 2020-08-20 2020-08-20 Method and device for generating business information report

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010842237.XA CN111782916B (en) 2020-08-20 2020-08-20 Method and device for generating business information report

Publications (2)

Publication Number Publication Date
CN111782916A CN111782916A (en) 2020-10-16
CN111782916B CN111782916B (en) 2024-03-22

Family

ID=72762837

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010842237.XA Active CN111782916B (en) 2020-08-20 2020-08-20 Method and device for generating business information report

Country Status (1)

Country Link
CN (1) CN111782916B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112579961B (en) * 2020-12-28 2023-05-30 杭州搜车数据科技有限公司 Method, device, equipment and readable storage medium for constructing Web navigation page

Citations (6)

Publication number Priority date Publication date Assignee Title
WO2018014759A1 (en) * 2016-07-18 2018-01-25 阿里巴巴集团控股有限公司 Method, device and system for presenting clustering data table
CN109669853A (en) * 2018-10-23 2019-04-23 深圳壹账通智能科技有限公司 Test report generation method and device, storage medium, electric terminal
CN109726327A (en) * 2018-12-14 2019-05-07 深圳壹账通智能科技有限公司 A kind of information-pushing method and device
CN110147541A (en) * 2019-05-23 2019-08-20 北京神州泰岳软件股份有限公司 A kind of generation method and device of economic report
CN110619568A (en) * 2019-09-17 2019-12-27 王文斌 Risk assessment report generation method, device, equipment and storage medium
CN111125204A (en) * 2019-12-17 2020-05-08 中科鼎富(北京)科技发展有限公司 Analysis report obtaining method and device, electronic equipment and storage medium

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US20120246139A1 (en) * 2010-10-21 2012-09-27 Bindu Rama Rao System and method for resume, yearbook and report generation based on webcrawling and specialized data collection

Also Published As

Publication number Publication date
CN111782916A (en) 2020-10-16

Similar Documents

Publication Publication Date Title
RU2696230C2 (en) Search based on combination of user relations data
Tomlein et al. An audit of misinformation filter bubbles on YouTube: Bubble bursting and recent behavior changes
US8280888B1 (en) Method and apparatus for creation of web document titles optimized for search engines
Gil et al. Towards content trust of web resources
JP6517818B2 (en) Improving Website Traffic Optimization
CN110637316B (en) System and method for prospective object identification
EP1557770A1 (en) Building and using subwebs for focused search
KR20060045720A (en) Query to task mapping
US20150324350A1 (en) Identifying Content Relationship for Content Copied by a Content Identification Mechanism
Im et al. Linked tag: image annotation using semantic relationships between image tags
US20090319481A1 (en) Framework for aggregating information of web pages from a website
CN111417940A (en) Evidence search supporting complex answers
CN108829656B (en) Data processing method and data processing device for network information
EP2827294A1 (en) Systems and method for determining influence of entities with respect to contexts
Grčar et al. User profiling for interest-focused browsing history
Srba et al. Auditing YouTube’s recommendation algorithm for misinformation filter bubbles
CN112328857B (en) Product knowledge aggregation method and device, computer equipment and storage medium
Jayatilaka et al. Knowledge extraction for semantic web using web mining
CN111782916B (en) Method and device for generating business information report
CN106997340A (en) The generation of dictionary and the Document Classification Method and device using dictionary
CN116226494B (en) Crawler system and method for information search
CN112765966A (en) Method and device for removing duplicate of associated word, computer readable storage medium and electronic equipment
US9705972B2 (en) Managing a set of data
Zuze The crossover point between keyword rich website text and spamdexing
Vasanthakumar et al. PTMIB: Profiling top most influential blogger using content based data mining approach

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant