WO2018072577A1 - Text generation method and server

Publication number
WO2018072577A1
WO2018072577A1 PCT/CN2017/101852 CN2017101852W WO2018072577A1 WO 2018072577 A1 WO2018072577 A1 WO 2018072577A1 CN 2017101852 W CN2017101852 W CN 2017101852W WO 2018072577 A1 WO2018072577 A1 WO 2018072577A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
behavior
text
performance
performance data
Prior art date
Application number
PCT/CN2017/101852
Other languages
French (fr)
Chinese (zh)
Inventor
刘康
石卫国
蔡静
张雪娇
窦晓妍
张秋明
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN201610920326.5A external-priority patent/CN107977367B/en
Priority claimed from CN201610920284.5A external-priority patent/CN107977196B/en
Application filed by 腾讯科技(深圳)有限公司 filed Critical 腾讯科技(深圳)有限公司
Publication of WO2018072577A1 publication Critical patent/WO2018072577A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor

Definitions

  • the present application relates to the field of data processing technologies, and in particular, to a text generation method and a server.
  • media organizations can automatically generate reports, articles, manuscripts and other texts through certain software when editing manuscripts and publishing content on a daily basis.
  • The main approach at present is to monitor and pre-store the live text paragraphs in a live broadcast system, grab some of those paragraphs, fill the paragraphs and data into preset fixed templates, and then piece together a text such as a report.
  • the embodiment of the present invention provides a text generation method and a server, which can improve the efficiency of text generation and resource utilization of a server.
  • An embodiment of the present invention provides a text generating method, where the method is applied to a server, the server includes a processor and a memory, and the method is executed by the processor executing an instruction stored in the memory, including:
  • obtaining code data for a topic, the code data being written in a machine language and carrying content data under the topic;
  • the embodiment of the present invention further provides a server, including a processor and a memory, where the memory stores instructions executable by the processor, and when executing the instruction, the processor is configured to:
  • obtain code data for a topic, the code data being written in a machine language and carrying content data under the topic;
  • the embodiment of the invention further provides a computer readable storage medium storing computer readable instructions, which can cause at least one processor to execute the above method.
  • FIG. 1 is a schematic structural view of an implementation environment according to an embodiment of the present invention.
  • FIG. 2a is an exemplary flowchart of a text generating method according to an embodiment of the present invention.
  • FIG. 2b is an exemplary flowchart of a text generating method according to another embodiment of the present invention.
  • FIG. 3 is an exemplary flow chart for determining a description phrase in accordance with an embodiment of the present invention.
  • FIG. 5 is an exemplary flowchart of a text generating method according to still another embodiment of the present invention.
  • FIG. 6 is a schematic diagram of generated text according to another embodiment of the present invention.
  • Figure 7 is a schematic illustration of the displayed text in accordance with an embodiment of the present invention.
  • FIG. 8a is a schematic structural diagram of a server according to an embodiment of the present invention.
  • FIG. 8b is a schematic structural diagram of a server according to another embodiment of the present invention.
  • FIG. 9a is a schematic structural diagram of a server according to still another embodiment of the present invention.
  • FIG. 9b is a schematic structural diagram of a server according to an embodiment of the present invention.
  • FIG. 10 is an exemplary flowchart of a text display method according to an embodiment of the present invention.
  • FIG. 11 is an exemplary flow chart for calculating a heat factor according to an embodiment of the invention.
  • FIG. 12 is a schematic diagram of generated text according to an embodiment of the invention.
  • FIG. 13 is an exemplary flowchart of a text display method according to another embodiment of the present invention.
  • FIG. 14 is a schematic diagram showing text displayed according to an embodiment of the invention.
  • FIG. 15 is a schematic diagram showing text displayed according to another embodiment of the present invention.
  • FIG. 16 is a schematic structural diagram of a server according to another embodiment of the present invention.
  • FIG. 17 is a schematic structural diagram of a server according to still another embodiment of the present invention.
  • The existing text generation method relies on live text reports written by hand; the text is not written automatically by software, and the method depends on a third-party graphic live broadcast system. In addition, only paragraphs can be captured, and the template used to combine them is fixed. Grabbing and combining paragraphs in this way produces text with a strong patchwork feel and a mechanical tone. As a result, the generated report is poorly readable and conveys a limited amount of information, which does not satisfy the user's need for detailed information, and the resource utilization of the text generation device is low.
  • FIG. 1 is a schematic structural view of an implementation environment according to an embodiment of the present invention.
  • the text presentation system 100 includes a server 110, a network 120, a terminal device 130, and a user 140.
  • the server 110 includes a processor and a memory, and the method embodiments of the present invention are executed by the processor executing instructions stored in the memory.
  • the server 110 includes a code database 111, a text material database 112, and a text generation processing unit 113.
  • a client 130-1 is installed on the terminal device 130.
  • the client 130-1 can recommend the relevant text to the user and provide a social platform for the user to interact with. After the user 140 logs in to the client 130-1, the user can browse the text of interest.
  • the client 130-1 is a Tencent news client, and the user can browse for sports news and the like after logging in.
  • the code database 111 stores code data for each topic and updates it in real time; the text material database 112 stores the words or phrases used in generating the text, preset text templates, and the like.
  • the text generation processing unit 113 is configured to read the code data stored in the code database 111, identify the behavioral performance data, determine the description phrases, generate the text in conjunction with the corpus stored in the text material database 112, and determine the priority of the text.
  • the server 110 sends the generated text and the determined priority to the client 130-1 in the terminal device 130, and the client 130-1 may recommend the text to the user according to the channel and content level determined by the priority, and provide a social platform for users to interact.
  • the server 110 can be a server, or a server cluster composed of several servers, or a cloud computing service center.
  • the network 120 can connect the server 110 and the terminal device 130 in a wireless or wired form.
  • the terminal device 130 can be a smart terminal, including a smart phone, a tablet computer, a laptop portable computer, and the like.
  • FIG. 2a is an exemplary flow chart of a text generating method in accordance with an embodiment of the present invention.
  • the method is applied to a server comprising a processor and a memory, the method being executed by the processor executing instructions stored in the memory.
  • the method can include the following steps:
  • Step 201 Obtain code data for a topic, and the code data is written in a machine language and carries content data under the topic.
  • the code database in the server acquires and stores code data for a topic, which is written in machine language and carries content data under the topic.
  • the code developer can write the code data to the code database in advance.
  • Examples of machine languages include Java, PHP (Hypertext Preprocessor), ASP.NET, and Ruby; each machine language has its own set of programming rules.
  • the theme can be a sports event.
  • the code data written includes the game details and grades of each athlete under each item in the sports event.
  • the theme can be a singing competition, and the code data written includes content data such as the details of each singer's performance in each round and the results.
  • the theme may be a remote control toy car competition, and the code data written includes content data such as running speed, running time, ranking, and the like of each toy car in each track.
  • the above code data is real-time code data, so that the text generated according to the real-time code data can be presented to the user in time, and has real-time performance.
  • Step 202 Identify behavioral performance data of at least one object from the content data.
  • This step implements a transition from machine-readable "code data" to "behavioral performance data" that can be read by the user on the client.
  • the subject matter may be various events having a competitive nature, such as a sports competition, a singing competition, a remote control robot competition, and the like.
  • the object is the principal actor under a topic and may be a person, an animal, or a thing.
  • its behavioral performance data includes one or more behaviors and current performance evaluation data corresponding to each behavior.
  • the current performance evaluation data is result data for judging the performance of an object in one of its behaviors.
  • For example, when the object is an athlete, the behavioral performance data of each athlete corresponds to the details of the athlete's performance during the game: the behavior includes each action of the athlete, and the current performance evaluation data includes the score of each action, the judges' decisions, the total score, and the award result.
  • the diving competition is a scoring competition.
  • Each athlete's behavioral performance data includes behaviors defined by multiple technical details of the sport (walking, running, taking off, height, a set of actions with a certain degree of difficulty, aerial posture, coordination, water entry, and so on) and the current performance evaluation data corresponding to each behavior (the corresponding score or ranking).
  • To perform the recognition, the server first sets, according to the code writing rules, a mapping relationship between the code data and the object, the behavior, and the current performance evaluation data, and then identifies the behavior and the current performance evaluation data of each object from the code data according to the mapping relationship.
  • Table 1 lists the behavioral performance data results in accordance with an embodiment of the present invention.
  • the theme is the women's double 3m springboard final of the Rio Olympic Games.
  • the server identified two athletes from the code data, Wu Minxia and Shi Tingxi, aged 31 and 24 respectively.
  • the behavior includes five rounds of double diving.
  • the current performance evaluation data includes the score and ranking for each round, the final total score, and the medal result.
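  • The mapping-based recognition described above can be sketched as follows. This is a minimal illustration only: the field names (`evt`, `ath`, `rnd`, `pts`), the `MAPPING` table, and the score values are assumptions made up for the example and do not come from the application.

```python
# Hypothetical code-data rows; field names and score values are placeholders.
CODE_DATA = [
    {"evt": "women's double 3m springboard final", "ath": "Wu Minxia / Shi Tingxi",
     "rnd": "round 1", "pts": 50.0},
    {"evt": "women's double 3m springboard final", "ath": "Wu Minxia / Shi Tingxi",
     "rnd": "round 2", "pts": 52.0},
]

# Assumed mapping relationship, set once from the code writing rules:
# code-data field -> role in the behavioral performance data.
MAPPING = {
    "evt": "topic",
    "ath": "object",
    "rnd": "behavior",
    "pts": "evaluation",
}


def identify(code_data, mapping):
    """Translate each code-data row into a behavioral performance record."""
    records = []
    for row in code_data:
        # Keep only the fields the mapping knows about, renamed to their roles.
        records.append({mapping[key]: value for key, value in row.items() if key in mapping})
    return records


records = identify(CODE_DATA, MAPPING)
```

  • Keeping the mapping in a table rather than in the parsing logic means that a change to the code writing rules only requires the table to be updated.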
  • Step 203 Generate text of the theme according to behavior performance data of the at least one object.
  • This step implements the conversion from "behavioral performance data" to "text" and produces the text under the theme.
  • FIG. 2b is an exemplary flowchart of a text generating method according to another embodiment of the present invention. As shown in FIG. 2b, based on step 201 and step 202 given in FIG. 2a, step 203 is further divided into step 203a and step 203b. Specifically:
  • Step 203a Determine at least one description phrase according to behavior performance data of the at least one object.
  • This step establishes the association between "behavioral performance data" and "description phrases". Specifically, there are three ways to determine a description phrase:
  • Method 1 comparing behavioral performance data of the plurality of objects under the theme, and selecting at least one description phrase that matches the comparison result from the preset description phrase.
  • This method is a horizontal comparison of multiple objects for the same project under the same topic.
  • the theme of the diving competition of the Olympic Games includes a total of 8 competitions, namely: women's double 3m springboard diving, men's double 3m springboard diving, women's 3m springboard diving, men's 3m springboard diving, women's double 10m platform diving, men's double 10m platform diving, women's 10m platform diving, and men's 10m platform diving.
  • the data of multiple athletes are compared side by side, and the description phrases are determined according to the comparison result.
  • the result of the pairwise comparison of the behavioral performance data of the plurality of objects includes greater than, equal to, or less than, and the server presets a corresponding plurality of description phrases.
  • Table 2 is a description phrase preset by comparing behavioral performance data of a plurality of objects according to an embodiment of the present invention.
  • Table 2: Description phrases preset for comparing the behavioral performance data of multiple objects
  • the server can determine the corresponding description phrase as "slightly backward", "not rival” and "very regrettable".
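  • A minimal sketch of Method 1, assuming a single numeric score per object where higher is better. The contents of Table 2 are not reproduced here, so the `PRESET_PHRASES` table is a hypothetical stand-in; only "slightly backward" is a phrase quoted in the text.

```python
# Hypothetical preset description phrases keyed by pairwise comparison result.
PRESET_PHRASES = {
    "greater": "takes the lead",    # assumed phrase
    "equal": "level with",          # assumed phrase
    "less": "slightly backward",    # phrase quoted in the text
}


def compare(a, b):
    """Pairwise comparison: returns 'greater', 'equal', or 'less'."""
    if a > b:
        return "greater"
    if a < b:
        return "less"
    return "equal"


def describe_pair(scores, first, second):
    """Select the preset phrase describing `first` relative to `second`."""
    return PRESET_PHRASES[compare(scores[first], scores[second])]
```

  • For example, `describe_pair({"A": 80.0, "B": 85.5}, "A", "B")` selects the phrase registered for the "less" result.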
  • Method 2 For each object, obtain historical performance data of the object, compare the behavioral performance data of the object with the historical performance data, and select at least one description phrase that matches the comparison result from the preset description phrase.
  • FIG. 3 is an exemplary flow chart for determining a description phrase in accordance with an embodiment of the present invention. As shown in Figure 3, the following steps are included:
  • Step 2031 Obtain historical performance data of the object for each object.
  • Historical performance data includes athletes' past performance, world rankings, project adjustments, and so on.
  • the server may identify historical performance data of the object from the history code data; or, the server writes and saves historical performance data of each object under the corresponding topic in advance.
  • Step 2032 Compare the behavioral performance data and the historical performance data of the object separately for each of a plurality of data types.
  • This step uses a method of classification comparison, which is divided into multiple data types according to the attributes of the data, for example, the location of the game, the age of the object, the action of each link, the score, the ranking, and the like.
  • Step 2033 Select, from the comparison results of the plurality of data types, the comparison results that have display value, that is, those worth showing to the user.
  • the screening method may be to score each comparison result according to its display value, sort the scores, and select the several comparison results with the highest display value.
  • Step 2034 Select at least one description phrase that matches the comparison result with the display value from the preset description phrase.
  • the comparison between behavioral performance data and historical performance data can likewise yield greater than, equal to, or less than (see Table 2 above); therefore, at least one matching description phrase can also be selected from the preset description phrases.
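  • Steps 2032 and 2033 can be sketched as below. The text does not say how display value is scored, so the heuristic here (a per-type weight multiplied by the relative change) and the names `compare_by_type`, `select_displayable`, and `weights` are assumptions for illustration only.

```python
from typing import NamedTuple


class Comparison(NamedTuple):
    data_type: str
    result: str           # "greater", "equal", or "less"
    display_value: float  # assumed score of how newsworthy the result is


def compare_by_type(current, history, weights):
    """Step 2032: compare current and historical data type by type."""
    comparisons = []
    for data_type, now in current.items():
        if data_type not in history:
            continue
        past = history[data_type]
        result = "greater" if now > past else "less" if now < past else "equal"
        # Assumed heuristic: weight of the data type times the relative change.
        change = abs(now - past) / abs(past) if past else abs(now)
        comparisons.append(Comparison(data_type, result, weights.get(data_type, 1.0) * change))
    return comparisons


def select_displayable(comparisons, top_n=2):
    """Step 2033: sort by display value and keep the top results."""
    return sorted(comparisons, key=lambda c: c.display_value, reverse=True)[:top_n]
```

  • The selected comparison results can then be matched against the preset description phrases in the same way as in Method 1.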
  • Method 3 Compare the current performance data with the performance expectation data for each object.
  • the performance expectation data of the object is obtained; the behavioral performance data of the object is compared with the performance expectation data, and at least one description phrase matching the comparison result is selected from the preset description phrases.
  • the server may identify performance expectation data of each object from the history code data; or, the server writes and saves performance expectation data of each object under the corresponding topic in advance.
  • Table 3 lists the description phrases that are preset based on comparison with performance expectation data in accordance with an embodiment of the present invention. For example, in the men's 400m freestyle final, Sun Yang finished behind Holden; measured against the expectations before the game, the description phrase for Sun Yang's silver medal is therefore "sorry" rather than "exceeded expectations".
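  • Method 3 can be sketched with rankings, where a smaller rank is better. The expected rank of 1 for the example and the phrase "as expected" are assumptions; "sorry" and "exceeded expectations" are the phrases mentioned in the text.

```python
def describe_against_expectation(actual_rank, expected_rank):
    """Pick a description phrase by comparing actual and expected rank."""
    if actual_rank < expected_rank:
        return "exceeded expectations"
    if actual_rank > expected_rank:
        return "sorry"
    return "as expected"  # assumed phrase for the "equal" case


# Assumed expectation: gold (rank 1) expected, silver (rank 2) achieved.
phrase = describe_against_expectation(actual_rank=2, expected_rank=1)
```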
  • Step 203b Generate text of the topic according to the behavior data of the at least one object and the at least one description phrase.
  • This step implements an extension from scattered "behavioral performance data" and "description phrases" to a complete "text".
  • the specific method for generating includes: selecting a conjunction for each description phrase from a preset corpus database; connecting the behavioral performance data of the at least one object, the conjunction, and the at least one description phrase into at least one short sentence; combining the at least one short sentence into at least one paragraph; and connecting the at least one paragraph to obtain the text.
  • the conjunctions serve the functions of "opening", "developing", "turning", and "concluding", and include contextual transition words, tonal conjunctions, logical conjunctions, background introductions drawn from the historical performance data, and the like. For example, taking the results of the men's 400-meter freestyle final as an example, a number of description phrases were identified for the object "Sun Yang": "Sorry failed to defend", "has always been favored by the Chinese people to win the championship", and "lost to opponent Holden".
  • For "Sorry failed to defend", the conjunction is determined to be the appositive "get the runner-up"; for "has always been favored by the Chinese people to win the championship", the conjunction is determined to be the reason "the first Chinese swimmer to win the Olympic gold medal, and also the champion of this event at the previous Olympic Games"; for "lost to opponent Holden", the conjunction is determined to be the turning phrase "but ultimately".
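  • The assembly of short sentences, paragraphs, and text can be sketched as follows. The `CORPUS` table, the English wording of the phrases, and the sentence pattern are simplified assumptions; only the turning phrase "but ultimately" is taken from the example above.

```python
# Hypothetical corpus database: description phrase -> conjunction.
CORPUS = {
    "failed to defend the title": "and regrettably",  # assumed entry
    "lost to his opponent": "but ultimately",         # turning phrase from the text
}


def make_sentence(obj, performance, phrase, corpus):
    """Connect performance data, a conjunction, and a description phrase."""
    conjunction = corpus[phrase]
    return f"{obj} {performance}, {conjunction} {phrase}."


def make_text(paragraphs):
    """Join short sentences into paragraphs and paragraphs into the text."""
    return "\n\n".join(" ".join(sentences) for sentences in paragraphs)


sentence = make_sentence("Sun Yang", "took the silver medal", "lost to his opponent", CORPUS)
```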
  • FIG. 4 is a schematic diagram of generated text in accordance with an embodiment of the present invention.
  • The generated text includes a description of multiple objects under the theme of the men's 400-meter freestyle final, including the athletes Holden, Sun Yang, Qiu Ziao, Gai, Dwyer, and Qiti.
  • the generated text consists of three paragraphs, each of which includes the athletes' grades, rankings, and description phrases, which are identified by underlining.
  • In this way, the robot can independently produce humanized expression in its reports through the machine's own learning and algorithms, and the report text produced by this humanized expression technique passes the Turing test (that is, if the computer can answer a series of questions raised by human testers within 5 minutes, and more than 30% of the answers lead the testers to mistake them for human answers, the computer passes the test); the quality of the manuscript is no different from that of a manually written report.
  • With speech processing, the conversion error rate is quite high and the approach cannot be applied in batches. In the above embodiment, by contrast, the text generation method proceeds through code data -> behavioral performance data -> description phrase -> paragraph expansion -> text formation. Compared with speech processing, the complexity of the algorithm is lower, which reduces the processing burden and response time of the server-side processor; while ensuring high-quality text, the method can be applied in large quantities and can greatly improve the resource utilization of the server.
  • FIG. 5 is an exemplary flowchart of a text generating method according to another embodiment of the present invention. This method is applied to the server. As shown in FIG. 5, the method may include the following steps:
  • Step 501 Obtain code data for a topic, and the code data is written in a machine language and carries content data under the topic.
  • Step 502 Identify behavioral performance data of at least one object from the content data according to the mapping relationship, the behavioral performance data including one or more behaviors and current performance evaluation data corresponding to each behavior.
  • the server first sets the mapping relationship between the code data and the object, the behavior, and the current performance evaluation data according to the writing rules of the code, and then identifies according to the mapping relationship.
  • FIG. 6 is a schematic diagram of generated text in accordance with another embodiment of the present invention.
  • The interface 600 shown in FIG. 6 displays a generated text. In block 610, the title of the text is "Olympic"; in block 620, the promoter of the text is "Tencent Sports"; and the body of the text is given in block 630, which includes all the behavioral performance data given in Table 1 above, identified by underlining.
  • Step 503 Determine at least one description phrase according to behavior performance data of at least one object, historical performance data of each object, and performance expectation data.
  • This step combines the three methods for determining the description phrase given in step 203a, and details are not described herein again.
  • Step 504 Generate the text of the topic from the behavioral performance data of the at least one object and the at least one description phrase, using the corpus database.
  • A conjunction is selected from the preset corpus database for each description phrase; the behavioral performance data of the at least one object, the conjunction, and the at least one description phrase are connected into at least one short sentence; the at least one short sentence is combined into at least one paragraph; and the at least one paragraph is connected to obtain the text.
  • paragraph templates include abstracts, background introductions, hotspots, detailed texts, reviews, appendices, and more.
  • the paragraphs of the text include: a summary (see block 730), a hotspot (see block 740), a detailed body (see block 750), and an appendix (see block 760).
  • At least one to-be-used paragraph template included in the text is selected from a plurality of types of paragraph templates. Then, when the at least one short sentence is combined into at least one paragraph, for each to-be-used paragraph template, at least one short sentence matching that template is determined from the at least one short sentence, and the determined short sentences are combined to obtain the paragraph corresponding to the to-be-used paragraph template.
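  • A minimal sketch of filling paragraph templates with matching short sentences. How a short sentence is matched to a template is not specified in the text, so this example assumes that each sentence already carries a tag naming the template it belongs to; the function name `fill_templates` and the sample sentences are hypothetical.

```python
# Template types named in the text that are used in the displayed report.
TEMPLATES_TO_USE = ["summary", "hotspot", "detailed body", "appendix"]


def fill_templates(templates, tagged_sentences):
    """Build one paragraph per template from the sentences tagged for it."""
    paragraphs = {}
    for template in templates:
        matched = [sentence for tag, sentence in tagged_sentences if tag == template]
        if matched:  # skip templates with no matching short sentence
            paragraphs[template] = " ".join(matched)
    return paragraphs


tagged = [
    ("summary", "The final ended tonight."),
    ("detailed body", "The first round was close."),
    ("detailed body", "The last round decided the result."),
]
paragraphs = fill_templates(TEMPLATES_TO_USE, tagged)
```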
  • Step 505 Perform keyword review on the generated text.
  • the review here includes checking keywords; manuscripts with a higher risk weighting can also be reviewed manually, for example, by submitting the keywords to a manual review window for review.
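  • The keyword review can be sketched as below. The keyword list, the risk weights, and the threshold are all hypothetical; the text only states that keywords are checked and that higher-risk manuscripts can be routed to manual review.

```python
def review_keywords(text, keyword_risk, manual_threshold=1.0):
    """Find listed keywords in the text and decide whether manual review is needed.

    keyword_risk maps each keyword to an assumed risk weight.
    """
    lowered = text.lower()
    hits = sorted(k for k in keyword_risk if k.lower() in lowered)
    risk = sum(keyword_risk[k] for k in hits)
    return {
        "hits": hits,
        "risk": risk,
        "needs_manual_review": risk >= manual_threshold,
    }


# Hypothetical keyword list with risk weights.
KEYWORD_RISK = {"doping": 1.5, "injury": 0.4}
```

  • A text whose matched keywords stay below the threshold is passed on automatically, while one at or above it would be submitted to the manual review window.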
  • Step 506 Send the reviewed text to the client for display.
  • Figure 7 is a schematic illustration of the displayed text in accordance with an embodiment of the present invention.
  • The interface displays a report on a sports event that is recommended to the user.
  • The title is "Zhang Mengxue wins the first Olympic gold medal for the Chinese Legion".
  • The promoter "Tencent Sports" is displayed, together with the date "2016-08-07" and the launch time of the report, "22:23".
  • A summary of the report is given in block 730, the hotspot portion "game focus" in block 740, the detailed body "Fantastic Playback" in block 750, and the appendix "player profile" in block 760.
  • FIG. 8a is a schematic structural diagram of a server according to an embodiment of the invention. As shown in Figure 8a, server 804 includes:
  • the obtaining module 801 is configured to obtain code data for a topic, where the code data is written in a machine language and carries content data under the topic;
  • the identification module 802 is configured to identify behavioral performance data of the at least one object from the content data obtained by the obtaining module 801;
  • a generating module 803 configured to generate the text of the topic according to the behavioral performance data of the at least one object obtained by the identification module 802.
  • FIG. 8b is a schematic structural diagram of a server according to another embodiment of the present invention. As shown in Figure 8b, the server 800 includes:
  • the obtaining module 810 is configured to obtain code data for a topic, where the code data is written in a machine language and carries content data under the topic;
  • the identification module 820 is configured to identify behavioral performance data of the at least one object from the content data obtained by the obtaining module 810;
  • a determining module 830 configured to determine at least one description phrase according to the behavior performance data of the at least one object obtained by the identification module 820;
  • the generating module 840 is configured to generate text of the topic according to the behavior performance data of the at least one object obtained by the identification module 820 and the at least one description phrase determined by the determining module 830.
  • the server 800 further includes:
  • the setting module 850 is configured to set a mapping relationship between the code data and the object, the behavior, and the current performance evaluation data according to the writing rule of the code;
  • the behavioral performance data includes one or more behaviors and current performance evaluation data corresponding to each behavior, and the identification module 820 is configured to identify the behavior and the current performance evaluation data of each object according to the mapping relationship set by the setting module 850.
  • the determining module 830 is configured to: obtain, for each object, historical performance data of the object; compare the behavioral performance data of the object with the historical performance data; and select, from the preset description phrases, at least one description phrase that matches the comparison result.
  • the determining module 830 is configured to: compare the behavioral performance data and the historical performance data separately for each of a plurality of data types; select the comparison results that have display value from the comparison results of the plurality of data types; and select, from the preset description phrases, at least one description phrase that matches the comparison results that have display value.
  • the determining module 830 is configured to: obtain, for each object, performance expectation data of the object; compare the behavioral performance data of the object with the performance expectation data; and select, from the preset description phrases, at least one description phrase that matches the comparison result.
  • the generating module 840 is configured to: select a conjunction for each description phrase from a preset corpus database; connect the behavioral performance data of the at least one object, the conjunction, and the at least one description phrase into at least one short sentence; combine the at least one short sentence into at least one paragraph; and connect the at least one paragraph to obtain the text.
  • FIG. 9a is a schematic structural diagram of a server according to an embodiment of the invention.
  • the server 900a can include a processor 910a, a memory 920a, a port 930a, and a bus 940a.
  • Processor 910a and memory 920a are interconnected by a bus 940a.
  • Processor 910a can receive and transmit data over port 930a. Specifically:
  • the processor 910a is configured to execute a machine readable instruction module stored by the memory 920a.
  • the memory 920a stores a machine readable instruction module executable by the processor 910a.
  • the instruction module executable by the processor 910a includes an acquisition module 921a, an identification module 922a, and a generation module 923a. Specifically:
  • the obtaining module 921a may be executed by the processor 910a to obtain code data for a topic, and the code data is written in a machine language and carries content data under the topic.
  • When executed by the processor 910a, the identification module 922a may identify the behavioral performance data of the at least one object from the content data obtained by the obtaining module 921a.
  • When executed by the processor 910a, the generation module 923a may generate the text of the theme according to the behavioral performance data of the at least one object obtained by the identification module 922a.
  • FIG. 9b is a schematic structural diagram of a server according to an embodiment of the invention.
  • The server 900 can include a processor 910, a memory 920, a port 930, and a bus 940.
  • Processor 910 and memory 920 are interconnected by a bus 940.
  • Processor 910 can receive and transmit data through port 930. Specifically:
  • the processor 910 is configured to execute a machine readable instruction module stored by the memory 920.
  • Memory 920 stores machine readable instruction modules executable by processor 910.
  • the instruction module executable by the processor 910 includes an acquisition module 921, an identification module 922, a determination module 923, and a generation module 924. Specifically:
  • the obtaining module 921 is executed by the processor 910 to obtain code data for a topic, and the code data is written in a machine language and carries content data under the topic.
  • When executed by the processor 910, the identification module 922 may identify the behavioral performance data of the at least one object from the content data obtained by the obtaining module 921.
  • the determining module 923 may be executed by the processor 910 to determine at least one description phrase according to the behavioral performance data of the at least one object obtained by the identification module 922.
  • When executed by the processor 910, the generation module 924 may generate the text of the topic according to the behavioral performance data of the at least one object obtained by the identification module 922 and the at least one description phrase determined by the determining module 923.
  • the instruction module executable by the processor 910 further includes a setting module 925. Specifically:
  • When executed by the processor 910, the setting module 925 may set the mapping relationship between the code data and the object, the behavior, and the current performance evaluation data according to a writing rule of the code; the behavioral performance data includes one or more behaviors and current performance evaluation data corresponding to each behavior.
  • When executed by the processor 910, the identification module 922 may identify the behavior and the current performance evaluation data of each object according to the mapping relationship set by the setting module 925.
  • FIG. 10 is an exemplary flowchart of a text presentation method in accordance with an embodiment of the present invention. This method is applied to the server. As shown in FIG. 10, the method may include the following steps:
  • Step 1001 Obtain code data for a topic, and the code data is written in a machine language and carries content data under the topic.
  • This step refers to the method described in the foregoing step 201, and details are not described herein again.
  • Step 1002 Identify behavioral performance data of at least one object from the content data.
  • the recognized behavioral performance data is multi-dimensional data, and its structure can be expressed as: {topic, object, behavior, current performance evaluation data}.
  • Table 4 lists examples of multi-dimensional behavior data in accordance with an embodiment of the present invention.
  • the theme corresponds to a sports event
  • the object corresponds to an athlete
  • the behavior corresponds to a specific action
  • the current performance evaluation data corresponds to a specific score.
  • for example, athlete Deng Shudi chose an action with a difficulty score of 7.2 and an E score of 8.566, for a total score of 15.766.
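  • The four-dimensional structure described above can be sketched as a small record type. This is only an illustration: the class name, field names, and the topic and behavior labels are hypothetical, and only the two scores and the athlete's name come from the example in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BehaviorRecord:
    """One record: {topic, object, behavior, current performance evaluation data}."""
    topic: str         # e.g. a sports event
    obj: str           # e.g. an athlete
    behavior: str      # e.g. a specific action
    evaluation: float  # current performance evaluation data, e.g. a score


# Scores quoted in the text; the total is the sum of the two.
difficulty, e_score = 7.2, 8.566

record = BehaviorRecord(
    topic="example event",       # placeholder label (assumption)
    obj="Deng Shudi",
    behavior="selected action",  # placeholder label (assumption)
    evaluation=round(difficulty + e_score, 3),
)
print(record.evaluation)
```

  • Freezing the record keeps recognized data immutable once it has been identified, which is a design choice of this sketch rather than something the text requires.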
  • Step 1003 Determine, according to the behavior performance data of the at least one object and the historical performance data and performance expectation data of each object, at least one content item to be pushed under the topic, and generate text for each content item to be pushed.
  • the server acquires historical performance data of each object and sets performance expectation data of each object.
  • a content item to be pushed refers to a content that may be pushed, for example, a news point or event with newsworthiness.
  • the method specifically includes: for each object, performing a cross-comparison according to the behavior performance data of the object and the historical performance data and/or the performance expectation data, and determining whether the content item corresponding to the object satisfies the preset pushing condition; if the pushing condition is met, determining that the content item corresponding to the object is to be pushed, and calculating a heat factor of the content item to be pushed corresponding to the object.
  • when generating the text of each content item to be pushed, according to an embodiment of the present invention, first, at least one description phrase matching the content item to be pushed is selected from a preset text material database (see FIG. 1). Then, the behavioral performance data, the historical performance data and/or the performance expectation data of the at least one object related to the content item to be pushed are combined with the at least one description phrase into at least one paragraph, and the paragraphs are connected to obtain the text.
  • a text template corresponding to each heat factor is set in advance. After the heat factor of the content item to be pushed is determined, the behavior performance data, historical performance data and/or performance expectation data of the at least one object involved in each content item to be pushed, together with other object behavior data, are embedded in the text template corresponding to the content item to be pushed, thereby generating the text.
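  • The template-based variant can be sketched as follows. The template wording, the choice to key templates by the sign of the heat factor, and all field names and sample values are assumptions made for illustration; the text only states that a template is preset for each heat factor.

```python
# Hypothetical templates; keying them by the sign of the heat factor is an assumption.
TEMPLATES = {
    "positive": "{obj} scored {current} in {topic}, above the historical {history}.",
    "negative": "{obj} scored {current} in {topic}, below the expected {expected}.",
}


def render_text(heat_factor: float, fields: dict) -> str:
    """Embed the performance data of a content item into the matching template."""
    key = "positive" if heat_factor > 0 else "negative"
    # str.format ignores keyword arguments the chosen template does not use.
    return TEMPLATES[key].format(**fields)


fields = {
    "obj": "Athlete A",
    "topic": "Event X",
    "current": 15.766,
    "history": 15.2,
    "expected": 16.0,
}
print(render_text(0.566, fields))
```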
  • the method illustrated in FIG. 10 further includes a step 1004. Specifically,
  • Step 1004 Determine, for each content item to be pushed, a priority for displaying the text corresponding to the content item to be pushed according to at least one preset push heat of the topic, and send the determined priority and the text corresponding to the content item to be pushed to the client, so that the client displays the text according to the priority.
  • the push heat may include any one or more of the following: the popularity of the topic, the popularity of at least one object under the topic, the maturity of the device/technology involved in the topic, and the degree of change of the environment/site involved in the topic.
  • the theme is one of the many events in the Olympic Games.
  • the popularity of the theme refers to whether the event has sufficient attention.
  • the men's 400m freestyle final is a popular event that attracts attention; athlete Sun Yang competes in this event, so the popularity of the object refers to Sun Yang being a popular favorite for the championship;
  • the maturity of the device/object involved in the theme refers to the technical level of the equipment used in the event or the level of the object, for example, the function of the swimsuit used in the swimming competition.
  • the degree of environmental/site change involved in the theme refers to the environment in which the event takes place or a possible sudden situation at the venue. For example, sudden heavy rain during an equestrian competition makes the venue muddy, affecting the movements of the horses and the performance of the athletes; or sudden heavy rain during a marathon makes the road slippery, affecting the athletes' movements and results.
  • the determined priorities correspond to different text display styles, and different priorities correspond to different push channels and content levels.
  • the server sets the push channel and content level for displaying the text according to the determined priority.
  • the push channel includes real-time push, mobile display, website display, etc.
  • the content level is divided into headline level, news level, and general level.
  • the client displays the received text according to the set content level through the corresponding push channel according to the received priority indication.
  • at least one content item to be pushed under the theme is determined according to the behavior performance data of at least one object and the historical performance data and/or performance expectation data of each object, and the text of each content item to be pushed is generated. Then, for each content item to be pushed, a priority of displaying the text corresponding to the content item is determined according to at least one preset push heat of the theme, and the determined priority and the corresponding text are sent to the client, so that the client displays the text according to the priority. This provides a method for discovering news points based on changes in the data, which can distinguish the importance of different content items from a news perspective.
  • as a result, robots can quickly report multiple events and produce manuscripts in batches, so that neither upsets nor hot events are missed, saving a lot of manpower.
  • multiple reports that have been sorted and given a focus are pushed through different push channels.
  • in step 1003, when determining at least one content item to be pushed, a multi-dimensional cross-comparison is performed according to the behavior performance data, historical performance data, and performance expectation data of the object, it is determined whether the preset pushing condition is satisfied, and the heat factor corresponding to the object is calculated.
  • the calculated heat factor can be positive or negative. A positive heat factor corresponds to a hot news point (performance above the reference); a negative heat factor corresponds to an upset news point (performance below the reference).
  • the behavioral performance data includes one or more behaviors and current performance evaluation data corresponding to each behavior, the historical performance data includes historical performance evaluation data corresponding to each behavior, and the performance expectation data includes expected performance evaluation data corresponding to each behavior. If a topic includes $I$ objects, the $i$-th object corresponds to $J_i$ behaviors, $j = 1, \dots, J_i$, and the current performance evaluation data corresponding to the $j$-th behavior is denoted $s_{i,j}$.
  • the historical performance evaluation data corresponding to the $j$-th behavior in the historical performance data of the object is denoted $s'_{i,j}$, and the expected performance evaluation data corresponding to the $j$-th behavior in the performance expectation data is denoted $s''_{i,j}$.
  • the behavioral performance data of at least one object and the historical performance data of each object are compared to calculate a heat factor of one or more objects.
  • the first heat threshold of each behavior is preset; let $s'_{th,j}$ be the first heat threshold of the $j$-th behavior preset for the historical performance data.
  • the first candidate heat factor of the jth behavior of the i-th object is:
  • the weight of each behavior when calculating the heat factor is set in advance; let $\omega_{i,j}$ denote the weight of the $j$-th behavior of the $i$-th object, which represents the degree of importance of that behavior from the perspective of news hotspots. For example, in a swimming competition, favorites for the championship have higher weights, and the final race has a higher weight.
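  • The formulas for this mode are not reproduced in the text above. A reconstruction consistent with the later description of the determining unit 1621 and the calculating unit 1622 is sketched below. The weight symbol $\omega_{i,j}$, the name $h'_i$ for the first heat factor, and the value 0 when the threshold is not exceeded are assumptions:

```latex
h'_{i,j} =
\begin{cases}
  s_{i,j} - s'_{i,j}, & \text{if } \left| s_{i,j} - s'_{i,j} \right| > s'_{th,j} \\
  0, & \text{otherwise}
\end{cases}
\qquad
h'_i = \sum_{j=1}^{J_i} \omega_{i,j} \, h'_{i,j}
```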
  • the total number of first heat factors is less than or equal to $I$.
  • the behavioral performance data of at least one object and the performance expectation data of each object are compared to calculate a heat factor of one or more objects.
  • a second heat threshold of each behavior is preset; let $s''_{th,j}$ be the second heat threshold of the $j$-th behavior preset for the performance expectation data.
  • the second candidate heat factor of the jth behavior of the i-th object is:
  • the weight of each behavior when calculating the heat factor is set in advance, and, as above, $\omega_{i,j}$ denotes the weight of the $j$-th behavior of the $i$-th object.
  • the second heat factor of the i-th object is calculated as:
  • the total number of second heat factors is less than or equal to $I$.
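  • The single-reference calculation (against either historical or expected data) can be sketched as below. The function names and sample numbers are hypothetical, and returning 0 for a behavior whose difference does not exceed the threshold is an assumption; the text only says that the comparison result becomes a candidate heat factor when its absolute value exceeds the threshold.

```python
from typing import Sequence


def candidate_heat_factor(current: float, reference: float, threshold: float) -> float:
    """Return current - reference if its magnitude exceeds the threshold, else 0 (assumed)."""
    diff = current - reference
    return diff if abs(diff) > threshold else 0.0


def heat_factor(
    current: Sequence[float],
    reference: Sequence[float],
    thresholds: Sequence[float],
    weights: Sequence[float],
) -> float:
    """Weighted sum of the candidate heat factors over all behaviors of one object."""
    return sum(
        w * candidate_heat_factor(s, r, t)
        for s, r, t, w in zip(current, reference, thresholds, weights)
    )


# Hypothetical numbers for one object with two behaviors.
current = [15.5, 14.0]
expected = [15.0, 14.25]
thresholds = [0.25, 0.5]
weights = [2.0, 1.0]
print(heat_factor(current, expected, thresholds, weights))
```

  • In this example only the first behavior exceeds its threshold, so the second contributes nothing to the sum.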
  • behavioral performance data of at least one object and historical performance data of each object are compared with performance expectation data to calculate a heat factor of one or more objects.
  • FIG. 11 is an exemplary flow chart for calculating a heat factor in accordance with an embodiment of the present invention. As shown in FIG. 11, the following steps are included:
  • Step 1101 A first weight of each of the historical performance data and the performance expectation data when calculating the heat factor is set in advance, and a second weight of each behavior when calculating the heat factor of an object is preset.
  • let $\lambda'$ and $\lambda''$ denote the respective first weights of the historical performance data and the performance expectation data when calculating the heat factor.
  • let $\omega_{i,j}$ denote the second weight of the $j$-th behavior of the $i$-th object.
  • Step 1104 Compare, for each behavior of the object, current performance evaluation data corresponding to the behavior with historical performance evaluation data, that is, calculate $s_{i,j} - s'_{i,j}$.
  • Step 1105 Compare, for each behavior of the object, current performance evaluation data corresponding to the behavior with expected performance evaluation data, that is, calculate $s_{i,j} - s''_{i,j}$.
  • Step 1108 The comparison result is taken as the first candidate heat factor $h'_{i,j}$.
  • Step 1109 The comparison result is taken as the second candidate heat factor $h''_{i,j}$.
  • Step 1110 Perform weighted summation of all first candidate heat factors and second candidate heat factors of the object according to the first weight and the second weight to obtain a heat factor of the object.
  • the total heat factor of the i-th object is calculated as:
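  • The formula itself is not reproduced in the text. One reading of the weighted summation in Step 1110 is given below as a sketch. The symbols $\lambda'$, $\lambda''$ (first weights) and $\omega_{i,j}$ (second weight), and the way the two weights are combined, are assumptions:

```latex
h_i = \sum_{j=1}^{J_i} \omega_{i,j}
      \left( \lambda' \, h'_{i,j} + \lambda'' \, h''_{i,j} \right)
```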
  • the priority of displaying the text corresponding to the content item to be pushed is determined according to at least one preset hotness of the theme, and specifically includes the following processing:
  • the push heat includes the following four items: the popularity of the topic, the popularity of at least one object under the topic, the maturity of the device/technology involved in the topic, and the degree of change of the environment/site involved in the topic.
  • the server predetermines the value corresponding to each push heat under the topic, and then takes out the maximum value, that is, determines the hottest push heat item to correct the heat factor, thereby obtaining the priority score.
  • the priority of the content item to be pushed may still not be high. For example, for an unpopular clay target shooting event, even if an athlete wins a medal and thereby achieves a "zero breakthrough" in medals, the priority of pushing the content item remains low because the event attracts little attention.
  • Table 5 shows an example of the push heat value according to an embodiment of the present invention. It can be seen that for the men's 400m freestyle final, the maximum score of 25 corresponds to the popularity of Sun Yang.
  • Table 6 is an example of priority value intervals according to an embodiment of the present invention. The priority is divided into three intervals, and the boundaries between the intervals are defined by setting the value $P_{max}$. Each priority corresponds to a push channel and a content level. For example, a content item to be pushed with a priority of 1 can use the real-time push channel and be displayed at the headline level.
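  • The priority decision can be sketched as follows. Multiplying the heat factor by the largest push-heat value follows the description of the second determining module 1640. The equal three-way split of the range up to a maximum value, the use of the score's magnitude, the channel and level assigned to priorities 2 and 3, and every number other than the value 25 quoted for Sun Yang's popularity are assumptions, since Tables 5 and 6 are not reproduced here.

```python
from typing import Mapping, Tuple


def priority_score(heat_factor: float, push_heat: Mapping[str, float]) -> float:
    """Correct the heat factor with the hottest push-heat item (multiplication)."""
    return heat_factor * max(push_heat.values())


def priority_level(score: float, p_max: float) -> int:
    """Map a score to priority 1..3 using three equal intervals on |score| (assumed)."""
    magnitude = abs(score)
    if magnitude >= 2 * p_max / 3:
        return 1
    if magnitude >= p_max / 3:
        return 2
    return 3


# Priority 1 follows the example in the text; priorities 2 and 3 are assumed pairings.
DISPLAY: Mapping[int, Tuple[str, str]] = {
    1: ("real-time push", "headline level"),
    2: ("mobile display", "news level"),
    3: ("website display", "general level"),
}

push_heat = {"topic": 20.0, "object": 25.0, "equipment": 5.0, "environment": 3.0}
score = priority_score(2.0, push_heat)
level = priority_level(score, p_max=60.0)
print(score, level, DISPLAY[level])
```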
  • Figure 12 is a schematic illustration of generated text in accordance with an embodiment of the present invention.
  • the text belongs to the "unpopular hot news" news point. As shown in FIG. 12, an "#Unpopular Hotspot#" flag is added to the title "Taekwondo men's 58 kg final: Zhao Shuai makes history and wins the title" shown in block 1210, and the result of the priority decision, "#Priority Decision# Real-time push, headline level", is displayed in block 1260.
  • the generation time, summary, event focus, and highlight playback passages of the text are displayed in blocks 1220 to 1250, respectively.
  • FIG. 13 is an exemplary flowchart of a text display method in accordance with another embodiment of the present invention. This method is applied to the server. As shown in FIG. 13, the method may include the following steps:
  • Step 1301 Obtain behavior performance data of at least one object for a topic.
  • Step 1302 Determine at least one content item to be pushed under the topic according to behavior performance data of at least one object and historical performance data and performance expectation data of each object, and generate text of each content item to be pushed.
  • Step 1303 For each content item to be pushed, determine the priority of displaying the text corresponding to the content item to be pushed according to at least one preset push heat of the topic.
  • the server can also set the correspondence between the different priorities and the push channels and content levels according to how strongly different user groups demand the information.
  • the client reports the allocation method specified by the user to the server. For example, if a user is a sports fan, the push channel for all of priorities 1 to 3 shown in Table 6 above is set to real-time push.
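  • A user-specific correspondence can be sketched as an override on top of the server defaults. The default pairings for priorities 2 and 3 and all names below are hypothetical; only the priority 1 pairing and the sports-fan example come from the text.

```python
from typing import Dict, Tuple

Display = Tuple[str, str]  # (push channel, content level)

DEFAULT_MAPPING: Dict[int, Display] = {
    1: ("real-time push", "headline level"),
    2: ("mobile display", "news level"),      # assumed default
    3: ("website display", "general level"),  # assumed default
}


def mapping_for_user(channel_overrides: Dict[int, str]) -> Dict[int, Display]:
    """Apply the push channels reported by the client on top of the defaults."""
    mapping = dict(DEFAULT_MAPPING)  # copy so the defaults stay unchanged
    for priority, channel in channel_overrides.items():
        _, level = mapping[priority]
        mapping[priority] = (channel, level)
    return mapping


# A sports fan asks for real-time push on every priority.
sports_fan = mapping_for_user({1: "real-time push", 2: "real-time push", 3: "real-time push"})
print(sports_fan[3])
```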
  • Step 1304 Perform keyword review on the generated text, and send the determined priority and the audited text to the client, so that the client displays the text according to the priority.
  • the review here includes screening for problematic keywords, and manuscripts with a higher weighted risk level can also be submitted to the manual review window for review.
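  • A keyword review of this kind can be sketched as below. The keyword list, the risk weights, the threshold, and the two outcome labels are all hypothetical; the text does not specify how the weighted risk level is computed.

```python
from typing import Mapping


def review_text(text: str, keyword_risk: Mapping[str, float], manual_threshold: float) -> str:
    """Sum the risk weights of keywords found in the text and route the manuscript."""
    risk = sum(weight for keyword, weight in keyword_risk.items() if keyword in text)
    if risk >= manual_threshold:
        return "manual review"
    return "approved"


# Hypothetical keyword weights and threshold.
KEYWORD_RISK = {"doping": 3.0, "injury": 1.0}

print(review_text("Zhao Shuai wins the final", KEYWORD_RISK, manual_threshold=2.0))
print(review_text("doping allegation after the final", KEYWORD_RISK, manual_threshold=2.0))
```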
  • the client displays the received text according to the set content level through the corresponding push channel according to the received priority indication.
  • the client sets the corresponding display slot.
  • for example, the client sets a display slot in the headline position of the recommended display interface. The display slot may show only part of the text together with a link, and the complete text is shown on another page after the user clicks the link.
  • FIG. 14 is a schematic diagram showing text displayed in accordance with an embodiment of the present invention.
  • the client pushes a "hot upset" sports event report to the user in real time at the headline level.
  • the box 1410 displays the "headline news" interface identifier, and the title is given in block 1420. The promotion is "Tencent Sports", the date is "2016-08-17", and the real-time push time of the report is "01:28". In block 1440, part of the body text and a link to the full text ("more>") are given.
  • FIG. 15 is a schematic diagram showing text displayed in accordance with another embodiment of the present invention.
  • the full text of the report is displayed in the display interface 1500.
  • a "Comment” option (see 1521) and a "Share” option (see 1522) are also provided in block 1520 for the user to interact on the social platform.
  • a picture of the upset news is shown in block 1530, and the title of the picture is given in block 1540 to highlight the subject of the upset news. Additionally, the detailed body of the story is given in block 1550, and the appendix "player profile” of the story is given in block 1560.
  • the robot can uncover news hotspots, whether upsets or hot events, and produces drafts quickly.
  • for each game, the process from the end of the game through data acquisition and text generation to publication of the match report link is on average 30 seconds faster than a manually written news flash, and at least 5-10 minutes faster than a detailed report.
  • manuscripts can be produced in batches, and hundreds of events can be monitored at the same time, so that unpopular events are not missed.
  • TV broadcasts often have a limited number of broadcast channels. Because of the obvious gaps in ratings, only selected events can be broadcast, and many events have no national-team athletes, or the domestic athletes are not strong enough to reach the finals. In fact, such events still attract users' attention. Manual editors, with limited energy and resources, cannot report on them in time, whereas the robot can report quickly and in real time as long as the event data is obtained.
  • the robot can take over the work in every link: it accurately obtains the core data from the database, automatically writes the manuscript, generates the link for front-end display, and pushes the text in a single pass, so that only a small number of managers are needed.
  • FIG. 16 is a schematic structural diagram of a server according to another embodiment of the present invention. As shown in FIG. 16, the server 1600 includes:
  • the obtaining module 1610 is configured to obtain code data for a topic, the code data is written in a machine language, carrying content data under the topic, and the behavior performance data of the at least one object is identified from the content data;
  • the first determining module 1620 is configured to determine, according to the behavior performance data of the at least one object acquired by the obtaining module 1610 and the historical performance data and the performance expectation data of each object, at least one content item to be pushed under the theme;
  • a generating module 1630 configured to generate text of each content item to be pushed determined by the first determining module 1620;
  • the second determining module 1640 is configured to determine, for each of the to-be-pushed content items determined by the first determining module 1620, a priority of displaying the text corresponding to the content item to be pushed according to at least one preset push heat of the topic; and
  • the sending module 1650 is configured to send the priority determined by the second determining module 1640 and the text corresponding to the to-be-pushed content item generated by the generating module 1630 to the client, so that the client displays the text according to the priority.
  • the first determining module 1620 includes:
  • the determining unit 1621 is configured to determine, for each object of the at least one object, whether the content item corresponding to the object satisfies a pushing condition according to the behavior performance data of the object and the historical performance data and the performance expectation data;
  • the calculating unit 1622 is configured to: if the determining unit 1621 determines that the pushing condition is met, determine that the content item corresponding to the object is to be pushed, and calculate the heat factor of the content item to be pushed corresponding to the object.
  • the behavioral performance data includes one or more behaviors and current performance evaluation data corresponding to each behavior
  • the historical performance data includes historical performance evaluation data corresponding to each behavior
  • the determining unit 1621 is configured to preset a first heat threshold for each behavior for the historical performance data, and compare the current performance evaluation data corresponding to the behavior with the historical performance evaluation data for each behavior of the object, if If the absolute value of the comparison result is greater than the first heat threshold of the behavior, the comparison result is used as the first candidate heat factor;
  • the calculating unit 1622 is configured to preset a weight of each behavior when calculating the heat factor; and weighting and summing all the first candidate heat factors of the object according to the weight to obtain the heat factor.
  • the behavioral performance data includes one or more behaviors and current performance evaluation data corresponding to each behavior
  • the performance expectation data includes expected performance evaluation data corresponding to each behavior
  • the determining unit 1621 is configured to preset a second heat threshold for each behavior for the performance expectation data, and compare the current performance evaluation data corresponding to the behavior with the expected performance evaluation data for each behavior of the object, if If the absolute value of the comparison result is greater than the second heat threshold of the behavior, the comparison result is used as the second candidate heat factor;
  • the calculating unit 1622 is configured to preset a weight of each behavior when calculating the heat factor; and weighting and summing all the second candidate heat factors of the object according to the weight to obtain the heat factor.
  • the behavioral performance data includes one or more behaviors and current performance evaluation data corresponding to each behavior, the historical performance data includes historical performance evaluation data corresponding to each behavior, and the performance expectation data includes expected performance evaluation data corresponding to each behavior,
  • the determining unit 1621 is configured to preset a first heat threshold for each behavior for the historical performance data, and preset a second heat threshold for each behavior for the performance expectation data; and, for each behavior of the object, perform the following processing: compare the current performance evaluation data corresponding to the behavior with the historical performance evaluation data, and if the absolute value of the comparison result is greater than the first heat threshold of the behavior, use the comparison result as the first candidate heat factor; compare the current performance evaluation data with the expected performance evaluation data, and if the absolute value of the comparison result is greater than the second heat threshold of the behavior, use the comparison result as the second candidate heat factor;
  • the calculating unit 1622 is configured to preset a first weight of each of the historical performance data and the performance expectation data when calculating the heat factor, and preset a second weight of each behavior when calculating the heat factor of the object; and perform a weighted summation of all first candidate heat factors and second candidate heat factors of the object according to the first weight and the second weight to obtain the heat factor of the object.
  • the push heat includes any one or more of the following: the popularity of the topic, the popularity of at least one object under the topic, the maturity of the device/object involved in the topic, and the degree of change of the environment/site involved in the topic,
  • the second determining module 1640 is configured to determine a value corresponding to each push heat; multiply the heat factor of the content item to be pushed by the maximum of these values to obtain a priority score for the heat factor; and compare the priority score with a plurality of preset value intervals to determine the priority of the text corresponding to the content item to be pushed.
  • the second determining module 1640 is configured to set a push channel and a content level for displaying the text according to the determined priority, so that the client displays the text at the set content level through the push channel.
  • FIG. 17 is a schematic structural diagram of a server according to still another embodiment of the present invention.
  • the server 1700 can include a processor 1710, a memory 1720, a port 1730, and a bus 1740.
  • the processor 1710 and the memory 1720 are interconnected by a bus 1740.
  • the processor 1710 can receive and transmit data through the port 1730. Specifically,
  • the processor 1710 is configured to execute a machine readable instruction module stored by the memory 1720.
  • the memory 1720 stores machine readable instruction modules executable by the processor 1710.
  • the instruction module executable by the processor 1710 includes an obtaining module 1721, a first determining module 1722, a generating module 1723, a second determining module 1724, and a sending module 1725. Specifically,
  • the obtaining module 1721 may be executed by the processor 1710 to: acquire code data for a topic, the code data being written in a machine language and carrying content data under the topic; and identify behavior performance data of at least one object from the content data;
  • the first determining module 1722 may be executed by the processor 1710 to: determine at least one content item to be pushed under the topic according to the behavior performance data of the at least one object acquired by the obtaining module 1721 and the historical performance data and the performance expectation data of each object;
  • the generating module 1723 may be executed by the processor 1710 to: generate text of each content item to be pushed determined by the first determining module 1722;
  • the second determining module 1724 may be executed by the processor 1710 to: determine, for each content item to be pushed determined by the first determining module 1722, a priority of displaying the text corresponding to the content item to be pushed according to at least one preset push heat of the topic; and
  • the sending module 1725 may be executed by the processor 1710 to: send the priority determined by the second determining module 1724 and the text corresponding to the to-be-pushed content item generated by the generating module 1723 to the client, so that the client displays the text according to the priority.
  • each functional module in each embodiment of the present invention may be integrated into one processing unit, or each module may exist physically separately, or two or more modules may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • each of the embodiments of the present invention can be implemented by a data processing program executed by a data processing device such as a computer.
  • the data processing program constitutes the present invention.
  • a data processing program usually stored in a storage medium is executed by directly reading the program out of the storage medium or by installing or copying the program to a storage device (such as a hard disk and/or a memory) of the data processing device. Therefore, such a storage medium also constitutes the present invention.
  • the storage medium can use any type of recording method, such as paper storage medium (such as paper tape, etc.), magnetic storage medium (such as floppy disk, hard disk, flash memory, etc.), optical storage medium (such as CD-ROM, etc.), magneto-optical storage medium (such as MO, etc.).
  • embodiments of the present invention also disclose a storage medium in which is stored a data processing program for performing any of the above-described embodiments of the present invention.

Abstract

A text generation method and a server. The method is applied to the server. The server comprises a processor (910, 910a) and a memory (920, 920a). The method is executed by executing instructions stored in the memory (920, 920a) by the processor (910, 910a), and comprises: obtaining code data (201, 501, 1001) for a topic, the code data being written according to machine language and carrying content data of the topic; identifying behavioral performance data (202, 1002) of at least one object from the content data; and generating a text (203) of the topic according to the behavioral performance data of the at least one object.

Description

Text generation method and server
This application claims priority to Chinese Patent Application No. 201610920284.5, filed with the Chinese Patent Office on October 21, 2016 and entitled "Text generation method and server". This application also claims priority to Chinese Patent Application No. 201610920326.5, filed with the Chinese Patent Office on October 21, 2016 and entitled "Text display method and server". The entire contents of both applications are incorporated herein by reference.
Technical field
The present application relates to the field of data processing technologies, and in particular, to a text generation method and a server.
Background of the invention
At present, when editing manuscripts and publishing content on a daily basis, media organizations can use certain software to automatically generate texts such as reports, articles, and manuscripts. The main approach is to monitor the live text paragraphs in a graphic-and-text live broadcast system and pre-store data, capture some of the paragraph text, and fill paragraphs and data into preset fixed templates, thereby piecing together a text such as a report.
Summary of the invention
In view of this, the embodiments of the present invention provide a text generation method and a server, which can improve the efficiency of text generation and the resource utilization of the server.
The technical solutions of the embodiments of the present invention are implemented as follows:
An embodiment of the present invention provides a text generation method. The method is applied to a server, the server includes a processor and a memory, and the method is performed by the processor executing instructions stored in the memory, and includes:
obtaining code data for a topic, the code data being written in a machine language and carrying content data under the topic;
identifying behavioral performance data of at least one object from the content data; and
generating the text of the topic according to the behavioral performance data of the at least one object.
An embodiment of the present invention further provides a server, including a processor and a memory, where the memory stores instructions executable by the processor, and when executing the instructions, the processor is configured to:
obtain code data for a topic, the code data being written in a machine language and carrying content data under the topic;
identify behavioral performance data of at least one object from the content data; and
generate the text of the topic according to the behavioral performance data of the at least one object.
An embodiment of the present invention further provides a computer readable storage medium storing computer readable instructions that can cause at least one processor to execute the above method.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are merely some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort. In the drawings:
FIG. 1 is a schematic structural diagram of an implementation environment according to an embodiment of the present invention;
FIG. 2a is an exemplary flowchart of a text generation method according to an embodiment of the present invention;
FIG. 2b is an exemplary flowchart of a text generation method according to another embodiment of the present invention;
FIG. 3 is an exemplary flowchart of determining description phrases according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of generated text according to an embodiment of the present invention;
FIG. 5 is an exemplary flowchart of a text generation method according to still another embodiment of the present invention;
FIG. 6 is a schematic diagram of generated text according to another embodiment of the present invention;
FIG. 7 is a schematic diagram of displayed text according to an embodiment of the present invention;
FIG. 8a is a schematic structural diagram of a server according to an embodiment of the present invention;
FIG. 8b is a schematic structural diagram of a server according to another embodiment of the present invention;
FIG. 9a is a schematic structural diagram of a server according to still another embodiment of the present invention;
FIG. 9b is a schematic structural diagram of a server according to an embodiment of the present invention;
FIG. 10 is an exemplary flowchart of a text display method according to an embodiment of the present invention;
FIG. 11 is an exemplary flowchart of calculating a popularity factor according to an embodiment of the present invention;
FIG. 12 is a schematic diagram of generated text according to an embodiment of the present invention;
FIG. 13 is an exemplary flowchart of a text display method according to another embodiment of the present invention;
FIG. 14 is a schematic diagram of displayed text according to an embodiment of the present invention;
FIG. 15 is a schematic diagram of displayed text according to another embodiment of the present invention;
FIG. 16 is a schematic structural diagram of a server according to another embodiment of the present invention;
FIG. 17 is a schematic structural diagram of a server according to still another embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
Existing text generation approaches are built on manually written live text coverage rather than being written automatically by software, and must rely on a third-party graphic-and-text live broadcast system. In addition, only whole paragraphs can be captured, and the templates used to combine them are fixed. Capturing and combining paragraphs in this way makes the generated report read as pieced together and mechanical. The generated report is therefore hard to read and conveys a limited amount of information, which fails to meet users' need for detailed information and also lowers the resource utilization of the text generation device.
FIG. 1 is a schematic structural diagram of an implementation environment according to an embodiment of the present invention. As shown in FIG. 1, the text display system 100 includes a server 110, a network 120, a terminal device 130, and a user 140. The server 110 includes a processor and a memory, and the method embodiments of the present invention are performed by the processor executing instructions stored in the memory. Specifically, the server 110 includes a code database 111, a text material database 112, and a text generation processing unit 113. A client 130-1 is installed on the terminal device 130.
The client 130-1 is an application of the media promoter. It can recommend and display relevant text to the user and provides a social platform on which users can interact. After logging in to the client 130-1, the user 140 can browse text of interest. For example, if the client 130-1 is the Tencent News client, the user can browse sports news of interest after logging in.
In the embodiments of the present invention, the code database 111 stores code data for each topic and is updated in real time. The text material database 112 stores the corpus material used when generating text, such as words and phrases, as well as preset text templates. The text generation processing unit 113 reads the code data stored in the code database 111, identifies the behavioral performance data, determines the description phrases, and generates the text using the corpus material stored in the text material database 112. It can also determine the priority of the text.
The server 110 then sends the generated text and the determined priority to the client 130-1 on the terminal device 130. The client 130-1 can recommend and display the text to the user according to the channel and content level determined by the priority, and provides a social platform on which users can interact.
The server 110 may be a single server, a server cluster composed of several servers, or a cloud computing service center. The network 120 may connect the server 110 and the terminal device 130 in a wireless or wired manner. The terminal device 130 may be a smart terminal, such as a smartphone, a tablet computer, or a laptop computer.
FIG. 2a is an exemplary flowchart of a text generation method according to an embodiment of the present invention. The method is applied to a server that includes a processor and a memory, and is performed by the processor executing instructions stored in the memory. As shown in FIG. 2a, the method may include the following steps:
Step 201: Obtain code data for a topic, the code data being written in a machine language and carrying content data under the topic.
In this step, the code database in the server obtains and stores code data for a topic. The code data is written in a machine language and carries the content data under the topic. Code developers may write the code data into the code database in advance. The machine language may be, for example, Java, PHP (Hypertext Preprocessor), ASP.NET, or Ruby; each has its own set of programming rules.
The topic may be a sports event, in which case the code data includes content data such as the competition details and results of each athlete in each discipline of the event. The topic may also be a singing competition, in which case the code data includes content data such as the competition details and results of each singer in each round. As a further example, the topic may be a remote-controlled toy car race, in which case the code data includes content data such as the speed, running time, and ranking of each toy car on each track.
In the embodiments of the present invention, because coverage of a topic (news coverage, for example) is time-sensitive, the code data is real-time code data. Text generated from real-time code data can therefore be displayed to users promptly.
Step 202: Identify behavioral performance data of at least one object from the content data.
This step converts machine-readable "code data" into "behavioral performance data" that a user of the client can read.
In the present application, the topic may be any kind of competitive event, for example a sports competition, a singing competition, or a remote-controlled robot competition. An object is a participant in a topic and may be a person, an animal, or a thing. For each object, the behavioral performance data includes one or more behaviors and the current performance evaluation data corresponding to each behavior. Current performance evaluation data is the result of evaluating how the object performed in one of its behaviors.
For example, when the topic is a sports event, the objects are athletes. Each athlete's behavioral performance data corresponds to every detail of the athlete's competition: the behaviors include each of the athlete's actions, and the current performance evaluation data includes the score for each action, the judges' decisions, the total result, and the medal outcome. Take coverage of diving as an example. Diving is a judged sport, and each athlete's behavioral performance data includes behaviors defined by a number of technical details (the approach on the board, the approach on the platform, the take-off, the height, the complete dive with its degree of difficulty, the position in the air, coordination, the entry into the water, and so on) together with the current performance evaluation data for each behavior, namely the corresponding score or ranking.
When performing the identification, the server first sets, according to the rules by which the code is written, a mapping between the code data and the objects, behaviors, and current performance evaluation data. It then uses this mapping to identify each object's behaviors and current performance evaluation data from the code data.
For example, with Java as the machine language, the various fields in the code data are mapped to objects, behaviors, and current performance evaluation data according to Java's syntax rules: the field "object" maps to "object", the field "action" maps to "behavior", and the field "score" maps to "current performance evaluation data".
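The field mapping described above can be sketched as follows. This is a minimal illustration written in Python rather than the Java named in the text. The application does not specify the layout of the code data, so the record format (a list of dictionaries), the function name `extract_behavior_data`, and the sample values are all assumptions made for the example.

```python
from collections import defaultdict

# Mapping from code-data field names to the concepts named in the text.
# Only the three field names come from the description; the rest is assumed.
FIELD_MAPPING = {
    "object": "object",
    "action": "behavior",
    "score": "current_performance",
}
REQUIRED = ("object", "behavior", "current_performance")


def extract_behavior_data(records):
    """Group (behavior, current performance) pairs by object."""
    result = defaultdict(list)
    for record in records:
        # Keep only mapped fields and rename them to their concepts.
        mapped = {
            FIELD_MAPPING[name]: value
            for name, value in record.items()
            if name in FIELD_MAPPING
        }
        # Skip records that do not carry all three mapped fields.
        if all(key in mapped for key in REQUIRED):
            result[mapped["object"]].append(
                (mapped["behavior"], mapped["current_performance"])
            )
    return dict(result)


# Illustrative records; names and scores are made up for the example.
records = [
    {"object": "Athlete A", "action": "round 1 dive", "score": 50.4, "venue": "pool"},
    {"object": "Athlete A", "action": "round 2 dive", "score": 52.2},
    {"object": "Athlete B", "action": "round 1 dive", "score": 49.8},
    {"action": "orphan row", "score": 1.0},  # no object field: skipped
]
behavior_data = extract_behavior_data(records)
```

Keeping the mapping in a single table means that a different code-writing convention would only require a different `FIELD_MAPPING`, not a change to the extraction logic.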
Table 1 lists the behavioral performance data identified according to an embodiment of the present invention. The topic is the women's synchronized 3 m springboard final at the Rio Olympic Games. From the code data, the server identifies two athletes as the objects, Wu Minxia and Shi Tingmao, aged 31 and 24 respectively. The behaviors are the five rounds of synchronized dives, and the current performance evaluation data includes the score and ranking for each round as well as the final total score and medal result.
Figure PCTCN2017101852-appb-000001
Figure PCTCN2017101852-appb-000002
Table 1: Identified behavioral performance data
Step 203: Generate the text of the topic according to the behavioral performance data of the at least one object.
This step converts the "behavioral performance data" into "text", yielding the text for the topic.
In the above embodiment, code data for a topic is obtained, behavioral performance data of at least one object is identified from the content data, and the text of the topic is generated from that behavioral performance data. Because the method connects directly to the code database, it does not depend on a live broadcast system or on manually written live text coverage as the prior art does. It can turn all the technical details of a competition and the related performance evaluation data into a vivid, natural-sounding account of the competition. The result is fast to produce, rich in information, and readable, so that machine-written coverage reads like human writing.
With regard to step 203 above, FIG. 2b is an exemplary flowchart of a text generation method according to another embodiment of the present invention. As shown in FIG. 2b, on the basis of steps 201 and 202 of FIG. 2a, step 203 is further divided into step 203a and step 203b. Specifically:
Step 203a: Determine at least one description phrase according to the behavioral performance data of the at least one object.
This step associates the "behavioral performance data" with "description phrases". Specifically, there are three methods for determining the description phrases:
Method 1: Compare the behavioral performance data of multiple objects under the topic, and select from the preset description phrases at least one description phrase that matches the comparison result.
This method makes a horizontal comparison of multiple objects in the same event under the same topic. For example, the topic of Olympic diving comprises eight events: women's synchronized 3 m springboard, men's synchronized 3 m springboard, women's 3 m springboard, men's 3 m springboard, women's synchronized 10 m platform, men's synchronized 10 m platform, women's 10 m platform, and men's 10 m platform. For a given event, the data of multiple athletes are compared side by side, and descriptive phrases are then derived from the comparison results.
In one embodiment, a pairwise comparison of the behavioral performance data of multiple objects yields a result of greater than, equal to, or less than, and the server presets multiple corresponding description phrases. Table 2 shows description phrases preset for comparing the behavioral performance data of multiple objects according to an embodiment of the present invention.
Figure PCTCN2017101852-appb-000003
Table 2: Description phrases preset for comparing the behavioral performance data of multiple objects
For example, in the men's 400 m freestyle final at the Rio Olympic Games, the athlete Sun Yang finished in 3:41.68 and another athlete, Horton, finished in 3:41.55. Comparing the two results shows that Sun Yang was slightly behind Horton, so the server can determine the corresponding description phrases to be "slightly behind", "lost to his opponent", and "very regrettable".
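Method 1 can be sketched as a comparison followed by a table lookup. In the Python sketch below, only the three phrases in the "slightly_worse" entry come from the example above. The contents of Table 2 are not reproduced in this text, so the other phrases, the outcome labels, and the `close_margin` threshold of 0.5 seconds are assumptions.

```python
# Illustrative phrase table. Only the "slightly_worse" phrases are taken
# from the example in the text; the other entries are placeholders.
PHRASES = {
    "better": ["ahead of his opponent"],
    "equal": ["level with his opponent"],
    "slightly_worse": ["slightly behind", "lost to his opponent", "very regrettable"],
    "worse": ["well behind"],
}


def to_seconds(minutes, seconds):
    """Convert a race time given as minutes and seconds to seconds."""
    return minutes * 60 + seconds


def compare_times(time_a, time_b, close_margin=0.5):
    """Classify A's race time against B's; a lower time is better.

    close_margin (seconds) is an assumed threshold separating a narrow
    loss from a clear one.
    """
    diff = time_a - time_b
    if diff < 0:
        return "better"
    if diff == 0:
        return "equal"
    return "slightly_worse" if diff <= close_margin else "worse"


sun_yang = to_seconds(3, 41.68)
horton = to_seconds(3, 41.55)

outcome = compare_times(sun_yang, horton)
phrases = PHRASES[outcome]
```

For judged sports such as diving, where a higher score is better, the sign of the comparison would be reversed.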
Method 2: For each object, obtain the object's historical performance data, compare the object's behavioral performance data with the historical performance data, and select from the preset description phrases at least one description phrase that matches the comparison result.
This method makes a longitudinal comparison between current performance data and historical performance data. Specifically, FIG. 3 is an exemplary flowchart of determining description phrases according to an embodiment of the present invention. As shown in FIG. 3, the method includes the following steps:
Step 2031: For each object, obtain the object's historical performance data.
Taking a sports event as an example, the historical performance data includes the athlete's past performances, world ranking, changes of event, and so on. The server may identify the object's historical performance data from historical code data; alternatively, the historical performance data of each object under the relevant topic may be written to and stored on the server in advance.
Step 2032: Compare the object's behavioral performance data with the historical performance data separately for each of multiple data types.
This step uses a classified comparison, dividing the data into multiple data types according to its attributes, for example the competition venue, the object's age, the action in each round, the score, and the ranking.
Step 2033: From the comparison results for the multiple data types, select the comparison results that have display value.
To take into account whether the final text content is worth displaying to the user, the comparison results that have display value are selected from the comparison results for the multiple data types. One way to do this is to score each comparison result by its display value, sort the scores, and select several comparison results that have display value.
For example, for the athlete Sun Yang, comparing the data type "venue of the freestyle competition" (the Beijing Olympic Games versus the Rio Olympic Games) yields nothing worth reporting, so this type of comparison result is considered to have no display value and receives a score of 0. By contrast, comparing the data type "speed in each 50 m segment of Sun Yang's freestyle race" produces results that indicate differences in performance: the larger the difference, the higher the score and the more newsworthy the result. This comparison result is selected as having display value.
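Steps 2032 and 2033 can be sketched as scoring each per-type comparison and keeping the highest-scoring ones. The scoring rule below (non-numeric data types score 0, numeric ones score the absolute difference from the historical value) is an assumption chosen to mirror the venue and split-time examples; the application does not define a formula. The split times are invented for illustration.

```python
def display_value(current, previous):
    """Assumed scoring rule: non-numeric values have no display value;
    numeric values score the absolute gap to the historical value."""
    numeric = (int, float)
    if not isinstance(current, numeric) or not isinstance(previous, numeric):
        return 0.0
    return abs(current - previous)


def select_comparisons(current, history, top_n=2):
    """Return the data types whose comparison has the most display value."""
    scored = []
    for data_type, value in current.items():
        if data_type not in history:
            continue
        score = display_value(value, history[data_type])
        if score > 0:  # a score of 0 means nothing worth reporting
            scored.append((score, data_type))
    scored.sort(reverse=True)  # highest display value first
    return [data_type for _, data_type in scored[:top_n]]


# Made-up values: the venue is categorical, the split times are in seconds.
current = {"venue": "Rio", "split_0_50m": 26.1, "split_50_100m": 28.4, "split_100_150m": 28.5}
history = {"venue": "Beijing", "split_0_50m": 26.0, "split_50_100m": 27.6, "split_100_150m": 28.5}

selected = select_comparisons(current, history)
```

Here the venue comparison and the unchanged third split are dropped, and the two splits that differ are kept in order of the size of the difference.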
Step 2034: Select from the preset description phrases at least one description phrase that matches the comparison results that have display value.
As in Method 1, the result of comparing behavioral performance data with historical performance data can be classified as greater than, equal to, or less than (see Table 2 above), so at least one matching description phrase can likewise be selected from the preset description phrases.
Method 3: For each object, compare the current performance data with performance expectation data.
Specifically, for each object, the object's performance expectation data is obtained; the object's behavioral performance data is compared with the performance expectation data; and at least one description phrase that matches the comparison result is selected from the preset description phrases. The server may identify each object's performance expectation data from historical code data; alternatively, the performance expectation data of each object under the relevant topic may be written to and stored on the server in advance.
Table 3 shows description phrases preset for comparison with performance expectation data according to an embodiment of the present invention. For example, in the men's 400 m freestyle final Sun Yang finished behind Horton; judged against pre-race expectations, the description phrase for his silver medal is "regrettable" rather than "exceeded expectations".
Figure PCTCN2017101852-appb-000004
Figure PCTCN2017101852-appb-000005
Table 3: Description phrases preset for comparison with performance expectation data
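Method 3 can be sketched as a comparison of the achieved rank with the expected rank. The phrases "regrettable" and "exceeded expectations" come from the example above; the contents of Table 3 are not reproduced in this text, so the "met expectations" phrase and the use of finishing rank as the compared quantity are assumptions. The expected rank of 1 reflects the statement that Sun Yang was expected to win.

```python
def expectation_phrase(actual_rank, expected_rank):
    """Pick a description phrase by comparing ranks (1 is best)."""
    if actual_rank < expected_rank:
        return "exceeded expectations"
    if actual_rank == expected_rank:
        return "met expectations"  # placeholder, not taken from the text
    return "regrettable"


# Silver medal (rank 2) against a pre-race expectation of gold (rank 1).
sun_yang_phrase = expectation_phrase(actual_rank=2, expected_rank=1)
```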
Step 203b: Generate the text of the topic according to the behavioral performance data of the at least one object and the at least one description phrase.
This step expands the scattered "behavioral performance data" and "description phrases" into a complete "text". Specifically, a connecting word is selected for each description phrase from a preset corpus database; the behavioral performance data of the at least one object, the connecting words, and the at least one description phrase are joined into at least one short sentence; the at least one short sentence is combined into at least one paragraph; and the at least one paragraph is joined to obtain the text.
The connecting words provide opening, continuation, transition, and conclusion. They include contextual transition words, conjunctions of tone, logical conjunctions, and background information drawn from the historical performance data. Taking the men's 400 m freestyle final as an example, several description phrases are determined for the object "Sun Yang": "regrettably failed to defend his title", "had long carried the nation's hopes of winning gold", and "lost to his opponent Horton". For "regrettably failed to defend his title", the connecting word is the appositive "finishing as runner-up". For "had long carried the nation's hopes of winning gold", the connecting word is the reason "as the first Chinese male swimmer to win an Olympic gold medal and the defending Olympic champion in this event". For "lost to his opponent Horton", the connecting word is the transition "but in the end".
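The assembly of sentences, paragraphs, and text can be sketched as plain string joining. The sketch below assumes English output and a fixed sentence pattern of subject, connector and phrase pairs, and a closing factual clause. The function names and the pattern are hypothetical, and the connector is a shortened form of the example above; a real system would draw connectors from the corpus database.

```python
def build_sentence(subject, fragments, fact):
    """Join a subject, (connector, phrase) pairs, and a factual clause."""
    parts = [f"{connector} {phrase}".strip() for connector, phrase in fragments]
    return f"{subject}, {', '.join(parts)}, {fact}."


def build_text(paragraphs):
    """Join sentences into paragraphs and paragraphs into a text."""
    return "\n\n".join(" ".join(sentences) for sentences in paragraphs)


sentence = build_sentence(
    "Sun Yang",
    [
        # (connecting word, description phrase)
        ("the defending champion,", "had long carried the nation's hopes"),
        ("but in the end", "lost to his opponent Horton"),
    ],
    "finishing in 3:41.68",
)
text = build_text([[sentence]])
```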
FIG. 4 is a schematic diagram of generated text according to an embodiment of the present invention. The generated text describes multiple objects under the topic of the men's 400 m freestyle final: the athletes Horton, Sun Yang, Qiu Ziao, Guy, Dwyer, and Detti. The text consists of three paragraphs, each of which includes athletes' results and rankings together with description phrases, which are marked with underlines.
With the text generation method of the above embodiments, a robot can independently produce natural-sounding coverage through the machine's own learning and algorithms. The resulting text has passed the Turing test (that is, if a computer answers a series of questions posed by a human tester within 5 minutes and more than 30% of its answers lead the tester to believe they were given by a human, the computer passes the test), and the quality of the manuscripts is no different from that of human-written reports.
In addition, consider the alternative of piecing together a written report by converting live television commentary into text through speech recognition. Because of the technical limitations of speech-to-text conversion, the error rate is quite high, and the approach cannot be applied at scale. The text generation method of the above embodiments instead proceeds from code data to behavioral performance data, to description phrases, to expanded paragraphs, and finally to the text. Compared with speech processing, the algorithm is less complex, which reduces the processing load and response time of the server-side processor while still producing high-quality text. The method can therefore be applied at scale, which greatly improves the resource utilization of the server.
FIG. 5 is an exemplary flowchart of a text generation method according to another embodiment of the present invention. The method is applied to a server. As shown in FIG. 5, the method may include the following steps:
Step 501: Obtain code data for a topic, the code data being written in a machine language and carrying content data under the topic.
Step 502: Identify behavioral performance data of at least one object from the content data according to a mapping, the behavioral performance data including one or more behaviors and the current performance evaluation data corresponding to each behavior.
As described for step 202 above, the server first sets, according to the rules by which the code is written, a mapping between the code data and the objects, behaviors, and current performance evaluation data, and then performs the identification according to this mapping.
FIG. 6 is a schematic diagram of generated text according to another embodiment of the present invention. The interface 600 shown in FIG. 6 presents a generated text. Block 610 gives the title of the text, "First Olympic diving gold! Wu Minxia/Shi Tingmao live up to expectations to win the title". Block 620 gives the promoter of the text, "Tencent Sports". Block 630 gives the body of the text, which includes all of the behavioral performance data given in Table 1 above, marked with underlines.
Step 503: Determine at least one description phrase according to the behavioral performance data of the at least one object, the historical performance data of each object, and the performance expectation data of each object.
This step combines the three methods for determining description phrases given in step 203a, which are not described again here.
Step 504: Based on the corpus database, generate the text of the topic according to the behavioral performance data of the at least one object and the at least one description phrase.
As described for step 203b, when the text is generated, a connecting word is selected for each description phrase from the preset corpus database; the behavioral performance data of the at least one object, the connecting words, and the at least one description phrase are joined into at least one short sentence; the at least one short sentence is combined into at least one paragraph; and the at least one paragraph is joined to obtain the text.
Because texts can have many different styles, multiple types of paragraph templates are preset, and these different template types make up texts of different styles. For example, the paragraph template types include summary, background, hotspot, detailed body, overview, and appendix. As shown in Figure 7, the text there consists of a summary (block 730), a hotspot (block 740), a detailed body (block 750), and an appendix (block 760).
When generating the text, at least one paragraph template to be used is selected from the multiple types of paragraph templates. Then, when the short sentences are combined into paragraphs, for each paragraph template to be used, the short sentences that match that template are identified among the at least one short sentence and combined into the paragraph corresponding to that template.
In addition, because each paragraph template has a word-count limit, the length of each assembled paragraph is kept within the limit of its template.
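The sentence and paragraph assembly described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `ParagraphTemplate`, `make_sentence`, and `build_paragraph`, the sentence pattern, and the choice to measure the limit in characters are hypothetical and are not specified in this application.

```python
from dataclasses import dataclass


@dataclass
class ParagraphTemplate:
    kind: str       # hypothetical template type, e.g. "summary" or "hotspot"
    max_chars: int  # assumed length limit, counted in characters


def make_sentence(performance: str, connective: str, phrase: str) -> str:
    """Join performance data, a connective, and a description phrase."""
    return f"{performance}, {connective} {phrase}."


def build_paragraph(template: ParagraphTemplate, tagged_sentences) -> str:
    """Combine the sentences tagged for this template, staying within its limit."""
    chosen = []
    used = 0
    for kind, sentence in tagged_sentences:
        if kind != template.kind:
            continue  # sentence belongs to another template
        extra = len(sentence) + (1 if chosen else 0)  # +1 for the joining space
        if used + extra > template.max_chars:
            break  # adding this sentence would exceed the limit
        chosen.append(sentence)
        used += extra
    return " ".join(chosen)
```

Stopping at the first sentence that does not fit keeps the original sentence order; a real system might instead try shorter sentences further down the list.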
Step 505: perform keyword review on the generated text.
The review here includes screening for keywords. Manuscripts with a higher risk-weighted level may additionally be reviewed manually, for example by submitting the keywords to a manual review window.
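A minimal sketch of such a review step is shown below. The function name `review_text`, the numeric risk levels, and the `manual_threshold` default are assumptions made for illustration; the application does not say how the risk-weighted level is computed.

```python
def review_text(text, flagged_keywords, risk_level, manual_threshold=2):
    """Return (flagged keywords found in the text, whether manual review is requested).

    risk_level and manual_threshold are hypothetical integer levels.
    """
    hits = sorted(k for k in flagged_keywords if k in text)
    needs_manual = risk_level >= manual_threshold
    return hits, needs_manual
```

The returned keyword list is what would be passed on to the manual review window when `needs_manual` is true.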
Step 506: send the reviewed text to the client for display.
Figure 7 is a schematic diagram of displayed text according to an embodiment of the present invention. In the client's display interface 700, a report on a sports event is recommended to the user. The title is "Zhang Mengxue wins the first gold medal of the Rio Olympics for Team China". Block 720 shows the promoter "Tencent Sports", the date "2016-08-07", and the publication time "22:23", and provides a "Comment" option (721) and a "Share" option (722) so that the user can interact on social platforms. Block 730 gives the summary of the report, block 740 the hotspot section "Match Focus", block 750 the detailed body "Highlights Replay", and block 760 the appendix "Athlete Profile".
In this embodiment, determining at least one description phrase from the behavior performance data of the at least one object and the historical performance data and performance expectation data of each object allows one item of "behavior performance data" to be associated with multiple "description phrases", which enriches the content of the text and further improves its informativeness and readability. In addition, by providing different types of paragraph templates for assembling paragraphs, texts of different styles can be selected intelligently according to the match result, so that a variety of vivid, natural-sounding texts can be output for users to browse and read.
Figure 8a is a schematic structural diagram of a server according to an embodiment of the present invention. As shown in Figure 8a, the server 804 includes:
an obtaining module 801, configured to obtain code data for a topic, the code data being written in a machine language and carrying content data under the topic;
an identification module 802, configured to identify behavior performance data of at least one object from the content data obtained by the obtaining module 801; and
a generating module 803, configured to generate the text of the topic according to the behavior performance data of the at least one object obtained by the identification module 802.
Figure 8b is a schematic structural diagram of a server according to another embodiment of the present invention. As shown in Figure 8b, the server 800 includes:
an obtaining module 810, configured to obtain code data for a topic, the code data being written in a machine language and carrying content data under the topic;
an identification module 820, configured to identify behavior performance data of at least one object from the content data obtained by the obtaining module 810;
a determining module 830, configured to determine at least one description phrase according to the behavior performance data of the at least one object obtained by the identification module 820; and
a generating module 840, configured to generate the text of the topic according to the behavior performance data of the at least one object obtained by the identification module 820 and the at least one description phrase determined by the determining module 830.
In an embodiment, the server 800 further includes:
a setting module 850, configured to set, according to the rules by which the code is written, a mapping relationship between the code data and objects, behaviors, and current performance evaluation data;
wherein the behavior performance data includes one or more behaviors and the current performance evaluation data corresponding to each behavior, and the identification module 820 is configured to identify the behaviors and current performance evaluation data of each object according to the mapping relationship set by the setting module 850.
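To make the role of the mapping relationship concrete, here is a small Python sketch. The record layout (`"ATH01|ACT07|15.766"`), the code tables, and the function name `identify` are all invented for illustration; the application does not disclose the actual encoding of the code data.

```python
# Hypothetical mapping tables, standing in for what a setting module would configure.
OBJECT_CODES = {"ATH01": "Deng Shudi"}
BEHAVIOR_CODES = {"ACT07": "parallel bars routine"}


def identify(record: str) -> dict:
    """Decode one assumed 'object|behavior|score' record using the mapping tables."""
    obj_code, behavior_code, score = record.split("|")
    return {
        "object": OBJECT_CODES[obj_code],
        "behavior": BEHAVIOR_CODES[behavior_code],
        "score": float(score),  # current performance evaluation data
    }
```

For example, `identify("ATH01|ACT07|15.766")` yields the object, behavior, and score as readable fields that later steps can compare and describe.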
In an embodiment, the determining module 830 is configured to, for each object, obtain the historical performance data of the object, compare the behavior performance data of the object with the historical performance data, and select from the preset description phrases at least one description phrase that matches the comparison result.
In an embodiment, the determining module 830 is configured to compare the behavior performance data with the historical performance data separately for each of multiple data types, filter the resulting comparison results to keep those worth displaying, and select from the preset description phrases at least one description phrase that matches the comparison results worth displaying.
In an embodiment, the determining module 830 is configured to, for each object, obtain the performance expectation data of the object, compare the behavior performance data of the object with the performance expectation data, and select from the preset description phrases at least one description phrase that matches the comparison result.
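The comparison and filtering performed by the determining module can be sketched as follows. The phrase table, the `min_gap` cutoff used to decide what is "worth displaying", and the function names are assumptions; the same routine works whether the baseline is historical performance data or performance expectation data.

```python
# Hypothetical preset description phrases, keyed by comparison outcome.
PHRASES = {"above": "outperformed", "below": "fell short of"}


def compare_by_type(current: dict, baseline: dict, min_gap: float) -> dict:
    """Compare each shared data type and keep only gaps of at least min_gap."""
    results = {}
    for key in current.keys() & baseline.keys():
        gap = current[key] - baseline[key]
        if abs(gap) >= min_gap:  # assumed test for "worth displaying"
            results[key] = "above" if gap > 0 else "below"
    return results


def pick_phrases(results: dict) -> dict:
    """Map each retained comparison result to a preset description phrase."""
    return {key: PHRASES[outcome] for key, outcome in results.items()}
```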
In an embodiment, the generating module 840 is configured to select a connective for each description phrase from the preset corpus database; join the behavior performance data of the at least one object, the connectives, and the at least one description phrase into at least one short sentence; combine the at least one short sentence into at least one paragraph; and join the at least one paragraph to obtain the text.
Figure 9a is a schematic structural diagram of a server according to an embodiment of the present invention. The server 900a may include a processor 910a, a memory 920a, a port 930a, and a bus 940a. The processor 910a and the memory 920a are interconnected via the bus 940a, and the processor 910a can receive and send data through the port 930a. Specifically:
The processor 910a is configured to execute the machine-readable instruction modules stored in the memory 920a.
The memory 920a stores machine-readable instruction modules executable by the processor 910a. These instruction modules include an obtaining module 921a, an identification module 922a, and a generating module 923a. Specifically:
When executed by the processor 910a, the obtaining module 921a may obtain code data for a topic, the code data being written in a machine language and carrying content data under the topic.
When executed by the processor 910a, the identification module 922a may identify behavior performance data of at least one object from the content data obtained by the obtaining module 921a.
When executed by the processor 910a, the generating module 923a may generate the text of the topic according to the behavior performance data of the at least one object obtained by the identification module 922a.
It can thus be seen that, when the instruction modules stored in the memory 920a are executed by the processor 910a, the functions of the obtaining module, the identification module, and the generating module in the foregoing embodiments can be implemented.
Figure 9b is a schematic structural diagram of a server according to an embodiment of the present invention. The server 900 may include a processor 910, a memory 920, a port 930, and a bus 940. The processor 910 and the memory 920 are interconnected via the bus 940, and the processor 910 can receive and send data through the port 930. Specifically:
The processor 910 is configured to execute the machine-readable instruction modules stored in the memory 920.
The memory 920 stores machine-readable instruction modules executable by the processor 910. These instruction modules include an obtaining module 921, an identification module 922, a determining module 923, and a generating module 924. Specifically:
When executed by the processor 910, the obtaining module 921 may obtain code data for a topic, the code data being written in a machine language and carrying content data under the topic.
When executed by the processor 910, the identification module 922 may identify behavior performance data of at least one object from the content data obtained by the obtaining module 921.
When executed by the processor 910, the determining module 923 may determine at least one description phrase according to the behavior performance data of the at least one object obtained by the identification module 922.
When executed by the processor 910, the generating module 924 may generate the text of the topic according to the behavior performance data of the at least one object obtained by the identification module 922 and the at least one description phrase determined by the determining module 923.
In an embodiment, the instruction modules executable by the processor 910 further include a setting module 925. Specifically:
When executed by the processor 910, the setting module 925 may set, according to the rules by which the code is written, a mapping relationship between the code data and objects, behaviors, and current performance evaluation data;
wherein the behavior performance data includes one or more behaviors and the current performance evaluation data corresponding to each behavior, and the identification module 922, when executed by the processor 910, may identify the behaviors and current performance evaluation data of each object according to the mapping relationship set by the setting module 925.
It can thus be seen that, when the instruction modules stored in the memory 920 are executed by the processor 910, the functions of the obtaining module, the identification module, the determining module, the generating module, and the setting module in the foregoing embodiments can be implemented.
Figure 10 is an exemplary flowchart of a text display method according to an embodiment of the present invention. The method is applied to a server. As shown in Figure 10, the method may include the following steps:
Step 1001: obtain code data for a topic, the code data being written in a machine language and carrying content data under the topic.
For this step, refer to the method described in step 201 above; it is not repeated here.
Step 1002: identify behavior performance data of at least one object from the content data.
In an embodiment, the identified behavior performance data is multidimensional, and its structure can be expressed as {topic, object, behavior, current performance evaluation data}.
Table 4 lists an example of multidimensional behavior data according to an embodiment of the present invention. Here the topic is a sports event, the object is an athlete, the behavior is a specific movement, and the current performance evaluation data is the specific score. For example, in the men's parallel bars final at the Rio Olympics, the athlete Deng Shudi chose a routine with a difficulty of 7.2 and received an E score of 8.566, for a total of 15.766.
Figure PCTCN2017101852-appb-000006
Table 4: Example of multidimensional behavior data
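The {topic, object, behavior, current performance evaluation data} structure can be represented, for instance, as a small Python record. The class name `BehaviorRecord` and the field names are illustrative assumptions; the values are the Deng Shudi figures quoted in the text above.

```python
from dataclasses import dataclass


@dataclass
class BehaviorRecord:
    topic: str     # e.g. a sports event
    obj: str       # e.g. an athlete
    behavior: str  # e.g. a specific movement or routine
    scores: dict   # current performance evaluation data


record = BehaviorRecord(
    topic="Rio Olympics men's parallel bars final",
    obj="Deng Shudi",
    behavior="parallel bars routine",
    scores={"difficulty": 7.2, "execution": 8.566, "total": 15.766},
)
```

In this example the total is the sum of the difficulty value and the E score (7.2 + 8.566 = 15.766), which gives a simple consistency check on decoded data.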
Step 1003: according to the behavior performance data of the at least one object and the historical performance data and performance expectation data of each object, determine at least one content item to be pushed under the topic, and generate a text for each content item to be pushed.
In this step, the server obtains the historical performance data of each object and sets the performance expectation data of each object. A content item to be pushed is an item of content that may be pushed, for example a newsworthy news point or event.
Determining the at least one content item to be pushed specifically includes: for each object, cross-comparing multiple items of data according to the behavior performance data of the object and its historical performance data and/or performance expectation data, to determine whether the content item corresponding to the object satisfies a preset push condition; and, if the push condition is satisfied, determining that the content item corresponding to the object is to be pushed and calculating the heat factor of that content item.
Take Olympic taekwondo as an example. In the men's 58 kg event, the Chinese athlete Zhao Shuai won the gold medal despite not being favored, achieving the first Olympic taekwondo gold medal for a Chinese male athlete. Comparing the behavior performance data with the historical performance data shows that Zhao Shuai's result in this competition is a marked improvement; compared with the performance expectation data, the gold medal clearly far exceeds the expectations of the media, the public, and professionals. The server therefore determines that the content item satisfies the push condition, judges it to be a news point in which an underdog delivered a breakout result, and quantifies the heat factor for pushing this content item.
When generating the text of each content item to be pushed, according to one embodiment of the present invention, at least one description phrase matching the content item to be pushed is first selected from a preset text material database (see Figure 1); then the behavior performance data, historical performance data, and/or performance expectation data of the at least one object involved in the content item, together with the at least one description phrase, are combined into at least one paragraph, and the at least one paragraph is joined to obtain the text.
Alternatively, according to another embodiment of the present invention, a text template corresponding to each heat factor is preset. After the heat factor of a content item to be pushed has been determined, the behavior performance data, historical performance data, and/or performance expectation data of the at least one object involved in the content item, as well as the behavior performance data of other objects, are embedded into the text template corresponding to that content item, thereby generating the text.
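The template-based alternative can be illustrated with Python string formatting. Choosing the template purely by the sign of the heat factor is a simplifying assumption made here, and the template wording, `HOT_TEMPLATE`, `COLD_TEMPLATE`, and `render` are hypothetical names.

```python
# Hypothetical templates; a real system could preset one per heat-factor range.
HOT_TEMPLATE = "{name} scored {score}, beating the expected {expected}."
COLD_TEMPLATE = "{name} scored {score}, short of the expected {expected}."


def render(heat_factor: float, **fields) -> str:
    """Pick a template by the sign of the heat factor and embed the data in it."""
    template = HOT_TEMPLATE if heat_factor > 0 else COLD_TEMPLATE
    return template.format(**fields)
```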
In an embodiment, the method shown in Figure 10 further includes step 1004, as follows.
Step 1004: for each content item to be pushed, determine the priority for displaying the text corresponding to the content item according to at least one preset push heat of the topic, and send the determined priority and the text corresponding to the content item to the client, so that the client displays the text according to the priority.
As step 1003 shows, the content items to be pushed are determined by cross-comparing the behavior performance data, historical performance data, and performance expectation data of each object; that is, the determination is made per object. When displaying the text, it is further necessary to consider whether the topic is hot enough at the time the content item is pushed. Specifically, the push heat may include any one or more of: the popularity of the topic, the popularity of at least one object under the topic, the maturity of the equipment or technology involved in the topic, and the degree of change in the environment or venue involved in the topic.
For example, the topic is one of the many events at the Olympic Games. The popularity of the topic refers to whether the event attracts enough attention; the men's 400 m freestyle final, for instance, is a closely followed event. If the athlete Sun Yang is an object in that event, the popularity of the object refers to Sun Yang being a favorite to win. The maturity of the equipment or objects involved in the topic refers to the technical level of the equipment used in the event or the standard of the objects, for example whether the swimsuits used in a swimming competition are more functional, or how good the horses used in an equestrian competition are. The degree of change in the environment or venue involved in the topic refers to unexpected conditions that may arise in the environment or venue while the event is under way. For example, heavy rain suddenly falls during an equestrian competition and the ground becomes muddy, affecting the horses' movements and the athletes' results; or heavy rain suddenly falls during a marathon and the road becomes slippery, affecting the athletes' movements and results.
In this step, the determined priority corresponds to a text display style, and different priorities correspond to different push channels and content levels. When determining the priority, the server sets the push channel and content level for displaying the text according to the determined priority. For example, the push channels include real-time push, mobile display, and website display, and the content levels include headline, major news, and ordinary. The client then displays the received text through the corresponding push channel at the set content level, according to the received priority indication.
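One simple way to realize the correspondence between priority, push channel, and content level is a lookup table, sketched below. The three priority values and the particular pairing of channels with levels are assumptions; the application lists the channels and levels but does not say how they are paired.

```python
# Hypothetical pairing of priority -> (push channel, content level).
PRIORITY_TABLE = {
    1: ("real-time push", "headline"),
    2: ("mobile display", "major news"),
    3: ("website display", "ordinary"),
}


def delivery_for(priority: int) -> tuple:
    """Return the assumed (push channel, content level) for a priority."""
    return PRIORITY_TABLE[priority]
```

The server would send the priority along with the text, and the client would use the same table to decide where and how prominently to show it.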
In the above embodiment, at least one content item to be pushed under the topic is determined according to the behavior performance data of the at least one object and the historical performance data and/or performance expectation data of each object, and a text is generated for each content item to be pushed. Then, for each content item to be pushed, the priority for displaying the corresponding text is determined according to at least one preset push heat of the topic, and the determined priority and the text are sent to the client so that the client displays the text according to the priority. This provides a method of discovering news points from anomalous changes in data that can distinguish how important different content items are from a news perspective. A machine can thus take the place of manual work in finding and reporting news hotspots: drafts are produced quickly, multiple events can be monitored at the same time, manuscripts can be produced in batches, and neither unexpected upsets nor breakout results are missed, which saves considerable labor. When the texts are finally shown to users, multiple curated reports with distinct emphases are pushed through different push channels. This improves the efficiency of text display, makes machine reporting more human-like, and also improves the resource utilization of the server.
In step 1003, when determining the at least one content item to be pushed, a multidimensional cross-comparison is performed on the object's behavior performance data, historical performance data, and performance expectation data to determine whether the preset push condition is satisfied, and the heat factor of the content item to be pushed corresponding to the object is calculated. The calculated heat factor may be positive or negative. A positive value indicates that the heat factor corresponds to a "hot" (breakout) news point; a negative value indicates that it corresponds to a "cold" (upset) news point.
Here, the behavior performance data includes one or more behaviors and the current performance evaluation data corresponding to each behavior, the historical performance data includes the historical performance evaluation data corresponding to each behavior, and the performance expectation data includes the expected performance evaluation data corresponding to each behavior. Suppose a topic includes $I$ objects and the $i$-th object corresponds to $J_i$ behaviors, $j = 1, \dots, J_i$. The current performance evaluation data corresponding to the $j$-th behavior is denoted $s_{i,j}$. In the object's historical performance data, the historical performance evaluation data corresponding to the $j$-th behavior is denoted $s'_{i,j}$; in the performance expectation data, the expected performance evaluation data corresponding to the $j$-th behavior is denoted $s''_{i,j}$.
In an embodiment of the present invention, the behavior performance data of the at least one object is compared with the historical performance data of each object to calculate the heat factor of one or more objects.
Specifically, for the historical performance data, a first heat threshold is preset for each behavior; let $s'_{th,j}$ denote the first heat threshold preset for the $j$-th behavior with respect to the historical performance data.
For each behavior, the current performance evaluation data corresponding to the behavior is compared with the historical performance evaluation data by computing $s_{i,j} - s'_{i,j}$, $j = 1, \dots, J_i$, and it is determined whether the following push condition is satisfied, namely that the absolute value of the comparison result is greater than the first heat threshold of the behavior:
$$|s_{i,j} - s'_{i,j}| > s'_{th,j} \tag{1}$$
If the condition is satisfied, the first candidate heat factor of the $j$-th behavior of the $i$-th object is set to:
$$h'_{i,j} = s_{i,j} - s'_{i,j} \tag{2}$$
Further, a weight is preset for each behavior for use in calculating the heat factor; let $\alpha_{i,j}$ denote the weight of the $j$-th behavior of the $i$-th object, which characterizes how important that behavior is from the perspective of news hotspots. For example, in a swimming competition, a title favorite has a higher weight, and the final race has a higher weight.
In this way, the first heat factor of the $i$-th object is calculated as:
Figure PCTCN2017101852-appb-000007
where the total number of first heat factors is less than or equal to $I$.
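The threshold test of formula (1), the candidate factor of formula (2), and the weighting by $\alpha_{i,j}$ can be sketched in Python as follows. The closed form of the first heat factor appears only as an image in this text, so the weighted sum used in `heat_factor` is an assumed reading of the surrounding description, and the function names are hypothetical.

```python
def candidate_factors(current, baseline, thresholds):
    """Return {j: current[j] - baseline[j]} for behaviors that pass formula (1)."""
    candidates = {}
    for j in range(len(current)):
        diff = current[j] - baseline[j]
        if abs(diff) > thresholds[j]:  # push condition, formula (1)
            candidates[j] = diff       # candidate heat factor, formula (2)
    return candidates


def heat_factor(candidates, weights):
    """Assumed form: weighted sum of the candidate factors of one object."""
    return sum(weights[j] * h for j, h in candidates.items())
```

With `baseline` set to the historical scores this gives the first heat factor; an object with no behavior passing the threshold has no candidates and therefore produces no news point.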
In another embodiment of the present invention, the behavior performance data of the at least one object is compared with the performance expectation data of each object to calculate the heat factor of one or more objects.
Specifically, for the performance expectation data, a second heat threshold is preset for each behavior; let $s''_{th,j}$ denote the second heat threshold preset for the $j$-th behavior with respect to the performance expectation data.
For each behavior, the current performance evaluation data corresponding to the behavior is compared with the expected performance evaluation data by computing $s_{i,j} - s''_{i,j}$, $j = 1, \dots, J_i$, and it is determined whether the following push condition is satisfied, namely that the absolute value of the comparison result is greater than the second heat threshold of the behavior:
$$|s_{i,j} - s''_{i,j}| > s''_{th,j} \tag{4}$$
If the condition is satisfied, the second candidate heat factor of the $j$-th behavior of the $i$-th object is set to:
$$h''_{i,j} = s_{i,j} - s''_{i,j} \tag{5}$$
Further, a weight is preset for each behavior for use in calculating this heat factor; as above, let $\alpha_{i,j}$ denote the weight of the $j$-th behavior of the $i$-th object. In this way, the second heat factor of the $i$-th object is calculated as:
Figure PCTCN2017101852-appb-000008
where the total number of second heat factors is less than or equal to $I$.
在本发明的又一实施例中,将至少一个对象的行为表现数据以及每个对象的历史表现数据和表现期望数据进行比较,计算出一个或多个对象的热度因子。In still another embodiment of the present invention, behavioral performance data of at least one object and historical performance data of each object are compared with performance expectation data to calculate a heat factor of one or more objects.
图11为依据本发明一实施例的计算热度因子的示例性流程图。如图11所示,包括如下步骤:11 is an exemplary flow chart for calculating a heat factor in accordance with an embodiment of the present invention. As shown in FIG. 11, the following steps are included:
Step 1101: Preset the respective first weights of the historical performance data and the performance expectation data used in calculating the heat factor, and preset the second weight of each behavior used in calculating the heat factor of an object.
Let β′ and β″ denote the respective weights of the historical performance data and the performance expectation data in calculating the heat factor. Let α_{i,j} denote the weight of the j-th behavior of the i-th object.
Step 1102: For the historical performance data, preset a first heat threshold s′_{th,j}, j = 1, …, J_i, for each behavior.
Step 1103: For the performance expectation data, preset a second heat threshold s″_{th,j}, j = 1, …, J_i, for each behavior.
Step 1104: For each behavior of the object, compare the current performance evaluation data corresponding to the behavior with the historical performance evaluation data, that is, calculate s_{i,j} - s′_{i,j}.
Step 1105: For each behavior of the object, compare the current performance evaluation data corresponding to the behavior with the expected performance evaluation data, that is, calculate s_{i,j} - s″_{i,j}.
Step 1106: Determine whether the absolute value of the comparison result is greater than the first heat threshold of the behavior; see formula (1) above. If so, perform step 1108; otherwise, return to step 1104 and compare the next behavior, that is, let j = j + 1.
Step 1107: Determine whether the absolute value of the comparison result is greater than the second heat threshold of the behavior; see formula (4) above. If so, perform step 1109; otherwise, return to step 1105 and compare the next behavior, that is, let j = j + 1.
Step 1108: Take the comparison result as the first candidate heat factor h′_{i,j}.
Step 1109: Take the comparison result as the second candidate heat factor h″_{i,j}.
Step 1110: Perform a weighted summation of all first candidate heat factors and second candidate heat factors of the object according to the first weights and the second weights to obtain the heat factor of the object.
After the first candidate heat factors and the second candidate heat factors are obtained, the total heat factor of the i-th object is further calculated as:
Figure PCTCN2017101852-appb-000009
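The flow of steps 1101 to 1110 can be sketched in Python as follows. This is a minimal illustration, not the applicant's implementation: the `Behavior` class, its field names and the function `total_heat_factor` are hypothetical, and the way the first weights (β′, β″) and the second weights (α_{i,j}) are combined is an assumption, since the original formula is only available as an image.

```python
from dataclasses import dataclass


@dataclass
class Behavior:
    """One behavior j of an object i (hypothetical container)."""
    current: float         # s_{i,j}: current performance evaluation data
    historical: float      # s'_{i,j}: historical performance evaluation data
    expected: float        # s''_{i,j}: expected performance evaluation data
    hist_threshold: float  # s'_{th,j}: first heat threshold (step 1102)
    exp_threshold: float   # s''_{th,j}: second heat threshold (step 1103)
    weight: float          # alpha_{i,j}: second weight (step 1101)


def total_heat_factor(behaviors, beta_hist, beta_exp):
    """Weighted sum of the candidate heat factors of one object.

    Assumed combination:
    beta' * sum(alpha * h') + beta'' * sum(alpha * h'').
    """
    hist_sum = 0.0
    exp_sum = 0.0
    for b in behaviors:
        diff_hist = b.current - b.historical   # step 1104
        if abs(diff_hist) > b.hist_threshold:  # step 1106
            hist_sum += b.weight * diff_hist   # step 1108: h'_{i,j}
        diff_exp = b.current - b.expected      # step 1105
        if abs(diff_exp) > b.exp_threshold:    # step 1107
            exp_sum += b.weight * diff_exp     # step 1109: h''_{i,j}
    return beta_hist * hist_sum + beta_exp * exp_sum  # step 1110


# Made-up numbers: the first behavior beats its history by more than the
# threshold; the second beats its expectation by more than the threshold.
behaviors = [
    Behavior(10.0, 6.0, 9.0, 2.0, 2.0, 0.5),
    Behavior(5.0, 5.0, 1.0, 1.0, 3.0, 1.0),
]
print(total_heat_factor(behaviors, beta_hist=0.5, beta_exp=0.5))
```

Comparison results that stay within their threshold contribute nothing, so only behaviors that deviate noticeably from history or from expectation raise the heat factor.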
In addition, in step 1004 above, determining the priority for displaying the text corresponding to the content item to be pushed according to at least one preset push heat of the topic specifically includes the following processing:
(1) Determine the value corresponding to each push heat item.
If the topic includes K push heat items, determine the value corresponding to each push heat θ_k (k = 1, …, K):
Figure PCTCN2017101852-appb-000010
(2) Multiply the heat factor of the content item to be pushed by the maximum of these values to determine the priority score of the heat factor.
If the total number of heat factors under the topic is M, multiply each heat factor h_m (m = 1, …, M) by the maximum of these values to determine the priority score of the heat factor as:
Figure PCTCN2017101852-appb-000011
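Processing (1) and (2) amount to scaling a heat factor by the largest push heat value of the topic. A minimal Python sketch, assuming the push heat values are held in a dictionary (the function name, the keys and the numbers are hypothetical and chosen only for illustration):

```python
def priority_score(heat_factor, push_heat_values):
    """Multiply a heat factor h_m by the maximum push heat value of the topic."""
    if not push_heat_values:
        raise ValueError("at least one push heat value is required")
    return heat_factor * max(push_heat_values.values())


# Hypothetical push heat values for one topic.
values = {
    "topic_popularity": 20,
    "object_popularity": 25,
    "equipment_maturity": 10,
    "venue": 5,
}
print(priority_score(2.0, values))
```

Because only the maximum is used, a single strong push heat item is enough to lift the score, while a topic whose push heat values are all low keeps a low score even for a high heat factor.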
For example, the push heat includes the following four items: the popularity of the topic, the popularity of at least one object under the topic, the maturity of the equipment/technology involved in the topic, and the environment/venue involved in the topic. The server determines in advance the value corresponding to each push heat item under the topic and then takes the maximum, that is, it uses the hottest push heat item to correct the heat factor and thereby obtain the priority score. In this way, even if the heat factor is high, the priority of the content item to be pushed may not be high if the maximum push heat value is relatively low. For example, in a little-followed skeet shooting competition, even if an athlete wins a medal and achieves a first-ever medal breakthrough, the priority for pushing that content item remains relatively low because the event attracts little attention.
Table 5 shows an example of push heat values according to an embodiment of the present invention. As can be seen, for the men's 400 m freestyle final, the maximum value of 25 corresponds to the popularity of Sun Yang.
Figure PCTCN2017101852-appb-000012
Table 5 Example of push heat values
(3) Compare the priority score with a plurality of preset value intervals to determine the priority of the text corresponding to the content item to be pushed.
Table 6 shows an example of priority value intervals according to an embodiment of the present invention. The priorities are divided into three intervals, and the boundaries between the intervals are defined by setting the value P_max. Each priority corresponds to a push channel and a content level. For example, a content item to be pushed with priority 1 can use the real-time push channel and be displayed at the headline level.
Figure PCTCN2017101852-appb-000013
Table 6 Example of priority value intervals
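The interval comparison in processing (3) can be illustrated with a small lookup. The actual interval bounds of Table 6 are not reproduced in this text, so the sketch below assumes two hypothetical cut points expressed as fractions of P_max (0.25 and 0.75) and assumes that priority 1 is the highest; the function name is also hypothetical.

```python
from bisect import bisect_right


def priority_level(score, p_max, cut_fractions=(0.25, 0.75)):
    """Map a priority score to priority 1 (highest) through 3 (lowest).

    cut_fractions are assumed interval boundaries relative to p_max,
    listed in ascending order.
    """
    cuts = [f * p_max for f in cut_fractions]
    # Number of boundaries that the score has reached or passed.
    reached = bisect_right(cuts, score)
    return len(cuts) + 1 - reached


for score in (80, 50, 10):
    print(score, priority_level(score, p_max=100))
```

Keeping the boundaries relative to P_max means that the three intervals rescale together when P_max is changed, which matches the idea of dividing the intervals by setting a single value.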
FIG. 12 is a schematic diagram of generated text according to an embodiment of the present invention. The priority determination confirms that the text is a "dark horse surge" news point. As shown in FIG. 12, the tag "#Dark Horse Surge#" is added to the title shown in block 1210, "Taekwondo men's 58 kg final: Zhao Shuai makes history by winning the title", and the result of the priority determination, "#Priority Determination# real-time push, headline level", is displayed in block 1260. In addition, blocks 1220-1250 respectively display the generation time of the text, the abstract, the event focus, and the highlight replay paragraphs.
FIG. 13 is an exemplary flowchart of a text display method according to another embodiment of the present invention. The method is applied to a server. As shown in FIG. 13, the method may include the following steps:
Step 1301: Obtain behavior performance data of at least one object for a topic.
Step 1302: Determine at least one content item to be pushed under the topic according to the behavior performance data of the at least one object and the historical performance data and performance expectation data of each object, and generate text for each content item to be pushed.
For example, take a report on an unexpected loss of the gold medal in a weightlifting competition. In the men's 77 kg weightlifting event at the Rio Olympics, the Chinese athlete Lu Xiaojun broke the world record in the snatch, but the Kazakh athlete Asimov succeeded on his final clean and jerk attempt and came from behind to win the gold, so Lu Xiaojun failed to defend his title. Thus, from the behavior performance data of the two athletes and the performance expectation data of one of them, the performance of the Chinese athlete Lu Xiaojun in this event is determined to be a "favorite upset" news point.
Step 1303: For each content item to be pushed, determine the priority for displaying the text corresponding to the content item according to at least one preset push heat of the topic.
When determining the priority, the server may also set, for each user, a different correspondence between priorities and push channels and content levels, according to how strongly different user groups want information on competition results. In this case, the client reports the allocation specified by the user to the server. For example, if a user is a sports fan, the push channel for each of priorities 1-3 shown in Table 6 above is set to real-time push.
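A per-user correspondence of this kind can be sketched as a default table plus user-specified channel overrides. Only the entry for priority 1 (real-time push, headline level) comes from the description; the other channels and content levels, the names `DEFAULT_DELIVERY` and `delivery_for`, and the override format are hypothetical.

```python
# Default correspondence between priority and (push channel, content level).
DEFAULT_DELIVERY = {
    1: ("real-time push", "headline"),
    2: ("scheduled push", "section"),  # hypothetical entry
    3: ("in-app feed only", "list"),   # hypothetical entry
}


def delivery_for(priority, channel_overrides=None):
    """Return (push channel, content level), honoring the user's channel choices."""
    channel, level = DEFAULT_DELIVERY[priority]
    if channel_overrides and priority in channel_overrides:
        channel = channel_overrides[priority]
    return channel, level


# A sports fan asks for real-time push at every priority.
sports_fan = {1: "real-time push", 2: "real-time push", 3: "real-time push"}
print(delivery_for(3))
print(delivery_for(3, sports_fan))
```

Only the channel is overridden here, so the content level still follows the priority, as in the sports fan example above.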
Step 1304: Perform keyword review on the generated text, and send the determined priority and the reviewed text to the client, so that the client displays the text according to the priority.
The review here includes screening for keywords; manuscripts with a higher risk-weighted level may additionally be submitted to a manual review window for review.
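One way to picture the keyword review is a weighted keyword scan that routes high-risk manuscripts to manual review. The description does not specify how the risk-weighted level is computed, so the scheme below (summing per-keyword weights and comparing the sum with a threshold) is an assumption, and the function name, keywords and weights are all hypothetical.

```python
def review_text(text, flagged_keywords, manual_review_threshold):
    """Return (risk, matched keywords, needs_manual_review) for a manuscript.

    flagged_keywords maps each keyword to an assumed risk weight.
    """
    matched = {kw: w for kw, w in flagged_keywords.items() if kw in text}
    risk = sum(matched.values())
    return risk, sorted(matched), risk >= manual_review_threshold


keywords = {"doping": 3, "rumor": 2, "lawsuit": 5}
print(review_text("Athlete X denies doping rumor", keywords, 4))
print(review_text("Athlete X wins the final", keywords, 4))
```

Manuscripts that stay below the threshold can be sent to the client directly, while flagged ones wait for an editor.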
According to the received priority indication, the client displays the received text through the corresponding push channel at the set content level. For different content levels, the client sets corresponding display slots. For example, when the content level is headline level, the client sets a display slot at the headline position of the recommendation display interface. The display slot may show only part of the text as a link, and the complete text may be displayed on another page after the user clicks the link.
FIG. 14 is a schematic diagram of displayed text according to an embodiment of the present invention. The client pushes a "favorite upset" sports event report to the user in real time at the headline level. In the recommendation display page 1400, block 1410 displays the interface label "Headline News"; block 1420 gives the title "Parallel bars: You Hao only 8th after a major mistake, Deng Shudi 4th and misses a medal", with the reminder tag "#Favorite Upset#" added before the title; block 1430 shows the promoting party "Tencent Sports", the date "2016-08-17", and the real-time push time of the report, "01:28"; and block 1440 gives part of the body text and a link to the full text, "More>".
FIG. 15 is a schematic diagram of displayed text according to another embodiment of the present invention. Corresponding to the headline news shown in FIG. 14, the full text of the report is displayed in the display interface 1500. Block 1520 also provides a "Comment" option (see 1521) and a "Share" option (see 1522) so that the user can interact on social platforms. Block 1530 shows a picture for the upset news, and block 1540 gives the picture caption to highlight the subject of the upset news. In addition, block 1550 gives the detailed body of the report, and block 1560 gives the appendix of the report, "Athlete Profile".
The text display method provided by the embodiments of the present invention enables a robot to uncover news hotspots such as upsets and surprise breakthroughs, and to produce manuscripts quickly. For example, in reporting on the Olympic Games, the whole process for each event, from the end of the competition through data acquisition and text generation to the release of the link to the match report, is on average 30 seconds faster than a manually written news flash and at least 5 to 10 minutes faster than a detailed report.
Second, with the technical solution given by the above method, manuscripts can be produced in batches and hundreds of events can be monitored simultaneously, so that even little-followed events are not missed. This matters especially for sports events: because television broadcasters have a limited number of channels and ratings differ markedly between events, only selected events can be broadcast, and many events are not reported at all because no athletes from the home country take part or the home athletes are not strong enough to reach the final. In fact, such events still attract some users' attention. Human editors have limited time and resources and cannot report on them promptly, whereas a robot can report in real time and quickly as long as it obtains the event data.
In addition, a great deal of manpower can be saved, and the approach is highly sustainable. For example, the Olympic Games comprise 28 sports and 306 events in total. A robot can take over all the work in every link of the chain: accurately obtaining core data from the database, automatically writing and publishing manuscripts, generating links for front-end display, and pushing, all in one pass, with only a few managers required.
FIG. 16 is a schematic structural diagram of a server according to another embodiment of the present invention. As shown in FIG. 16, the server 1600 includes:
an acquisition module 1610, configured to acquire code data for a topic, the code data being written in a machine language and carrying content data under the topic, and to identify behavior performance data of at least one object from the content data;
a first determination module 1620, configured to determine at least one content item to be pushed under the topic according to the behavior performance data of the at least one object acquired by the acquisition module 1610 and the historical performance data and performance expectation data of each object;
a generation module 1630, configured to generate the text of each content item to be pushed determined by the first determination module 1620;
a second determination module 1640, configured to determine, for each content item to be pushed determined by the first determination module 1620, the priority for displaying the text corresponding to the content item according to at least one preset push heat of the topic; and
a sending module 1650, configured to send the priority determined by the second determination module 1640 and the text corresponding to the content item to be pushed generated by the generation module 1630 to the client, so that the client displays the text according to the priority.
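The way the five modules hand data to one another can be sketched as a small pipeline. The class below is a hypothetical illustration of the data flow only: each module is passed in as a plain callable, and the field and method names are not taken from the application.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Server:
    """Hypothetical wiring of the modules of server 1600."""
    acquire: Callable[[str], list]        # acquisition module 1610
    select_items: Callable[[list], list]  # first determination module 1620
    generate: Callable[[dict], str]       # generation module 1630
    prioritize: Callable[[dict], int]     # second determination module 1640
    send: Callable[[int, str], None]      # sending module 1650

    def handle(self, code_data: str) -> None:
        performance = self.acquire(code_data)
        for item in self.select_items(performance):
            text = self.generate(item)
            priority = self.prioritize(item)
            self.send(priority, text)


# Toy stand-ins for the modules, for illustration only.
sent = []
server = Server(
    acquire=lambda code: [{"object": "A", "score": 9}, {"object": "B", "score": 2}],
    select_items=lambda perf: [p for p in perf if p["score"] > 5],
    generate=lambda item: f"{item['object']} scored {item['score']}",
    prioritize=lambda item: 1,
    send=lambda priority, text: sent.append((priority, text)),
)
server.handle("<code data>")
print(sent)
```

Passing the modules in as callables keeps each stage replaceable, which mirrors the later statement that the modules may be integrated into one unit or exist separately.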
In an embodiment, the first determination module 1620 includes:
a determination unit 1621, configured to determine, for each of the at least one object, whether the content item corresponding to the object satisfies a push condition according to the behavior performance data, the historical performance data, and the performance expectation data of the object; and
a calculation unit 1622, configured to, if the determination unit 1621 determines that the push condition is satisfied, determine that the content item corresponding to the object is to be pushed, and calculate the heat factor of the content item to be pushed corresponding to the object.
In an embodiment, the behavior performance data includes one or more behaviors and current performance evaluation data corresponding to each behavior, and the historical performance data includes historical performance evaluation data corresponding to each behavior;
the determination unit 1621 is configured to: preset a first heat threshold for each behavior with respect to the historical performance data; and, for each behavior of the object, compare the current performance evaluation data corresponding to the behavior with the historical performance evaluation data and, if the absolute value of the comparison result is greater than the first heat threshold of the behavior, take the comparison result as a first candidate heat factor;
the calculation unit 1622 is configured to: preset the weight of each behavior used in calculating the heat factor; and perform a weighted summation of all first candidate heat factors of the object according to the weights to obtain the heat factor.
In an embodiment, the behavior performance data includes one or more behaviors and current performance evaluation data corresponding to each behavior, and the performance expectation data includes expected performance evaluation data corresponding to each behavior;
the determination unit 1621 is configured to: preset a second heat threshold for each behavior with respect to the performance expectation data; and, for each behavior of the object, compare the current performance evaluation data corresponding to the behavior with the expected performance evaluation data and, if the absolute value of the comparison result is greater than the second heat threshold of the behavior, take the comparison result as a second candidate heat factor;
the calculation unit 1622 is configured to: preset the weight of each behavior used in calculating the heat factor; and perform a weighted summation of all second candidate heat factors of the object according to the weights to obtain the heat factor.
In an embodiment, the behavior performance data includes one or more behaviors and current performance evaluation data corresponding to each behavior, the historical performance data includes historical performance evaluation data corresponding to each behavior, and the performance expectation data includes expected performance evaluation data corresponding to each behavior;
the determination unit 1621 is configured to: preset a first heat threshold for each behavior with respect to the historical performance data; preset a second heat threshold for each behavior with respect to the performance expectation data; and, for each behavior of the object, perform the following processing: compare the current performance evaluation data corresponding to the behavior with the historical performance evaluation data and, if the absolute value of the comparison result is greater than the first heat threshold of the behavior, take the comparison result as a first candidate heat factor; compare the current performance evaluation data corresponding to the behavior with the expected performance evaluation data and, if the absolute value of the comparison result is greater than the second heat threshold of the behavior, take the comparison result as a second candidate heat factor;
the calculation unit 1622 is configured to: preset the respective first weights of the historical performance data and the performance expectation data used in calculating the heat factor, and preset the second weight of each behavior used in calculating the heat factor of the object; and perform a weighted summation of all first candidate heat factors and second candidate heat factors of the object according to the first weights and the second weights to obtain the heat factor of the object.
In an embodiment, the push heat includes any one or more of: the popularity of the topic, the popularity of at least one object under the topic, the maturity of the equipment/objects involved in the topic, and the degree of change of the environment/venue involved in the topic;
the second determination module 1640 is configured to: determine the value corresponding to each push heat item; multiply the heat factor of the content item to be pushed by the maximum of these values to determine the priority score of the heat factor; and compare the priority score with a plurality of preset value intervals to determine the priority of the text corresponding to the content item to be pushed.
In an embodiment, the second determination module 1640 is configured to set, according to the determined priority, the push channel and the content level for displaying the text, so that the client displays the text through the push channel at the set content level.
FIG. 17 is a schematic structural diagram of a server according to yet another embodiment of the present invention. The server 1700 may include a processor 1710, a memory 1720, a port 1730, and a bus 1740. The processor 1710 and the memory 1720 are interconnected by the bus 1740. The processor 1710 can receive and send data through the port 1730. Specifically:
The processor 1710 is configured to execute the machine-readable instruction modules stored in the memory 1720.
The memory 1720 stores machine-readable instruction modules executable by the processor 1710. The instruction modules executable by the processor 1710 include an acquisition module 1721, a first determination module 1722, a generation module 1723, a second determination module 1724, and a sending module 1725. Specifically:
When executed by the processor 1710, the acquisition module 1721 may: acquire code data for a topic, the code data being written in a machine language and carrying content data under the topic; and identify behavior performance data of at least one object from the content data;
When executed by the processor 1710, the first determination module 1722 may: determine at least one content item to be pushed under the topic according to the behavior performance data of the at least one object acquired by the acquisition module 1721 and the historical performance data and performance expectation data of each object;
When executed by the processor 1710, the generation module 1723 may: generate the text of each content item to be pushed determined by the first determination module 1722;
When executed by the processor 1710, the second determination module 1724 may: for each content item to be pushed determined by the first determination module 1722, determine the priority for displaying the text corresponding to the content item according to at least one preset push heat of the topic;
When executed by the processor 1710, the sending module 1725 may: send the priority determined by the second determination module 1724 and the text corresponding to the content item to be pushed generated by the generation module 1723 to the client, so that the client displays the text according to the priority.
It can thus be seen that, when the instruction modules stored in the memory 1720 are executed by the processor 1710, the functions of the acquisition module, the first determination module, the generation module, the second determination module, and the sending module in the foregoing embodiments can be implemented.
In the above apparatus and system embodiments, the specific ways in which the modules and units implement their functions have been described in the method embodiments and are not repeated here.
In addition, the functional modules in the embodiments of the present invention may be integrated into one processing unit, may each exist physically on their own, or two or more modules may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
In addition, each embodiment of the present invention may be implemented by a data processing program executed by a data processing device such as a computer. Evidently, the data processing program constitutes the present invention. Furthermore, a data processing program usually stored in a storage medium is executed by reading the program directly from the storage medium, or by installing or copying the program to a storage device (such as a hard disk and/or memory) of the data processing device. Such a storage medium therefore also constitutes the present invention. The storage medium may use any type of recording method, for example a paper storage medium (such as paper tape), a magnetic storage medium (such as a floppy disk, hard disk, or flash memory), an optical storage medium (such as a CD-ROM), or a magneto-optical storage medium (such as an MO).
Accordingly, an embodiment of the present invention further discloses a storage medium storing a data processing program, the data processing program being used to perform any one of the embodiments of the above method of the present invention.
The above are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (29)

  1. A text generation method, wherein the method is applied to a server comprising a processor and a memory, and the method is performed by the processor executing instructions stored in the memory, the method comprising:
    acquiring code data for a topic, the code data being written in a machine language and carrying content data under the topic;
    identifying behavior performance data of at least one object from the content data; and
    generating text for the topic according to the behavior performance data of the at least one object.
  2. The method according to claim 1, further comprising:
    setting a mapping relationship between the code data and objects, behaviors, and current performance evaluation data according to the writing rules of the code;
    wherein the behavior performance data comprises one or more behaviors and current performance evaluation data corresponding to each behavior, and the identifying behavior performance data of at least one object from the code data comprises:
    identifying the behavior of each object and the current performance evaluation data according to the mapping relationship.
  3. The method according to claim 1, wherein the generating text for the topic according to the behavior performance data of the at least one object comprises:
    determining at least one descriptive phrase according to the behavior performance data of the at least one object; and
    generating the text for the topic according to the behavior performance data of the at least one object and the at least one descriptive phrase.
  4. The method according to claim 3, wherein the determining at least one descriptive phrase according to the behavior performance data of the at least one object comprises:
    comparing the behavior performance data of a plurality of objects under the topic, and selecting, from preset descriptive phrases, the at least one descriptive phrase that matches the comparison result.
  5. The method according to claim 3, wherein the determining at least one descriptive phrase according to the behavior performance data of the at least one object comprises:
    for each object, acquiring historical performance data of the object; and
    comparing the behavior performance data of the object with the historical performance data, and selecting, from preset descriptive phrases, the at least one descriptive phrase that matches the comparison result.
  6. The method according to claim 5, wherein the selecting, from preset descriptive phrases, the at least one descriptive phrase that matches the comparison result comprises:
    comparing the behavior performance data with the historical performance data separately for each of a plurality of data types, screening out, from the comparison results of the plurality of data types, the comparison results that have display value, and selecting, from the preset descriptive phrases, the at least one descriptive phrase that matches the comparison results having display value.
  7. The method according to claim 3, wherein the determining at least one descriptive phrase according to the behavior performance data of the at least one object comprises:
    for each object, acquiring performance expectation data of the object; and
    comparing the behavior performance data of the object with the performance expectation data, and selecting, from preset descriptive phrases, the at least one descriptive phrase that matches the comparison result.
  8. The method of any one of claims 3 to 7, wherein generating the text of the subject according to the behavioral performance data of the at least one object and the at least one description phrase comprises:
    selecting a conjunction for each description phrase from a preset corpus database;
    joining the behavioral performance data of the at least one object, the conjunctions, and the at least one description phrase into at least one short sentence; and
    combining the at least one short sentence into at least one paragraph, and joining the at least one paragraph to obtain the text.
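The three assembly steps of claim 8 (conjunction lookup, short sentences, paragraphs) might look like the sketch below. It assumes the corpus database is a plain dictionary from description phrase to conjunction and that paragraphs are formed by fixed-size grouping. Neither detail is stated in the claim, and `build_text` and the sample records are hypothetical.

```python
def build_text(records, conjunctions, sentences_per_paragraph=2):
    """Join performance data, conjunctions and description phrases into text.

    records: list of (performance_description, description_phrase) pairs.
    conjunctions: stand-in for the preset corpus database (assumption).
    """
    sentences = []
    for performance, phrase in records:
        conjunction = conjunctions.get(phrase, "and")  # hypothetical default
        sentences.append(f"{performance}, {conjunction} {phrase}.")

    # Group short sentences into paragraphs, then join the paragraphs.
    paragraphs = [
        " ".join(sentences[i:i + sentences_per_paragraph])
        for i in range(0, len(sentences), sentences_per_paragraph)
    ]
    return "\n\n".join(paragraphs)


# Hypothetical sample data, for illustration only.
records = [
    ("Object A scored 35 points", "led all scorers"),
    ("Object B made 2 of 11 shots", "never found a rhythm"),
    ("Object C grabbed 15 rebounds", "controlled the boards"),
]
conjunctions = {phrase: "and" for _, phrase in records}
print(build_text(records, conjunctions))
```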
  9. The method of claim 8, further comprising:
    presetting a plurality of types of paragraph templates;
    wherein combining the at least one short sentence into at least one paragraph, and joining the at least one paragraph to obtain the text comprises:
    selecting, from the plurality of types of paragraph templates, at least one to-be-used paragraph template to be included in the text; and
    for each to-be-used paragraph template, determining, from the at least one short sentence, at least one short sentence that matches the to-be-used paragraph template, and combining the determined short sentences to obtain the paragraph corresponding to the to-be-used paragraph template.
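A minimal sketch of the template matching in claim 9 follows. The claim does not say how a short sentence is matched to a paragraph template, so this example assumes each sentence carries a type tag and matches a template of the same type. The name `fill_templates` and the tags are hypothetical.

```python
def fill_templates(selected_templates, tagged_sentences):
    """Build one paragraph per selected template from matching sentences.

    selected_templates: template types chosen for this text, in order.
    tagged_sentences: list of (template_type, sentence) pairs (assumption:
    matching is a simple equality test on the type tag).
    """
    paragraphs = []
    for template in selected_templates:
        matching = [s for kind, s in tagged_sentences if kind == template]
        if matching:  # skip templates with no matching sentence
            paragraphs.append(" ".join(matching))
    return "\n\n".join(paragraphs)


# Hypothetical sample data, for illustration only.
tagged = [
    ("highlights", "Object A scored 35 points."),
    ("summary", "The home side won by 12."),
    ("highlights", "Object C grabbed 15 rebounds."),
]
print(fill_templates(["summary", "highlights"], tagged))
```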
  10. The method of claim 1, wherein generating the text of the subject according to the behavioral performance data of the at least one object comprises:
    determining at least one content item to be pushed under the subject according to the behavioral performance data of the at least one object and the historical performance data and performance expectation data of each object, and generating text for each content item to be pushed.
  11. The method of claim 10, wherein determining the at least one content item to be pushed under the subject comprises:
    for each of the at least one object, determining, according to the behavioral performance data, the historical performance data, and the performance expectation data of the object, whether the content item corresponding to the object satisfies a push condition; and
    if the push condition is satisfied, determining that the content item corresponding to the object is to be pushed, and calculating a heat factor of the content item to be pushed corresponding to the object.
  12. The method of claim 11, wherein the behavioral performance data comprises one or more behaviors and current performance evaluation data corresponding to each behavior, and the historical performance data comprises historical performance evaluation data corresponding to each behavior;
    determining, according to the behavioral performance data and the historical performance data of the object, whether the content item corresponding to the object satisfies the push condition comprises:
    presetting, for the historical performance data, a first heat threshold for each behavior;
    for each behavior of the object, comparing the current performance evaluation data with the historical performance evaluation data corresponding to the behavior, and, if the absolute value of the comparison result is greater than the first heat threshold of the behavior, taking the comparison result as a first candidate heat factor;
    calculating the heat factor of the content item to be pushed corresponding to the object comprises:
    presetting a weight of each behavior for calculating the heat factor; and
    performing a weighted summation of all first candidate heat factors of the object according to the weights to obtain the heat factor.
  13. The method of claim 11, wherein the behavioral performance data comprises one or more behaviors and current performance evaluation data corresponding to each behavior, and the performance expectation data comprises expected performance evaluation data corresponding to each behavior;
    determining, according to the behavioral performance data and the performance expectation data of the object, whether the content item corresponding to the object satisfies the push condition comprises:
    presetting, for the performance expectation data, a second heat threshold for each behavior;
    for each behavior of the object, comparing the current performance evaluation data with the expected performance evaluation data corresponding to the behavior, and, if the absolute value of the comparison result is greater than the second heat threshold of the behavior, taking the comparison result as a second candidate heat factor;
    calculating the heat factor of the content item to be pushed corresponding to the object comprises:
    presetting a weight of each behavior for calculating the heat factor; and
    performing a weighted summation of all second candidate heat factors of the object according to the weights to obtain the heat factor.
  14. The method of claim 11, wherein the behavioral performance data comprises one or more behaviors and current performance evaluation data corresponding to each behavior, the historical performance data comprises historical performance evaluation data corresponding to each behavior, and the performance expectation data comprises expected performance evaluation data corresponding to each behavior;
    determining, according to the behavioral performance data, the historical performance data, and the performance expectation data of the object, whether the content item corresponding to the object satisfies the push condition comprises:
    presetting, for the historical performance data, a first heat threshold for each behavior;
    presetting, for the performance expectation data, a second heat threshold for each behavior;
    for each behavior of the object, performing the following processing:
    comparing the current performance evaluation data with the historical performance evaluation data corresponding to the behavior, and, if the absolute value of the comparison result is greater than the first heat threshold of the behavior, taking the comparison result as a first candidate heat factor;
    comparing the current performance evaluation data with the expected performance evaluation data corresponding to the behavior, and, if the absolute value of the comparison result is greater than the second heat threshold of the behavior, taking the comparison result as a second candidate heat factor;
    calculating the heat factor of the content item to be pushed corresponding to the object comprises:
    presetting respective first weights of the historical performance data and the performance expectation data for calculating the heat factor, and presetting a second weight of each behavior for calculating the heat factor of the object; and
    performing a weighted summation of all first candidate heat factors and second candidate heat factors of the object according to the first weights and the second weights to obtain the heat factor of the object.
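The thresholding and weighted summation of claims 12 to 14 can be illustrated with the sketch below. Several points are assumptions rather than statements of the claims: the "comparison result" is taken to be a signed difference (so opposite deviations can partly cancel), the push condition is taken to be "at least one candidate heat factor exists", and each candidate is weighted by the product of its source weight and its behavior weight. The name `heat_factor` and all sample values are hypothetical.

```python
def heat_factor(current, historical, expected,
                hist_thresholds, exp_thresholds,
                source_weights, behavior_weights):
    """Return the weighted heat factor, or None if no candidate qualifies."""
    total = 0.0
    has_candidate = False
    for behavior, value in current.items():
        w_behavior = behavior_weights[behavior]  # "second weight"

        # First candidate: current vs. historical evaluation data.
        diff_hist = value - historical[behavior]
        if abs(diff_hist) > hist_thresholds[behavior]:
            total += source_weights["historical"] * w_behavior * diff_hist
            has_candidate = True

        # Second candidate: current vs. expected evaluation data.
        diff_exp = value - expected[behavior]
        if abs(diff_exp) > exp_thresholds[behavior]:
            total += source_weights["expected"] * w_behavior * diff_exp
            has_candidate = True

    return total if has_candidate else None


# Hypothetical sample data, for illustration only.
factor = heat_factor(
    current={"points": 35, "assists": 4},
    historical={"points": 22, "assists": 5},
    expected={"points": 25, "assists": 9},
    hist_thresholds={"points": 10, "assists": 3},
    exp_thresholds={"points": 8, "assists": 3},
    source_weights={"historical": 0.5, "expected": 0.5},  # "first weights"
    behavior_weights={"points": 1.0, "assists": 2.0},
)
print(factor)  # 6.5
```

In this sample, "points" contributes 6.5 and 5.0, and "assists" contributes -5.0 from the expectation comparison only, giving 6.5. Claims 12 and 13 correspond to using just one of the two comparisons.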
  15. The method of claim 10, wherein generating the text of each content item to be pushed comprises:
    selecting, from a preset text material database, at least one description phrase that matches the content item to be pushed; and
    combining the behavioral performance data, the historical performance data, and the performance expectation data of the at least one object involved in the content item to be pushed, together with the at least one description phrase, into at least one paragraph, and joining the at least one paragraph to obtain the text.
  16. The method of claim 11, wherein the push heat comprises any one or more of: the popularity of the subject, the popularity of at least one object under the subject, the maturity of the equipment/objects involved in the subject, and the degree of change in the environment/venue involved in the subject;
    determining, according to at least one preset push heat of the subject, the priority for displaying the text corresponding to the content item to be pushed comprises:
    determining the value corresponding to each push heat;
    multiplying the heat factor of the content item to be pushed by the maximum of the values to determine a priority score for the heat factor; and
    comparing the priority score with a plurality of preset value intervals to determine the priority of the text corresponding to the content item to be pushed.
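The priority calculation of claim 16 reduces to one multiplication and an interval lookup, sketched below. The interval boundaries, the priority labels, and the name `text_priority` are hypothetical. The claim only says that the score is compared with preset value intervals.

```python
import bisect


def text_priority(heat_factor, push_heats, boundaries, labels):
    """Score a content item and map the score to a priority label.

    push_heats: value of each push heat item (e.g. subject popularity).
    boundaries: sorted upper edges separating the preset intervals.
    labels: one label per interval, so len(labels) == len(boundaries) + 1.
    """
    # Priority score = heat factor x the largest push heat value.
    score = heat_factor * max(push_heats.values())
    # Find which preset interval the score falls into.
    index = bisect.bisect_right(boundaries, score)
    return score, labels[index]


# Hypothetical sample data, for illustration only.
score, priority = text_priority(
    heat_factor=6.5,
    push_heats={"subject_popularity": 3.0, "object_popularity": 2.0},
    boundaries=[5.0, 15.0],
    labels=["low", "normal", "high"],
)
print(score, priority)  # 19.5 high
```

With `bisect_right`, a score exactly equal to a boundary falls into the higher interval. That tie-breaking rule is a choice made for this sketch.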
  17. The method of claim 10, further comprising:
    for each content item to be pushed, determining, according to at least one preset push heat of the subject, a priority for displaying the text corresponding to the content item to be pushed, and sending the determined priority and the text corresponding to the content item to be pushed to a client, so that the client displays the text according to the priority.
  18. The method of claim 17, wherein determining the priority for displaying the text corresponding to the content item to be pushed comprises:
    setting, according to the determined priority, a push channel and a content level for displaying the text, so that the client displays the text through the push channel at the set content level.
  19. A server, comprising a processor and a memory, the memory storing instructions executable by the processor, wherein, when executing the instructions, the processor is configured to:
    obtain code data for a subject, the code data being written in a machine language and carrying content data under the subject;
    identify behavioral performance data of at least one object from the content data; and
    generate text of the subject according to the behavioral performance data of the at least one object.
  20. The server of claim 19, wherein, when executing the instructions, the processor is further configured to: set, according to the rules by which the code is written, a mapping between the code data and objects, behaviors, and current performance evaluation data, wherein the behavioral performance data comprises one or more behaviors and current performance evaluation data corresponding to each behavior; and identify the behaviors and the current performance evaluation data of each object according to the mapping.
  21. The server of claim 19, wherein, when executing the instructions, the processor is further configured to:
    determine at least one description phrase according to the behavioral performance data of the at least one object; and
    generate the text of the subject according to the behavioral performance data of the at least one object and the at least one description phrase.
  22. The server of claim 21, wherein, when executing the instructions, the processor is further configured to: for each object, obtain historical performance data of the object; and compare the behavioral performance data of the object with the historical performance data, and select, from preset description phrases, the at least one description phrase that matches the comparison result.
  23. The server of claim 21, wherein, when executing the instructions, the processor is further configured to: for each object, obtain performance expectation data of the object; and compare the behavioral performance data of the object with the performance expectation data, and select, from preset description phrases, the at least one description phrase that matches the comparison result.
  24. The server of claim 21, wherein, when executing the instructions, the processor is further configured to: select a conjunction for each description phrase from a preset corpus database; join the behavioral performance data of the at least one object, the conjunctions, and the at least one description phrase into at least one short sentence; and combine the at least one short sentence into at least one paragraph, and join the at least one paragraph to obtain the text.
  25. The server of claim 19, wherein, when executing the instructions, the processor is further configured to:
    determine at least one content item to be pushed under the subject according to the behavioral performance data of the at least one object and the historical performance data and performance expectation data of each object, and generate text for each content item to be pushed.
  26. The server of claim 25, wherein, when executing the instructions, the processor is further configured to: for each of the at least one object, determine, according to the behavioral performance data, the historical performance data, and the performance expectation data of the object, whether the content item corresponding to the object satisfies a push condition; and, if the push condition is satisfied, determine that the content item corresponding to the object is to be pushed, and calculate a heat factor of the content item to be pushed corresponding to the object.
  27. The server of claim 25, wherein, when executing the instructions, the processor is further configured to: select, from a preset text material database, at least one description phrase that matches the content item to be pushed; and combine the behavioral performance data, the historical performance data, and the performance expectation data of the at least one object involved in the content item to be pushed, together with the at least one description phrase, into at least one paragraph, and join the at least one paragraph to obtain the text.
  28. The server of claim 25, wherein, when executing the instructions, the processor is further configured to: for each content item to be pushed, determine, according to at least one preset push heat of the subject, a priority for displaying the text corresponding to the content item to be pushed, and send the determined priority and the text corresponding to the content item to be pushed to a client, so that the client displays the text according to the priority.
  29. A computer-readable storage medium storing computer-readable instructions that cause at least one processor to perform the method of any one of claims 1 to 18.
PCT/CN2017/101852 2016-10-21 2017-09-15 Text generation method and server WO2018072577A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201610920326.5 2016-10-21
CN201610920326.5A CN107977367B (en) 2016-10-21 2016-10-21 Text display method and server
CN201610920284.5 2016-10-21
CN201610920284.5A CN107977196B (en) 2016-10-21 2016-10-21 Text generation method and server

Publications (1)

Publication Number Publication Date
WO2018072577A1 (en) 2018-04-26

Family

ID=62018218

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/101852 WO2018072577A1 (en) 2016-10-21 2017-09-15 Text generation method and server

Country Status (1)

Country Link
WO (1) WO2018072577A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113222483A (en) * 2021-07-08 2021-08-06 广州半城云信息科技有限公司 Private domain operation quality inspection method and system
CN113536754A (en) * 2020-04-21 2021-10-22 阿里巴巴集团控股有限公司 Text generation method and device and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1629838A (en) * 2003-12-17 2005-06-22 国际商业机器公司 Method, apparatus and system for processing, browsing and information extracting of electronic document
CN103246710A (en) * 2013-04-22 2013-08-14 张经纶 Method and device for automatically generating multimedia travel notes
CN103778235A (en) * 2014-01-26 2014-05-07 北京京东尚科信息技术有限公司 Method and device for processing commodity assessment information
CN105787095A (en) * 2016-03-16 2016-07-20 广州索答信息科技有限公司 Automatic generation method and device for internet news

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 17861483; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 17861483; Country of ref document: EP; Kind code of ref document: A1)