CA3165616A1 - Sports news writing method based on natural language, device and electronic equipment

Sports news writing method based on natural language, device and electronic equipment

Info

Publication number
CA3165616A1
Authority
CA
Canada
Prior art keywords
event
slot
slots
events
templates
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CA3165616A
Other languages
French (fr)
Inventor
Jinjuan Zhou
Yi SHEN
Heqiang NI
Kang QI
Shiwen LIANG
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
10353744 Canada Ltd
Original Assignee
10353744 Canada Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 10353744 Canada Ltd


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/186 Templates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

A sports news writing method and apparatus based on a natural language, and an electronic device. The method comprises the steps of: obtaining a corpus to be processed, an event set, slot positions, and slot position values corresponding to the slot positions; labeling an event template in the corpus according to each event in the event set, the slot positions, and the slot position values; performing weight assignment on each event; encoding the types and number of the slot positions in each event and the event template; screening the events and the event template according to the weight of each event; matching and filling the screened events and event template to generate news content; and reprocessing the news content to obtain final news content. The apparatus uses the method. The diversity of sentence patterns of articles is improved, and the amount of information of the articles is maximized; sports news articles can be written efficiently and automatically, and labor costs are reduced.

Description

SPORTS NEWS WRITING METHOD BASED ON NATURAL LANGUAGE, DEVICE AND ELECTRONIC EQUIPMENT
BACKGROUND OF THE INVENTION
Technical Field
[0001] The present invention relates to the field of natural language processing technology, and more particularly to a sports news writing method based on a natural language, and corresponding device and electronic equipment.
Description of Related Art
[0002] When the currently available template traversing and matching strategy matches event data against templates, it must compare the slots and the number of slots in the data with those of the templates one by one until a suitable template is found. For instance, a piece of data of a goal event reads: {ORG_NEU: Newcastle, PER_ACT: Schaer, EVEINF_LOC_FROM: center outside the penalty area, EVEINF_BODY: right foot, EVEINF_LOC_TO: right upper corner of the goal}. This piece of data contains five slots. Suppose the following templates exist for goal events:
[0003] {ORG_NEU} Get a goal! {PER_ACT} shoot, the ball flies from {EVEINF_LOC_TO} into the goal
[0004] {ORG_NEU} Get a goal! {PER_ACT} {EVEINF_LOC_FROM} {EVEINF_BODY} shoot, the ball draws a perfect arc and flies from {EVEINF_LOC_TO} into the goal.
Date Recue/Date Received 2022-06-22
[0005] Each template is traversed, and the slot set contained in the current template, together with the number of times each slot appears, is calculated by regular-expression matching. If the slot set of the current template is a subset of the slot set in the data, the template is successfully matched. As can be seen, the traversing strategy must calculate the slot information of the templates each time and then perform a set operation with the slot information of the data, so it is relatively time-consuming. Because the online system has performance requirements, matching usually returns as soon as one template matches, and the remaining templates are not tried. This brings about another problem: the first successfully matched template is usually not the optimum template, that is to say, its slot types and number of slots are not the maximum that the data could fill. In the above example, matching succeeds on the first template, but using the first template prevents the EVEINF_LOC_FROM and EVEINF_BODY information from being filled, so the information volume of the generated article is relatively small. As a consequence, the traversing and matching strategy is not only low in efficiency, but also inferior in terms of diversity of the matching results.
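The first-match behaviour described in the background can be illustrated with a minimal Python sketch. It is not taken from the application: the function name `first_match`, the regular expression, and the underscore spelling of the slot names are assumptions made for illustration, and only the slot sets are compared (the per-slot counts are ignored for brevity).

```python
import re
from typing import Dict, List, Optional

# Assumed slot syntax: upper-case names with underscores inside braces.
SLOT_RE = re.compile(r"\{([A-Z_]+)\}")

def first_match(event: Dict[str, str], templates: List[str]) -> Optional[str]:
    """Return the first template whose slot set is a subset of the event's slots."""
    for template in templates:
        # Slot information is recomputed for every template on every call.
        template_slots = set(SLOT_RE.findall(template))
        if template_slots <= set(event):
            return template  # early return: later templates are never tried
    return None

EVENT = {
    "ORG_NEU": "Newcastle",
    "PER_ACT": "Schaer",
    "EVEINF_LOC_FROM": "center outside the penalty area",
    "EVEINF_BODY": "right foot",
    "EVEINF_LOC_TO": "right upper corner of the goal",
}
TEMPLATES = [
    "{ORG_NEU} Get a goal! {PER_ACT} shoot, the ball flies from "
    "{EVEINF_LOC_TO} into the goal",
    "{ORG_NEU} Get a goal! {PER_ACT} {EVEINF_LOC_FROM} {EVEINF_BODY} shoot, "
    "the ball draws a perfect arc and flies from {EVEINF_LOC_TO} into the goal.",
]

chosen = first_match(EVENT, TEMPLATES)
# Slots present in the data that the chosen template cannot express.
unused = set(EVENT) - set(SLOT_RE.findall(chosen))
```

Here `chosen` is the three-slot template, and `unused` holds the two slots whose values are lost, which is the loss of information volume the paragraph above describes.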
SUMMARY OF THE INVENTION
[0006] In view of the deficiencies prevalent in the state of the art, one of the objectives of the present application is to provide a sports news writing method based on a natural language, to enhance the diversity of sentence patterns of articles and maximize information volume of articles. The method comprises the following steps:
[0007] obtaining a corpus to be processed, an event set, slots, and a slot value to which each slot corresponds;
[0008] marking event templates in the corpus according to each event in the event set, the slots, and the slot values;
[0009] performing weight assignment on each event;
[0010] coding each event, and types and the number of the slots in the event templates;
[0011] screening the events and the event templates according to the weight of each event;
[0012] matching and filling the screened events and event templates, and generating news content; and
[0013] reprocessing the news content to obtain the final news content.
[0014] Preferably, the obtaining an event set, slots, and a slot value to which each slot corresponds includes the following steps:
[0015] obtaining a preset number of sports news corpora;
[0016] processing the sports news corpora to obtain all the events, slots, and the slot value to which each slot corresponds; and
[0017] disposing all the events in the same and single set to obtain the event set.
[0018] Preferably, after processing the sports news corpora to obtain all the events, slots, and the slot value to which each slot corresponds, the method further comprises the following steps:
[0019] judging whether each event, each slot, and each slot value conform to a preset range;
[0020] if yes, retaining the event, the slot, and the slot value;
[0021] if not, deleting the event, the slot, or the slot value.
[0022] Preferably, the event includes a title, an abstract, and a text.
[0023] Preferably, the performing weight assignment on each event includes the following steps:
[0024] dividing the corpus according to all the events to obtain plural sections;
[0025] constructing, with respect to each event, mapping between the event and each section; and
[0026] setting, with respect to each mapping, a weight of the event to which the mapping corresponds.
[0027] Preferably, the coding each event, and types and the number of the slots in the event templates includes the following steps:
[0028] obtaining the event templates and the events to be coded;
[0029] counting the event templates, and the types and the number of the slots in the events according to regular matching;
[0030] determining a total number m of all the slots and the maximum number of times n for which the slots appear in each event template;
[0031] determining that each slot should be assigned with n number of binary digits for representation according to the maximum number of times for which the slots appear, wherein n is a divisor of 64;
[0032] determining long types of a coding type as employed and the number x of codes according to the total number of the slots and the number of binary digits assigned to each slot, wherein x=[(m*n)/64]+1;
[0033] traversing each slot in the event templates, and performing binary coding on the number of slots of the current slot;
[0034] determining that the current slot is coded at the yth long type according to an index address i of the current slot, wherein y=i/(64/n)+1;
[0035] moving binary representation of the number of slots leftwards for p times, wherein p=(i-(y-1)*(64/n))*n; and
[0036] joining all codes of the long type to obtain the final codes.
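The coding steps above can be sketched in Python as follows. This is one reading of the steps under stated assumptions, not the applicant's implementation: the function name `encode` and its arguments are hypothetical, the slot index i is taken to be zero-based, the divisions in the formulas for x and y are taken to be integer divisions, and each count is written as a run of 1 bits (one bit per appearance), as the detailed description of Embodiment 5 suggests.

```python
from typing import Dict, List

def encode(slot_counts: Dict[str, int], slot_index: Dict[str, int], n: int) -> List[int]:
    """Pack per-slot appearance counts into 64-bit words ("long" values).

    slot_counts: slot name -> number of appearances in one event or template
    slot_index:  slot name -> zero-based index i over all m known slots (assumed)
    n:           binary digits reserved per slot; must divide 64
    """
    if 64 % n != 0:
        raise ValueError("n must be a divisor of 64")
    m = len(slot_index)
    per_long = 64 // n                 # slots that fit into one long
    x = (m * n) // 64 + 1              # number of longs, x = [(m*n)/64] + 1
    codes = [0] * x                    # every long is initialised to 0
    for name, count in slot_counts.items():
        if not 0 <= count <= n:
            raise ValueError("count does not fit into n binary digits")
        i = slot_index[name]
        y = i // per_long + 1          # 1-based number of the long holding slot i
        p = (i - (y - 1) * per_long) * n   # left shift inside that long
        ones = (1 << count) - 1        # assumed coding: one 1 bit per appearance
        codes[y - 1] |= ones << p
    return codes

# Hypothetical vocabulary of 22 slots named S0..S21, with 4 binary digits per slot.
INDEX = {f"S{i}": i for i in range(22)}
example = encode({"S0": 1, "S17": 2}, INDEX, n=4)
```

With these inputs, slot S0 lands in the first long without a shift, and slot S17 lands in the second long shifted left by 4 bits, so `example` is `[1, 48]`.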
[0037] Preferably, the screening the events includes the following steps:
[0038] obtaining a corresponding weight of each event;
[0039] comparing each weight with a preset threshold; and
[0040] retaining any event whose weight is greater than the preset threshold, and eliminating the remaining events.
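The event screening step can be sketched as a simple filter. The function name `screen_events`, the sample weights, and the threshold value are illustrative assumptions rather than values from the application.

```python
from typing import Dict

def screen_events(weights: Dict[str, float], threshold: float) -> Dict[str, float]:
    """Keep only the events whose weight is strictly greater than the threshold."""
    return {event: w for event, w in weights.items() if w > threshold}

# Made-up weights for illustration only.
kept = screen_events({"goal": 0.7, "corner": 0.5, "penalty kick": 0.3}, threshold=0.5)
```

An event whose weight equals the threshold is eliminated here, because the step retains only weights greater than the preset threshold.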
[0041] Preferably, the screening the event templates includes the following steps:
[0042] obtaining the screened events and the codes corresponding thereto, and the codes to which all the event templates correspond in the events;
[0043] selecting one or more event template(s) with the maximum number of slots to serve as candidate event template(s); and
[0044] randomly selecting one event template from the candidate event template(s) to serve as the event template to be filled.
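A possible way to carry out the template screening on the codes is sketched below. Several points are assumptions: the function name `choose_template` is hypothetical, each code is shown as a single Python integer, the counts are coded as runs of 1 bits, and the test "the template fits the event" is done with a bitwise AND, which this section does not spell out.

```python
import random
from typing import Dict, Optional

def popcount(code: int) -> int:
    """Number of 1 bits, i.e. total slot appearances under the assumed coding."""
    return bin(code).count("1")

def choose_template(event_code: int, template_codes: Dict[str, int],
                    rng: Optional[random.Random] = None) -> Optional[str]:
    """Pick at random among the fitting templates that use the most slots."""
    rng = rng or random.Random()
    # A template fits when every bit it needs is also set in the event code.
    fitting = {name: code for name, code in template_codes.items()
               if (code & event_code) == code}
    if not fitting:
        return None
    best = max(popcount(code) for code in fitting.values())
    candidates = sorted(name for name, code in fitting.items()
                        if popcount(code) == best)
    return rng.choice(candidates)

# Five slots with 4 binary digits each; the event fills every slot once.
EVENT_CODE = 0b0001_0001_0001_0001_0001
TEMPLATE_CODES = {
    "three_slot": 0b0001_0000_0000_0001_0001,
    "five_slot":  0b0001_0001_0001_0001_0001,
    "needs_slot0_twice": 0b0011,   # does not fit: the event has slot 0 only once
}
picked = choose_template(EVENT_CODE, TEMPLATE_CODES)
```

Because all fitting templates are compared before one is returned, the five-slot template is preferred over the three-slot one, and the random choice only matters when several templates tie for the maximum.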
[0045] In view of the deficiencies prevalent in the state of the art, the second objective of the present application is to provide a sports news writing device based on a natural language, to enhance the diversity of sentence patterns of articles and maximize information volume of articles. The device comprises:
[0046] an obtaining unit, for obtaining a corpus to be processed, an event set, slots, and a slot value to which each slot corresponds;
[0047] an event template marking unit, for marking event templates in the corpus according to each event in the event set, the slots, and the slot values;
[0048] a weight assigning unit, for performing weight assignment on each event;
[0049] a coding unit, for coding each event, and types and the number of the slots in the event templates;
[0050] a screening unit, for screening the events and the event templates according to the weight of each event;
[0051] a news content generating unit, for matching and filling the screened events and event templates, and generating news content; and
[0052] a news content processing unit, for reprocessing the news content to obtain the final news content.
[0053] In view of the deficiencies prevalent in the state of the art, the third objective of the present application is to provide an electronic equipment to enhance the diversity of sentence patterns of articles and maximize information volume of articles. The electronic equipment comprises:
[0054] at least one processor; and
[0055] a memory, communicably connected with the at least one processor; wherein
[0056] the memory stores an instruction executable by the at least one processor, and the instruction is executed by the at least one processor to enable the processor to execute the aforementioned sports news writing method.
[0057] The sports news writing method, device and electronic equipment based on a natural language as provided by the present application make it possible to automatically generate news contents according to event templates extracted by prior analysis of great quantities of sports news and according to the weights that users themselves assign to events. On the one hand, the diversity of sentence patterns of articles is enhanced and the information volume of articles is maximized; on the other hand, sports news articles are written automatically and efficiently, and the input of manpower cost is reduced.
BRIEF DESCRIPTION OF THE DRAWINGS
[0058] To more clearly describe the embodiments of the present invention or technical solutions in the prior-art technology, drawings required to be used in the description of the embodiments or the prior-art technology are briefly introduced below. Apparently, the drawings as introduced below are merely directed to some embodiments of the present invention, and it is further possible for persons ordinarily skilled in the art to acquire other drawings based on these drawings without spending any creative effort in the process.
[0059] Fig. 1 is a flowchart illustrating a sports news writing method based on a natural language provided by Embodiment 1 of the present invention;
[0060] Fig. 2 is a flowchart illustrating a sports news writing method based on a natural language provided by Embodiment 2 of the present invention;
[0061] Fig. 3 is a flowchart illustrating a sports news writing method based on a natural language provided by Embodiment 3 of the present invention;
[0062] Fig. 4 is a flowchart illustrating a sports news writing method based on a natural language provided by Embodiment 4 of the present invention;
[0063] Fig. 5 is a flowchart illustrating a sports news writing method based on a natural language provided by Embodiment 5 of the present invention;
[0064] Fig. 6 is a flowchart illustrating a sports news writing method based on a natural language provided by Embodiment 6 of the present invention;
[0065] Fig. 7 is a flowchart illustrating a sports news writing method based on a natural language provided by Embodiment 7 of the present invention;
[0066] Fig. 8 is a view schematically illustrating the structure of a sports news writing device based on a natural language provided by the present invention; and
[0067] Fig. 9 is a view schematically illustrating the structure of an electronic equipment provided by the present invention.
DETAILED DESCRIPTION OF THE INVENTION
[0068] The embodiments of the current disclosure are described in greater detail below with reference to the accompanying drawings.
[0069] The modes of execution of the current disclosure are described below through specific concrete examples, and it is possible for persons skilled in the art to easily learn of other advantages and effects of the current disclosure from the contents disclosed in this Description. Apparently, the embodiments as described are merely partial, rather than the entire, embodiments of the current disclosure. The current disclosure can be further implemented or applied through additional, different, specific modes of execution, and the various details in this Description can also be variously modified or changed based on different viewpoints and applications without departing from the spirit of the current disclosure. As should be noted, the following embodiments and features in the embodiments can be combined with one another provided that they are not contradictory to one another. Based on the embodiments in the current disclosure, all other embodiments obtainable by persons ordinarily skilled in the art without spending any creative effort should all be covered within the protection scope of the current disclosure.
[0070] As should be noted, various aspects of the embodiments are described below within the range of the attached Claims. As is obvious, the aspects described in this Description can be embodied in a wide range of forms, and any specific structure and/or function described in this Description are/is merely explanative in nature. As should be clear to persons skilled in the art on the basis of the current disclosure, one aspect described in this Description can be implemented independently of any other aspect, and can also be combined with two or more aspects in various modes. By way of example, any number of aspects enunciated in this Description can be employed to implement the device/equipment and/or to practice the method. In addition, other structures and/or functions than one or more aspects enunciated in this Description can be employed to implement the device/equipment and/or to practice the method.
[0071] As should be further noted, the illustrations provided in the following embodiments merely illustrate the basic conception of the current disclosure in an illustrative mode, shown in the illustrations are merely the components relevant to the current disclosure and these are not drawn in the number, shapes and sizes of the actual components during actual implementation. The forms, number and proportions of the various components are subjected to random changes during actual implementation, and the layouts and forms of the components might be even more complicated during actual implementation.
[0072] Moreover, specific details are provided in the following description to facilitate thorough understanding of the concrete examples. However, as understood by persons skilled in the art, the said aspects can still be practiced in the absence of these specific details.
[0073] An embodiment of the current disclosure provides a sports news writing method based on a natural language. The sports news writing method based on a natural language provided by this embodiment can be executed by a computer system, the computer system can be embodied as software, or a combination of software with hardware, and the computer system can be integrated in a server, a terminal equipment, etc.
[0074] Embodiment 1
[0075] As shown in Fig. 1, an embodiment of the present application provides a sports news writing method based on a natural language, and the method comprises the following steps.
[0076] Step S101 - obtaining a corpus to be processed, an event set, slots, and a slot value to which each slot corresponds.
[0077] In this step, the corpus to be processed, the event set, the slots, and the slot value to which each slot corresponds can be obtained in many ways. For instance, a preset corpus to be processed, event set, slots, and slot values can be manually input into a device, automatically obtained by the device, or automatically crawled by a server (by using a crawler, for example). The present application makes no restriction on the mode.
[0078] Step S102 - marking event templates in the corpus according to each event in the event set, the slots, and the slot values.

[0079] In this step, in accordance with each event in the event set, the slots, and the slot values obtained in step S101, it is possible to mark event templates in the corpus to be processed as obtained in step S101.
[0080] Step S103 - performing weight assignment on each event.
[0081] In this step, weight assignment on each event can be performed in many ways. For instance, the weight to which each event corresponds can be manually input into a device, automatically obtained by the device, or automatically crawled by a server (by using a crawler, for example). The present application makes no restriction on the mode.
[0082] Step S104 - coding each event, and types and the number of the slots in the event templates.
[0083] In this step, each event, and the types and the number of the slots in the event templates, can be coded in many ways. For instance, the codes of each event, and of the types and the number of the slots in the event templates, can be manually input into a device, automatically obtained by the device, or automatically crawled by a server (by using a crawler, for example). The present application makes no restriction on the mode.
[0084] In this step, the coding type employed can be binary, octonary or decimal, to which the present application makes no restriction.
[0085] Step S105 - screening the events and the event templates according to the weight of each event.
[0086] In this step, the events and the event templates are screened according to the weight of each event obtained in step S103. There are many screening standards, for instance, it is possible to compare the weight of each event with a preset threshold, and to eliminate any event whose weight is smaller than the preset threshold, while the present application makes no restriction to the screening standards.
[0087] Step S106 - matching and filling the screened events and event templates, and generating news content.
[0088] In this step, the events and event templates screened in step S105 are matched and filled to generate news content. There are many matching and filling modes, to which no restriction is made in the present application.
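Since the filling mode is left open, the following Python sketch shows only one simple possibility: replacing each slot placeholder in a template with the corresponding slot value of the event. The function name `fill`, the brace syntax with underscore slot names, and the sample template are assumptions made for illustration.

```python
import re
from typing import Dict

# Assumed slot syntax: upper-case names with underscores inside braces.
SLOT_RE = re.compile(r"\{([A-Z_]+)\}")

def fill(template: str, event: Dict[str, str]) -> str:
    """Replace every {SLOT} placeholder with the event's value for that slot.

    Raises KeyError if the template needs a slot the event does not provide,
    so templates should be screened against the event before filling.
    """
    return SLOT_RE.sub(lambda match: event[match.group(1)], template)

sentence = fill(
    "{ORG_NEU} score! {PER_ACT} shoots with the {EVEINF_BODY}.",
    {"ORG_NEU": "Newcastle", "PER_ACT": "Schaer", "EVEINF_BODY": "right foot"},
)
```

The resulting `sentence` reads "Newcastle score! Schaer shoots with the right foot."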
[0089] Step S107 - reprocessing the news content to obtain the final news content.
[0090] In this step, the news content obtained in step S106 is reprocessed to obtain the final news content. There can be many reprocessing modes, such as manual reviewing and retouching, or rechecking by the system, etc., to which no restriction is made in the present application.
[0091] In Embodiment 1, the sports news writing method based on a natural language as provided by the present application makes it possible to automatically process the corpus to be processed, so as to enable automatic generation of news contents, to reduce manpower, and to enhance automatic processing capability for news corpora.
[0092] Embodiment 2
[0093] As shown in Fig. 2, in an embodiment of the present application, obtaining an event set, slots, and a slot value to which each slot corresponds in step S101 includes the following steps.
[0094] Step S201 - obtaining a preset number of sports news corpora.
[0095] In this step, the preset number of sports news corpora can be obtained in many ways. For instance, the corpora can be manually input into a device, automatically obtained by the device, or automatically crawled by a server (by using a crawler, for example). The present application makes no restriction on the mode.
[0096] In this step, the preset number can theoretically be as small as one piece or as large as 100,000 pieces; however, weighing the data processing amount against the accuracy of the result, the preset number can be selected as 1,000 pieces.
[0097] An example is taken for explanation by selecting two pieces of sports news corpora, specifically as follows:
[0098] sports news corpus 1: star player A shoots, goal succeeds.
[0099] sports news corpus 2: star player B shoots unsuccessfully, goal fails.
[0100] Step S202 - processing the sports news corpora to obtain all the events, slots, and the slot value to which each slot corresponds.
[0101] In this step, the sports news corpora in step S201 are processed to obtain all the events, slots, and the slot value to which each slot corresponds.

[0102] Specifically, an example is taken for explanation with the above two pieces of sports news corpora, specifically as follows:
[0103] event is: goal;
[0104] slot is: __ shoots (the underlined location is the slot);
[0105] slot values are: star player A shoots, star player B shoots (the underlined locations are the slot values, in other words, the slot values are star player A and star player B).
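As an illustration of the example above, the slot values can be pulled out of the two sample corpora with a regular expression. This is only a sketch: the pattern and the variable names are assumptions, and how the corpora are actually processed is not specified here.

```python
import re

CORPORA = [
    "star player A shoots, goal succeeds.",
    "star player B shoots unsuccessfully, goal fails.",
]

# The slot is whatever precedes the word "shoots" at the start of the sentence.
SHOOT_SLOT = re.compile(r"^(?P<player>.+?) shoots\b")

slot_values = []
for corpus in CORPORA:
    match = SHOOT_SLOT.match(corpus)
    if match:  # sentences without the pattern contribute no slot value
        slot_values.append(match.group("player"))
```

After the loop, `slot_values` contains "star player A" and "star player B", the two slot values named in the example.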
[0106] Step S203 - disposing all the events in the same and single set to obtain the event set.
[0107] In this step, all the events obtained in step S202 are disposed in the same and single set to obtain an event set. The event set contains at least one event, and the number of event(s) contained in the event set may differ according to classifying standards of events. For instance, the event can be selected as "goal" in step S202, then the event set in step S203 contains only one event, "goal"; it is also possible to select "goal succeeds" and "goal fails" as events in step S202, then the event set in step S203 contains two events, "goal succeeds" and "goal fails".
[0108] In the embodiments of the present application, an event can further include a title, an abstract, and a text. For instance, a piece of sports news generally at least includes the three sections of "title", "abstract", and "text"; of course, it can further include "subtitle", "editorial note", and "commentary", etc., to which no restriction is made in the present application.
[0109] In Embodiment 2, the sports news writing method based on a natural language as provided by the present application makes it possible to analyze and process great quantities of sports news corpora to obtain therefrom event sets, slots, and slot values to which each slot corresponds, so as to provide research templates for subsequent automatic generation of news contents.

[0110] Embodiment 3
[0111] As shown in Fig. 3, in an embodiment of the present application, after processing the sports news corpora to obtain all the events, slots, and the slot value to which each slot corresponds in step S202, the following steps are further included:
[0112] Step S301 - judging whether each event, each slot, and each slot value conform to a preset range;
[0113] Step S302 - if yes, retaining the event, the slot, and the slot value;
[0114] Step S303 - if not, deleting the event, the slot, or the slot value.
[0115] Through the above steps S301-S303, each event, each slot, and each slot value obtained in step S202 can be judged as to whether it conforms to a preset range, and is retained or eliminated accordingly.
[0116] Specifically, the preset range can be either a specific numerical value or a definitive statement. For instance, plural events such as "shoot", "goal", and "penalty kick" can be selected from the 1,000 pieces of sports news. If "penalty kick" accounts for a lower proportion than "shoot" and "goal", the "penalty kick" event can be eliminated when high precision of the processing result is not required, whereby the data processing amount is greatly reduced.
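One way to read "preset range" as a numerical value is a minimum share of occurrences. The sketch below drops events that fall under that share; the function name `filter_rare_events`, the 10% cut-off, and the occurrence counts are made up for illustration and do not come from the application.

```python
from collections import Counter
from typing import Iterable, Set

def filter_rare_events(event_labels: Iterable[str], min_share: float) -> Set[str]:
    """Return the events whose share of all occurrences is at least min_share."""
    counts = Counter(event_labels)
    total = sum(counts.values())
    return {event for event, count in counts.items() if count / total >= min_share}

# Hypothetical occurrence counts extracted from a batch of sports news.
labels = ["shoot"] * 50 + ["goal"] * 45 + ["penalty kick"] * 5
retained = filter_rare_events(labels, min_share=0.10)
```

With these made-up counts, "penalty kick" makes up 5% of the occurrences and is eliminated, while "shoot" and "goal" are retained.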
[0117] In Embodiment 3, the sports news writing method based on a natural language as provided by the present application makes it possible to screen events, slots and slot values to which each slot corresponds as obtained, whereby accuracy of news generation is enhanced on the one hand, and data processing amount is reduced on the other hand.
[0118] Embodiment 4
[0119] As shown in Fig. 4, in an embodiment of the present application, performing weight assignment on each event in step S103 includes the following steps:
[0120] Step S401 - dividing the corpus according to all the events to obtain plural sections;
[0121] Step S402 - constructing, with respect to each event, mapping between the event and each section; and
[0122] Step S403 - setting, with respect to each mapping, a weight of the event to which the mapping corresponds.
[0123] Specific explanation is made below with Table 1 and Table 2:
[0124] Table 1
[0125]
Event / Section    First Paragraph    Second Paragraph
Title              0.6                0.3
Abstract           0.4                0.7
[0126] With respect to Table 1, the corpus is divided into a first paragraph and a second paragraph, of which the first paragraph is considered by default to be the title section, and the second paragraph is considered by default to be the abstract section (in some news, the abstract may even be larger than one paragraph, but this case is not considered for the sake of brevity).
[0127] When weights are assigned to the title event and the abstract event, the weights of the title event with respect to the first paragraph and the second paragraph are respectively 0.6 and 0.3, and the weights of the abstract event with respect to the first paragraph and the second paragraph are respectively 0.4 and 0.7. That is to say, when the system considers the importance of the title event and the abstract event in the first paragraph and the second paragraph of the corpus, it is apparent that the title event is most probably in the first paragraph, and that the abstract event is most probably in the second paragraph. Of course, in some other embodiments, when ranking is made according to a descending order of weights, the abstract event is most probably in the first paragraph, while the title event is most probably in the second paragraph.
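The mapping between events and sections in Table 1 can be held in a nested dictionary, as in the sketch below. The structure and the helper name `most_likely_section` are assumptions for illustration; only the four weights come from Table 1.

```python
from typing import Dict

# Event -> section -> weight, using the values of Table 1.
WEIGHTS: Dict[str, Dict[str, float]] = {
    "Title":    {"first paragraph": 0.6, "second paragraph": 0.3},
    "Abstract": {"first paragraph": 0.4, "second paragraph": 0.7},
}

def most_likely_section(event: str) -> str:
    """Return the section in which the event carries its highest weight."""
    sections = WEIGHTS[event]
    return max(sections, key=sections.get)

title_section = most_likely_section("Title")
abstract_section = most_likely_section("Abstract")
```

This reproduces the reading given above: the title event maps to the first paragraph and the abstract event to the second paragraph.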
[0128] Table 2
[0129]
Event / Section    First Paragraph    Second Paragraph
Goal               0.6                0.7
Penalty Kick       0.4                0.3
[0130] With respect to Table 2, the corpus is divided into a first paragraph and a second paragraph, of which the first paragraph is considered by default to be the title section, and the second paragraph is considered by default to be the abstract section (in some news, the abstract may even be larger than one paragraph, but this case is not considered for the sake of brevity).
[0131] When weights are assigned to the goal event and the penalty kick event, the weights of the goal event with respect to the first paragraph and the second paragraph are respectively 0.6 and 0.7, and the weights of the penalty kick event with respect to the first paragraph and the second paragraph are respectively 0.4 and 0.3, that is to say, when the system considers the importance of the goal event and the penalty kick event in the first paragraph and the second paragraph of the corpus, it is apparent that the goal event is most probably in the first paragraph, and that the penalty kick event is most probably in the second paragraph. Of course, in some other embodiments, when ranking is made according to a descending order of weights, the penalty kick event is most probably in the first paragraph, while the goal event is most probably in the second paragraph.
[0132] In Embodiment 4, the sports news writing method based on a natural language as provided by the present application makes it possible to assign weight to each event, and it is possible to set event weights by self-definition according to requirements on the one hand, and to also provide basis for subsequent coding operation on the other hand.

[0133] Embodiment 5
[0134] As shown in Fig. 5, in an embodiment of the present application, coding each event, and types and the number of the slots in the event templates in step S104 includes the following steps.
[0135] Step S501 - obtaining event templates and events to be coded.
[0136] In this step, the event templates and the events to be coded can be obtained in many ways. For instance, they can be manually input into a device, automatically obtained by the device, or automatically crawled by a server (by using a crawler, for example). The present application makes no restriction on the mode.
[0137] In the embodiments of the present application, the coding as employed is binary coding.
[0138] Step S502 - counting the event templates, and the types and the number of the slots in the events, according to regular-expression matching.
[0139] In this step, a regular expression is employed to sequentially traverse the event templates and events to be coded, so as to count the types of the event templates and the number of each type, and the types of the slots in the events and the number of each type.
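The counting in step S502 can be sketched as follows. This is only an illustrative sketch: the text does not state how slots are written inside a template, so the bracketed placeholder syntax (such as `[player]`), the sample templates, and the name `count_slots` are all assumptions.

```python
import re
from collections import Counter

# Assumed slot syntax: a slot name wrapped in square brackets, e.g. "[player]".
SLOT_PATTERN = re.compile(r"\[([a-z_]+)\]")

def count_slots(template: str) -> Counter:
    """Count how many times each slot type appears in one event template."""
    return Counter(SLOT_PATTERN.findall(template))

# Hypothetical event templates, for illustration only.
templates = [
    "[player] scored for [team] in minute [minute].",
    "[player] passed to [player], who scored for [team].",
]

per_template = [count_slots(t) for t in templates]

# m: total number of distinct slot types across all templates.
m = len(set().union(*per_template))
# n: maximum number of times any one slot appears in a single template.
n = max(max(c.values()) for c in per_template)
```

With these two sample templates, `m` is 3 and `n` is 2; these are the two quantities that the following step determines.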
[0140] Step S503 - determining a total number m of all the slots and the maximum number of times n for which the slots appear in each event template.
[0141] In this step, the counts obtained in step S502 are used to determine the total number m of all the slots and the maximum number of times n for which a slot appears in an event template.
[0142] In the embodiments of the present application, binary coding is employed, with m=22 and n=4. Since a slot appears at most 4 times in an event template, 4 binary digits are assigned to each slot to represent the number of times for which the slot appears, and within these 4 binary digits each 1 represents one appearance of the slot.
[0143] Step S504 - determining, according to the maximum number of times for which the slots appear, that each slot should be assigned n binary digits for representation, wherein n is a divisor of 64.
[0144] Step S505 - determining the long type as the coding type employed, and the number x of codes, according to the total number of the slots and the number of binary digits assigned to each slot, wherein x=[(m*n)/64]+1.
[0145] In steps S504-S505, according to the total number 22 of the slots and the 4 bits of binary digits assigned to each slot, it is derived that 22*4=88 bits of binary digits are required to represent the number of all the slots.
[0146] In the embodiments of the present application, in consideration of practical conditions and the data processing volume, long type data are used for coding. One piece of long type data consists of 64 bits, so n should be a divisor of 64. Since 88 bits exceed 64 bits, 2 pieces of long type data are required to code the event templates and the slots in the corpus, and the 2 pieces of long type data are both initialized as 0.
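The sizing in steps S504-S505 can be checked with a short sketch. It assumes that the square brackets in the formula for x denote rounding down to an integer, which matches the result of 2 stated above; the names `num_longs` and `LONG_BITS` are hypothetical.

```python
LONG_BITS = 64  # width of one piece of long type data, as stated in the text

def num_longs(m: int, n: int) -> int:
    """x = floor((m * n) / 64) + 1, with n required to be a divisor of 64."""
    if LONG_BITS % n != 0:
        raise ValueError("n must be a divisor of 64")
    return (m * n) // LONG_BITS + 1

x = num_longs(22, 4)  # 22 * 4 = 88 bits are needed in total
codes = [0] * x       # every long is initialized as 0
```

Under this reading the formula allocates one extra long when m*n is an exact multiple of 64 (for example m=16, n=4 gives x=2).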
[0147] Step S506 - traversing each slot in the event templates, and performing binary coding on the number of occurrences of the current slot.
[0148] In this step, each slot in the event templates is traversed sequentially, and the number of occurrences of each slot is binary-coded in the long type. For instance, if a certain slot appears two times in an event template, its count is represented in binary as 0011.
[0149] Step S507 - determining that the current slot is coded at the yth long type according to an index address i of the current slot, wherein y=i/(64/n)+1.
[0150] In this step, it is determined that the current slot is coded at the yth long type according to an index address i of the current slot, and the y value can be calculated by the following formula:
[0151] y=i/(64/4)+1,
[0152] where i is the index address of the current slot and the division is integer division. For instance, if the index address i of the current slot is 15, then y=1, meaning that the current slot is coded on the first piece of long type data; if the index address i of the current slot is 16, then y=2, and the current slot is coded on the second piece of long type data.
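The two worked values above can be reproduced with integer division. The function name `long_index` is hypothetical; the formula is the one given in step S507.

```python
def long_index(i: int, n: int) -> int:
    """1-based index y of the long that holds slot i: y = i // (64 // n) + 1."""
    slots_per_long = 64 // n  # with n = 4, one long holds 16 slots
    return i // slots_per_long + 1

y_for_15 = long_index(15, 4)  # last slot of the first long
y_for_16 = long_index(16, 4)  # first slot of the second long
```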
[0153] Step S508 - moving the binary representation of the number of slots leftwards by p bits, wherein p=(i-(y-1)*(64/n))*n.
[0154] In this step, the binary representation of the number of slots is moved leftwards by p bits, and p is calculated as follows:
[0155] p=(i-(y-1)*(64/4))*4
[0156] where i is the slot index address, and y indicates that the current slot is coded on the yth piece of long type data. For instance, if the index address i of a certain slot is 10, then y=1, that is to say, the slot is coded on the first piece of long type data. If the slot appears two times in the event template, the number of times is represented in binary as 0011; during coding, 0011 is moved leftwards by 10*4=40 bits, to derive the following code: 0000 0000 0000 0000 0000 0011 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000.
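The example for slot index 10 can be traced in code. This is a sketch under the assumption that a count is written as that many 1 bits in the slot's n-bit field (so two occurrences become 0011), as the text describes; the helper names are hypothetical.

```python
def unary_code(count: int, n: int) -> int:
    """Write count as that many 1 bits within an n-bit field (2 -> 0b0011)."""
    if not 0 <= count <= n:
        raise ValueError("count does not fit in n bits")
    return (1 << count) - 1

def shift_amount(i: int, n: int) -> int:
    """p = (i - (y - 1) * (64 / n)) * n, using integer division."""
    per_long = 64 // n
    y = i // per_long + 1
    return (i - (y - 1) * per_long) * n

def grouped_bits(value: int) -> str:
    """Format a 64-bit value as sixteen space-separated 4-bit groups."""
    bits = format(value, "064b")
    return " ".join(bits[k:k + 4] for k in range(0, 64, 4))

p = shift_amount(10, 4)       # slot 10 sits in the first long
code = unary_code(2, 4) << p  # 0011 moved leftwards by p bits
print(grouped_bits(code))
```

A full 64-bit long prints as sixteen groups, with 0011 in the sixth group from the left.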
[0157] Step S509 - joining all codes of the long type to obtain the final codes.
[0158] In this step, all codes of the long type are joined, so as to obtain the final codes. All slot codes on the same piece of long type data should be summed.
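Steps S505 to S509 can be put together in one minimal sketch. It assumes that slots are identified by integer index addresses starting at 0 and that the bracket in the formula for x means rounding down; `encode_template` and its input dictionary are hypothetical names, not part of the described method.

```python
from typing import Dict, List

MASK64 = (1 << 64) - 1  # Python ints are unbounded; keep each code within 64 bits

def encode_template(slot_counts: Dict[int, int], m: int, n: int) -> List[int]:
    """Encode {slot index: occurrence count} into x 64-bit codes."""
    per_long = 64 // n
    x = (m * n) // 64 + 1
    codes = [0] * x
    for i, count in slot_counts.items():
        if not (0 <= i < m and 0 <= count <= n):
            raise ValueError("slot index or count out of range")
        y = i // per_long + 1             # 1-based index of the long
        p = (i - (y - 1) * per_long) * n  # left shift within that long
        field = ((1 << count) - 1) << p   # count written as `count` one-bits
        # Fields of different slots never overlap, so OR equals summation here.
        codes[y - 1] = (codes[y - 1] | field) & MASK64
    return codes

# Slot 10 appears twice and slot 16 appears once (illustrative values).
codes = encode_template({10: 2, 16: 1}, m=22, n=4)
```

Slot 10 lands in the first long and slot 16 in the second, so the final code is the pair of longs taken together.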
[0159] In Embodiment 5, the sports news writing method based on a natural language as provided by the present application makes it possible to binarily code each event and the types and numbers of slots in the event templates, whereby the subsequent operation of screening the events and event templates is facilitated, and the workload of the screening operation is reduced.
[0160] Embodiment 6
[0161] As shown in Fig. 6, in an embodiment of the present application, screening the events in step S105 includes the following steps:
[0162] Step S601 - obtaining a corresponding weight of each event;
[0163] Step S602 - comparing each weight with a preset threshold; and
[0164] Step S603 - retaining any event whose weight is greater than the preset threshold, and eliminating the remaining events.
[0165] In steps S601-S603, the events are screened, and the screening criterion is the comparison of the weight corresponding to each event with a preset threshold.
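Steps S601-S603 amount to a simple filter, sketched below. The weights and the threshold of 0.5 are made-up values for illustration, and `screen_events` is a hypothetical name.

```python
from typing import Dict

def screen_events(weights: Dict[str, float], threshold: float) -> Dict[str, float]:
    """Retain only events whose weight is strictly greater than the threshold."""
    return {event: w for event, w in weights.items() if w > threshold}

# Hypothetical weights valued between 0 and 1.
weights = {"goal": 0.7, "penalty kick": 0.3}
kept = screen_events(weights, threshold=0.5)
```

The comparison is strict because the step retains events whose weight is greater than the threshold; an event whose weight equals the threshold is eliminated.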
[0166] A specific example is taken below for explanation.
[0167] In the embodiments of the present application, the news includes three sections, namely a title, an abstract, and a text, and the events include two types, namely "goal" and "penalty kick". The two events are assigned weights according to business requirements, and the "goal" is assigned a greater weight than the "penalty kick" (the weights here take values between 0 and 1). In sports news writing, events are screened according to event weights, and an event whose weight is greater than the preset threshold is preferentially written into the news title section, so the "goal" is written in the title section, whereas the "penalty kick" is written in either the abstract section or the text section.
[0168] Moreover, if an event can further be placed in two sections, namely an abstract and a text, and the weight of the event with respect to the text is higher than its weight with respect to the abstract, then the "penalty kick" will be preferentially written in the text section of the news.
[0169] In Embodiment 6, the sports news writing method based on a natural language as provided by the present application makes it possible to screen events, whereby proper events can be selected from many events according to requirement to provide basis for subsequent generation of news.
[0170] Embodiment 7
[0171] As shown in Fig. 7, in an embodiment of the present application, screening the event templates in step S105 includes the following steps:
[0172] Step S701 - obtaining the screened events and the codes corresponding thereto, and the codes to which all the event templates correspond in the events;
[0173] Step S702 - selecting one or more event template(s) with the maximum number of slots to serve as candidate event template(s); and
[0174] Step S703 - randomly selecting one event template from the candidate event template(s) to serve as the event template to be filled.
[0175] In steps S701-S703, the event template to be filled can be selected.

[0176] A specific example is taken below for explanation.
[0177] Table 3
[0178]
| Project Number | 1 | 2 |
| --- | --- | --- |
| Event | 0010 | 1010 |
| Event Template | 0011 | 1011 |
[0179] As can be seen from Table 3, event 1 is coded 0010, event 2 is coded 1010, event template 1 is coded 0011, and event template 2 is coded 1011.
[0180] If event template 1 has one slot and event template 2 has two slots, event template 2, which has the maximum number of slots, is selected, so event template 2 is the only event template to be filled.
[0181] If event template 1 has one slot and event template 2 also has one slot, both have the maximum number of slots, so each of them is a candidate event template. One event template is then randomly selected from event template 1 and event template 2 to serve as the event template to be filled; in other words, the event template to be filled can be either event template 1 or event template 2.
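The two cases above can be sketched as follows. The sketch takes the slot count of each template as given input rather than deriving it from the codes in Table 3, and the name `pick_template` is hypothetical.

```python
import random
from typing import Dict, List, Optional

def pick_template(slot_totals: Dict[str, int],
                  rng: Optional[random.Random] = None) -> str:
    """Choose at random among the templates that have the most slots."""
    rng = rng or random.Random()
    most = max(slot_totals.values())
    candidates: List[str] = [name for name, total in slot_totals.items()
                             if total == most]
    return rng.choice(candidates)

# Case 1: template 2 has more slots, so it is the only candidate.
unique = pick_template({"template 1": 1, "template 2": 2})
# Case 2: both templates have one slot, so either may be returned.
tied = pick_template({"template 1": 1, "template 2": 1})
```

Passing a seeded `random.Random` instance makes the tie-break reproducible, which may help when testing.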
[0182] In Embodiment 7, the sports news writing method based on a natural language as provided by the present application makes it possible to screen event templates, whereby proper event templates can be selected from many event templates according to requirement to provide basis for subsequent generation of news.
[0183] As shown in Fig. 8, in an embodiment of the present application, there is further provided a sports news writing device based on a natural language, to enhance the diversity of sentence patterns of articles and maximize the information volume of articles. The device comprises:
[0184] an obtaining unit 801, for obtaining a corpus to be processed, an event set, slots, and a slot value to which each slot corresponds;
[0185] an event template marking unit 802, for marking event templates in the corpus according to each event in the event set, the slots, and the slot values;
[0186] a weight assigning unit 803, for performing weight assignment on each event;
[0187] a coding unit 804, for coding each event, and types and the number of the slots in the event templates;
[0188] a screening unit 805, for screening the events and the event templates according to the weight of each event;
[0189] a news content generating unit 806, for matching and filling the screened events and event templates, and generating news content; and
[0190] a news content processing unit 807, for reprocessing the news content to obtain the final news content.
[0191] The device shown in Fig. 8 can correspondingly execute the contents of the foregoing method embodiments. Portions that are not described in detail in this embodiment can be inferred from the contents recited in the foregoing method embodiments, and are not repeated here.
[0192] Refer below to Fig. 9, which schematically illustrates a structure suitable for realizing the electronic equipment 90 according to an embodiment of the current disclosure. The electronic equipment in this embodiment of the current disclosure can include, but is not limited to, such a mobile terminal as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (portable Android device), a PMP (portable multimedia player), or an onboard terminal (such as an onboard navigation terminal), and such a fixed terminal as a digital TV or a desktop computer. The electronic equipment shown in Fig. 9 is merely an example, and shall in no way restrict the function and range of use of the embodiment of the current disclosure.
[0193] As shown in Fig. 9, electronic equipment 90 can include a processing device (such as a central processor, a graphics processor, etc.) 901 capable of executing various proper actions and processes according to a program stored in a read-only memory (ROM) 902 or a program loaded from a storage device 908 to a random-access memory (RAM) 903.
In RAM 903 are further stored various programs and data required for the operation of electronic equipment 90. Processing device 901, ROM 902 and RAM 903 are connected to one another via bus 904. Input/output (I/O) interface 905 is also connected to bus 904.
[0194] Usually, the following devices can be connected to I/O interface 905: an input means 906 such as a touch screen, a touch panel, a keyboard, a mouse, an image sensor, a microphone, an accelerometer, or a gyroscope; an output means 907 such as a liquid crystal display (LCD), a loudspeaker, or a vibrator; a storage device 908 such as a magnetic disk or a hard disk; and a communication device 909.
Communication device 909 allows wireless or wired communication between electronic equipment 90 and other equipment to exchange data. Although electronic equipment 90 possessing various devices is shown in the figure, it should be understood that it is not required to implement or possess all the devices shown. It is alternatively possible to implement or possess more or fewer devices.
[0195] Particularly, according to the embodiments of the current disclosure, the processes described above with reference to flowcharts can be realized as computer software programs. For instance, embodiments of the current disclosure include a computer program product that includes a computer program borne on a computer-readable medium, and the computer program contains program codes for executing the methods shown in the flowcharts. In such an embodiment, the computer program can be downloaded from the network and installed through communication device 909, or installed from storage device 908, or installed from ROM 902. When the computer program is executed by processing device 901, it executes the aforementioned functions defined in the methods of the embodiments of the current disclosure.
[0196] The sports news writing method based on a natural language, and the corresponding device and electronic equipment as provided by the present application, make it possible to automatically generate news content according to event templates extracted by prior analysis of large quantities of sports news and according to the weights that users themselves assign to events. On the one hand, the diversity of sentence patterns of articles is enhanced and the information volume of articles is maximized; on the other hand, highly efficient automatic writing of sports news articles is realized and manpower cost is reduced.
[0197] As should be noted, the computer-readable medium recited above in the current disclosure can be a computer-readable signal medium or a computer-readable storage medium or any combination of the two. The computer-readable storage medium can for example be, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor device, system or component, or any combination of the above.
A more specific example of the computer-readable storage medium can include, but is not limited to, an electrical connection having one or more conducting wire(s), a portable computer magnetic disk, a hard disk, a random-access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the current disclosure, the computer-readable storage medium can be any tangible medium containing or storing a program usable by or in combination with an instruction executing device, system, or component.
Moreover, in the current disclosure, the computer-readable signal medium can include a data signal transmitted in a baseband or as part of a carrier wave, in which data signal are borne computer-readable program codes. The data signal thus propagated can be embodied in plural forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination thereof. The computer-readable signal medium can further be any other computer-readable medium than the computer-readable storage medium, and the computer-readable signal medium can transmit, propagate, or convey programs for use by or in combination with an instruction executing device, system, or component.
The program codes contained in the computer-readable medium can be transmitted via any suitable medium, including, but not limited to, an electric wire, an optical fiber, radio frequency (RF) etc., or any suitable combination of the above.
[0198] The computer-readable medium can be either contained in the electronic equipment, or independent of, not installed in the electronic equipment.
[0199] The computer-readable medium carries therewith one or more program(s), when the one or more program(s) is/are executed by the electronic equipment, the electronic equipment is enabled: to acquire at least two internet protocol addresses; to transmit a node evaluation request including the at least two internet protocol addresses to a node evaluation equipment, which selects an internet protocol address from the at least two internet protocol addresses and returns the same; and to receive the internet protocol address returned by the node evaluation equipment, wherein the internet protocol addresses as acquired indicate boundary nodes in a content distribution network.
[0200] Alternatively, the computer-readable medium carries therewith one or more program(s), when the one or more program(s) is/are executed by the electronic equipment, the electronic equipment is enabled: to receive a node evaluation request including at least two internet protocol addresses; to select an internet protocol address from the at least two internet protocol addresses; and to return the selected internet protocol address, wherein the internet protocol addresses as received indicate boundary nodes in a content distribution network.
[0201] One or more programming language(s) or a combination thereof can be employed to write the computer program codes for executing the operations of the current disclosure, the programming language(s) include(s) such an object-oriented programming language as Java, Smalltalk, C++, and further include(s) such a conventional procedural programming language as "C" language or a similar programming language. The program codes can be entirely executed on a user computer, partly executed on a user computer, executed as an independent software package, partly executed on a user computer and partly executed on a remote computer, or entirely executed on a remote computer or a server.
In the case a remote computer is involved, the remote computer can be connected to the user computer via any type of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (for example, via the internet by using an internet service provider).
[0202] The flowcharts and block diagrams in the accompanying drawings illustrate systemic frameworks, functions, and operations potentially realizable by the system, method, and computer program product according to the various embodiments of the current disclosure. With respect thereto, each block in the flowcharts or the block diagrams can represent a module, a program segment, or a part of the codes, and the module, the program segment, or the part of the codes includes one or more executable instruction(s) for realizing designated logic function(s). It should also be noted that, in some alternative implementations, functions marked in the blocks can occur in sequences different from those marked in the drawings. For instance, two blocks shown in succession can in practice be executed essentially in parallel, and they can sometimes also be executed in the reverse order, depending on the functions involved. As should also be noted, each block in the block diagrams and/or the flowcharts and the combination of blocks in the block diagrams and/or the flowcharts can be realized by dedicated, hardware-based systems that execute designated functions or operations, and can be alternatively realized by a combination of dedicated hardware with computer instructions.
[0203] Units involved in the description of the embodiments of the current disclosure can be realized in the form of software, and can also be realized in the form of hardware. The name of a unit does not constitute any restriction to the unit itself under certain circumstances. For instance, a first obtaining unit can as well be referred to as a "unit for obtaining at least two internet protocol addresses".
[0204] As should be understood, the various portions of the current disclosure can be realized by hardware, software, firmware, or a combination thereof.
[0205] What the above describes is merely directed to specific modes of execution of the current disclosure, but the protection scope of the current disclosure is not restricted thereby. Any change or replacement easily conceivable to persons skilled in the art within the technical range disclosed in the current disclosure shall all be covered by the protection scope of the current disclosure. Accordingly, the protection scope of the current disclosure shall be based on the protection scope of the attached Claims.

Claims (10)

What is claimed is:
1. A sports news writing method based on a natural language, characterized in that the method comprises the following steps:
obtaining a corpus to be processed, an event set, slots, and a slot value to which each slot corresponds;
marking event templates in the corpus according to each event in the event set, the slots, and the slot values;
performing weight assignment on each event;
coding each event, and types and the number of the slots in the event templates;
screening the events and the event templates according to the weight of each event;
matching and filling the screened events and event templates, and generating news content; and
reprocessing the news content to obtain the final news content.
2. The sports news writing method according to Claim 1, characterized in that the obtaining an event set, slots, and a slot value to which each slot corresponds includes the following steps:
obtaining a preset number of sports news corpora;
processing the sports news corpora to obtain all the events, slots, and the slot value to which each slot corresponds; and
disposing all the events in the same and single set to obtain the event set.
3. The sports news writing method according to Claim 2, characterized in further comprising the following steps, after processing the sports news corpora to obtain all the events, slots, and the slot value to which each slot corresponds:
judging whether each event, each slot, and each slot value conform to a preset range;
if yes, retaining the event, the slot, and the slot value;
if not, deleting the event, the slot, or the slot value.

4. The sports news writing method according to Claim 1, characterized in that the event includes a title, an abstract, and a text.
5. The sports news writing method according to Claim 1, characterized in that the performing weight assignment on each event includes the following steps:
dividing the corpus according to all the events to obtain plural sections;
constructing, with respect to each event, mapping between the event and each section; and
setting, with respect to each mapping, a weight of the event to which the mapping corresponds.
6. The sports news writing method according to Claim 1, characterized in that the coding each event, and types and the number of the slots in the event templates includes the following steps:
obtaining the event templates and the events to be coded;
counting the event templates, and the types and the number of the slots in the events, according to regular-expression matching;
determining a total number m of all the slots and the maximum number of times n for which the slots appear in each event template;
determining that each slot should be assigned with n number of binary digits for representation according to the maximum number of times for which the slots appear, wherein n is a divisor of 64;
determining long types of a coding type as employed and the number x of codes according to the total number of the slots and the number of binary digits assigned to each slot, wherein x=[(m*n)/64]+1;
traversing each slot in the event templates, and performing binary coding on the number of slots of the current slot;
determining that the current slot is coded at the yth long type according to an index address i of the current slot, wherein y=i/(64/n)+1;
moving binary representation of the number of slots leftwards by p bits, wherein p=(i-(y-1)*(64/n))*n; and
joining all codes of the long type to obtain the final codes.
7. The sports news writing method according to Claim 1, characterized in that the screening the events includes the following steps:
obtaining a corresponding weight of each event;
comparing each weight with a preset threshold; and
retaining any event whose weight is greater than the preset threshold, and eliminating the remaining events.
8. The sports news writing method according to Claim 1, characterized in that the screening the event templates includes the following steps:
obtaining the screened events and the codes corresponding thereto, and the codes to which all the event templates correspond in the events;
selecting one or more event template(s) with the maximum number of slots to serve as candidate event template(s); and
randomly selecting one event template from the candidate event template(s) to serve as the event template to be filled.
9. A sports news writing device based on a natural language, characterized in that the device comprises:
an obtaining unit, for obtaining a corpus to be processed, an event set, slots, and a slot value to which each slot corresponds;
an event template marking unit, for marking event templates in the corpus according to each event in the event set, the slots, and the slot values;
a weight assigning unit, for performing weight assignment on each event;
a coding unit, for coding each event, and types and the number of the slots in the event templates;
a screening unit, for screening the events and the event templates according to the weight of each event;
a news content generating unit, for matching and filling the screened events and event templates, and generating news content; and
a news content processing unit, for reprocessing the news content to obtain the final news content.
10. An electronic equipment, characterized in that the electronic equipment comprises:
at least one processor; and
a memory, communicably connected with the at least one processor; wherein the memory stores an instruction executable by the at least one processor, and the instruction is executed by the at least one processor to enable the processor to execute the sports news writing method according to any one of Claims 1 to 8.

CA3165616A 2019-12-23 2020-06-19 Sports news writing method based on natural language, device and electronic equipment Pending CA3165616A1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN201911336569.4A CN111191434B (en) 2019-12-23 2019-12-23 Sports news writing method and device based on natural language and electronic equipment
CN201911336569.4 2019-12-23
PCT/CN2020/097005 WO2021128768A1 (en) 2019-12-23 2020-06-19 Sports news writing method and apparatus based on natural language, and electronic device

Publications (1)

Publication Number Publication Date
CA3165616A1 (en) 2021-07-01

Family

ID=70711044

Family Applications (1)

Application Number Title Priority Date Filing Date
CA3165616A Pending CA3165616A1 (en) 2019-12-23 2020-06-19 Sports news writing method based on natural language, device and electronic equipment

Country Status (3)

Country Link
CN (1) CN111191434B (en)
CA (1) CA3165616A1 (en)
WO (1) WO2021128768A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111191434B (en) * 2019-12-23 2024-04-26 苏宁云计算有限公司 Sports news writing method and device based on natural language and electronic equipment
CN113553812A (en) * 2021-06-22 2021-10-26 北京来也网络科技有限公司 News processing method and device combining RPA and AI
CN117390144B (en) * 2023-12-13 2024-03-08 北京搜狐新媒体信息技术有限公司 News timeliness determining method and device

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10540430B2 (en) * 2011-12-28 2020-01-21 Cbs Interactive Inc. Techniques for providing a natural language narrative
CN105975466A (en) * 2015-11-04 2016-09-28 新华通讯社 Method and device for machine manuscript writing aiming at short newsflashes
CN106407168A (en) * 2016-09-06 2017-02-15 首都师范大学 Automatic generation method for practical writing
CN106776523B (en) * 2017-01-22 2020-04-07 百度在线网络技术(北京)有限公司 Artificial intelligence-based news quick report generation method and device
CN109902305A (en) * 2019-03-04 2019-06-18 上海宝尊电子商务有限公司 Template generation, search and text generation apparatus and method for based on name Entity recognition
CN110209838A (en) * 2019-06-10 2019-09-06 广东工业大学 A kind of text template acquisition methods and relevant apparatus
CN111191434B (en) * 2019-12-23 2024-04-26 苏宁云计算有限公司 Sports news writing method and device based on natural language and electronic equipment

Also Published As

Publication number Publication date
WO2021128768A1 (en) 2021-07-01
CN111191434A (en) 2020-05-22
CN111191434B (en) 2024-04-26

Similar Documents

Publication Publication Date Title
CA3165616A1 (en) Sports news writing method based on natural language, device and electronic equipment
CN105528372B (en) A kind of address search method and equipment
CN109558525B (en) Test data set generation method, device, equipment and storage medium
CN106101358A (en) A kind of method of contact person information updating and smart machine
CN103390003A (en) Method and device for combining user data information among servers
CN108351766B (en) Creating and modifying applications from mobile devices
CN112015468B (en) Interface document processing method and device, electronic equipment and storage medium
CN109408682A (en) A kind of method of regular expression matching, system and equipment
US20180357404A1 (en) Information processing method and apparatus, and electronic device
US20170150214A1 (en) Method and apparatus for data processing
CN105094603A (en) Method and device for related inputting
CN110652728A (en) Game resource management method and device, electronic equipment and storage medium
CN110442803A (en) Data processing method, device, medium and the calculating equipment executed by calculating equipment
CN111694992A (en) Data processing method and device
CN110909390B (en) Task auditing method and device, electronic equipment and storage medium
CN111460801A (en) Title generation method and device and electronic equipment
JP5307294B2 (en) Operation support computer program, operation support computer system
CN104965737A (en) Updated data acquisition method and device
CN115048908A (en) Method and device for generating text directory
CN107050851B (en) Sound enhancement method and system for game content effect
CN112732100A (en) Information processing method and device and electronic equipment
CN112100468A (en) Search space generation method and device, electronic equipment and storage medium
CN111368208A (en) Method and device for recommending target object to user and electronic equipment
CN110704744A (en) Method and device for recommending target object to user and electronic equipment
CN113157704B (en) Hierarchical relationship analysis method, device, equipment and computer readable storage medium

Legal Events

Date Code Title Description
EEER Examination request

Effective date: 20220622
