CN104462145B - Sentence generation method and device - Google Patents

Sentence generation method and device

Info

Publication number
CN104462145B
CN104462145B CN201310440040.3A CN201310440040A CN104462145B CN 104462145 B CN104462145 B CN 104462145B CN 201310440040 A CN201310440040 A CN 201310440040A CN 104462145 B CN104462145 B CN 104462145B
Authority
CN
China
Prior art keywords
data message
sentence
formatting
words
terminal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310440040.3A
Other languages
Chinese (zh)
Other versions
CN104462145A (en)
Inventor
董振华
欧阳靖民
张弓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN201310440040.3A
Publication of CN104462145A
Application granted
Publication of CN104462145B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/3331: Query processing
    • G06F16/334: Query execution
    • G06F16/3344: Query execution using natural language analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a sentence generation method and device. The method includes: collecting at least one piece of data information of a terminal, where the data information includes at least one of running information of the terminal, user operation information of the terminal, and information received by the terminal from an external interface; determining the sentence element of each piece of data information in a sentence to be formed; and forming the at least one piece of data information into a sentence according to the determined sentence elements. A corresponding device is also disclosed. With this technical solution, sentences can be generated automatically from the various data information of a terminal, so that activities or events occurring on the terminal are described in complete sentences and the user can have the terminal record these activities or events automatically.

Description

Sentence generation method and device
Technical field
The present invention relates to the field of language technology, and in particular to a sentence generation method and device.
Background
An automatic diary on an intelligent terminal can save people the cost of recording events, and it records the context in which an event occurs from multiple dimensions and perspectives, so events can be reproduced objectively. Meanwhile, the popularity of intelligent terminals provides available information sources and a data basis of many dimensions for generating automatic diaries. However, in one prior-art method for generating an automatic diary, the source data is mainly text data, such as blog information, social network information, short messages, and contact information. Features are extracted from this text information to generate the diary, but when the source data lacks text descriptions, no diary can be generated. Another prior-art method analyzes mobile phone usage and sensor data and, using the correspondence between phone operation events (such as powering on and off, or sending and receiving mail) and user activities, identifies the user's activities or the events that occurred, and finally organizes the events of one day in chronological order to generate a diary. The diary generated by this method is very simple in content: its form is a sequence of "time: event" entries, it carries little information, and it does not describe user activities or events in complete sentences, so its readability is poor.
In summary, how to generate sentences automatically from the various data information of a terminal, so that activities or events occurring on the terminal are described in complete sentences, has become a problem that the industry urgently needs to solve.
Summary of the invention
In view of this, the invention provides a sentence generation method and device for generating sentences automatically from the various data information of a terminal, so that activities or events occurring on the terminal are described in complete sentences.
In a first aspect, a sentence generation method is provided, including:
Collecting at least one piece of data information of a terminal, where the data information includes at least one of running information of the terminal, user operation information of the terminal, and information received by the terminal from an external interface;
Determining the sentence element of each piece of the at least one piece of data information in a sentence to be formed;
Forming the at least one piece of data information into a sentence according to the determined sentence element of the at least one piece of data information in the sentence to be formed.
In a first possible implementation, the collecting at least one piece of data information of a terminal includes:
Acquiring at least one piece of data information of the terminal;
Detecting the source of the at least one piece of data information;
Formatting the at least one piece of data information according to the format corresponding to its source, to obtain at least one piece of formatted data information;
The determining the sentence element of each piece of the at least one piece of data information in a sentence to be formed includes:
For each piece of formatted data information, searching a database for at least one descriptive word that matches the formatted data information;
Determining the sentence element of each piece of formatted data information in the sentence to be formed according to the at least one descriptive word that matches the formatted data information.
With reference to the first possible implementation of the first aspect, in a second possible implementation, after the determining the sentence element of each piece of formatted data information in the sentence to be formed according to the at least one descriptive word that matches the formatted data information, and before the forming the at least one piece of data information into a sentence according to the determined sentence element, the method further includes:
For each piece of formatted data information, selecting one descriptive word from the at least one descriptive word that matches the formatted data information, according to the probability with which each of those descriptive words is used in the database.
With reference to the second possible implementation of the first aspect, in a third possible implementation, the forming the at least one piece of data information into a sentence according to the determined sentence element of the at least one piece of data information in the sentence to be formed includes:
Selecting, from a syntactic structure library, a sentence structure that contains the types of the sentence elements of the at least one piece of data information, according to the types of the determined sentence elements;
Forming the selected descriptive words that match the at least one piece of formatted data information into a sentence, according to the positions of the sentence elements of the at least one piece of data information in the sentence structure.
With reference to the second possible implementation of the first aspect, in a fourth possible implementation, the forming the at least one piece of data information into a sentence according to the determined sentence element of the at least one piece of data information in the sentence to be formed includes:
Matching the selected descriptive words that match the at least one piece of formatted data information against the sentences in a sentence model library, according to the determined sentence element of the at least one piece of data information in the sentence to be formed;
Obtaining the matched sentence.
In a second aspect, a sentence generating device is provided, including:
A collection unit, configured to collect at least one piece of data information of a terminal, where the data information includes at least one of running information of the terminal, user operation information of the terminal, and information received by the terminal from an external interface;
A determining unit, configured to determine the sentence element of each piece of the at least one piece of data information in a sentence to be formed;
A composing unit, configured to form the at least one piece of data information into a sentence according to the determined sentence element of the at least one piece of data information in the sentence to be formed.
In a first possible implementation, the collection unit includes:
An acquisition subunit, configured to acquire at least one piece of data information of the terminal;
A detection subunit, configured to detect the source of the at least one piece of data information;
A formatting subunit, configured to format the at least one piece of data information according to the format corresponding to its source, to obtain at least one piece of formatted data information;
The determining unit includes:
A search subunit, configured to search a database, for each piece of formatted data information, for at least one descriptive word that matches the formatted data information;
A determination subunit, configured to determine the sentence element of each piece of formatted data information in the sentence to be formed according to the at least one descriptive word that matches the formatted data information.
With reference to the first possible implementation of the second aspect, in a second possible implementation, the device further includes:
A selecting unit, configured to select, for each piece of formatted data information, one descriptive word from the at least one descriptive word that matches the formatted data information, according to the probability with which each of those descriptive words is used in the database.
With reference to the second possible implementation of the second aspect, in a third possible implementation, the composing unit includes:
A selection subunit, configured to select, from a syntactic structure library, a sentence structure that contains the types of the sentence elements of the at least one piece of data information, according to the types of those sentence elements;
A composing subunit, configured to form the selected descriptive words that match the at least one piece of formatted data information into a sentence, according to the positions of the sentence elements of the at least one piece of data information in the sentence structure.
With reference to the second possible implementation of the second aspect, in a fourth possible implementation, the composing unit includes:
A matching subunit, configured to match the selected descriptive words that match the at least one piece of formatted data information against the sentences in a sentence model library, according to the determined sentence element of the at least one piece of data information in the sentence to be formed;
An obtaining subunit, configured to obtain the matched sentence.
With the technical solution of the sentence generation method and device of the present invention, sentences can be generated automatically from the various data information of a terminal, so that activities or events occurring on the terminal are described in complete sentences and the user can have the terminal record these activities or events automatically.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Apparently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of one embodiment of a sentence generation method of the present invention;
Fig. 2 is a flowchart of another embodiment that further refines the sentence generation method shown in Fig. 1;
Fig. 3 is a flowchart of yet another embodiment that further refines the sentence generation method shown in Fig. 1;
Fig. 4 is a schematic structural diagram of one embodiment of a sentence generating device of the present invention;
Fig. 5 is a schematic structural diagram of another embodiment that further refines the sentence generating device shown in Fig. 4;
Fig. 6 is a schematic structural diagram of yet another embodiment that further refines the sentence generating device shown in Fig. 4.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments. Apparently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Fig. 1 is a flowchart of one embodiment of a sentence generation method of the present invention. As shown in Fig. 1, the method includes the following steps:
Step S101, collect at least one piece of data information of a terminal, where the data information includes at least one of running information of the terminal, user operation information of the terminal, and information received by the terminal from an external interface.
The terminal in the present invention refers to any device through which an end user connects to a network and uses network applications, such as a notebook computer, tablet computer, or mobile phone. Various data information can be collected from a terminal, including: running information of the terminal itself, such as network connection information and system process information; user operation information of the terminal, such as sensor information and microblog posts; and information received by the terminal from an external interface, such as call information, short messages, and GPS information. This data information includes text data, such as microblog posts and short messages, from which text information can be extracted directly. It also includes non-text data, such as network connection information, system process information, and sensor information, which is collected through interfaces and the like. The present invention can collect and organize all of this data information of the terminal in a unified manner.
Step S102, determine the sentence element of each piece of the at least one piece of data information in a sentence to be formed.
Each piece of collected data information is determined to be a corresponding sentence element. The types of sentence element include subject, predicate, object, attributive, complement, adverbial, predicative, and so on. For example, time information collected from the terminal can be determined to be an adverbial of time, and information collected from GPS can be determined to be an adverbial of place.
Step S103, form the at least one piece of data information into a sentence according to the determined sentence element of the at least one piece of data information in the sentence to be formed.
Once the sentence element of each piece of collected data information has been identified, the sentence formed from this data information can be obtained according to the corresponding sentence elements, either by following a certain sentence structure or by matching against a language model, so that one or more sentences give a complete description of the content of the data information. The accumulated sentences form the automatic diary text.
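Steps S101 to S103 can be illustrated with a minimal Python sketch. This is only an illustration under stated assumptions, not the patent's implementation: the `DataItem` class, the `ELEMENT_BY_SOURCE` mapping, the source labels, and the sample words are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class DataItem:
    source: str  # hypothetical source label, e.g. "clock" or "gps"
    word: str    # descriptive word already chosen for this item

# Hypothetical mapping from data source to sentence element (S102).
ELEMENT_BY_SOURCE = {
    "clock": "time_adverbial",
    "gps": "place_adverbial",
    "owner": "subject",
    "accelerometer": "predicate",
}

def determine_elements(items):
    """S102: assign each collected item a sentence element."""
    return {ELEMENT_BY_SOURCE[i.source]: i.word for i in items}

def compose(elements, structure):
    """S103: put each word at the position of its element in the structure."""
    return " ".join(elements[slot] for slot in structure) + "."

# S101: collected data (illustrative values only).
items = [
    DataItem("clock", "In the morning,"),
    DataItem("owner", "I"),
    DataItem("accelerometer", "take a walk"),
    DataItem("gps", "on Wuhe Avenue"),
]
structure = ["time_adverbial", "subject", "predicate", "place_adverbial"]
sentence = compose(determine_elements(items), structure)
print(sentence)
```

The slot order in `structure` is chosen here for English word order; the patent's own example structures follow Chinese word order.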
The sentence generation method provided by this embodiment of the present invention can generate sentences automatically from the various data information of a terminal, describing the activities or events that occur on the terminal in complete sentences, so that the user can have the terminal record these activities or events automatically.
Fig. 2 is a flowchart of another embodiment that further refines the sentence generation method shown in Fig. 1. As shown in Fig. 2, the method includes the following steps:
Step S201, acquire at least one piece of data information of a terminal.
The terminal in the present invention refers to any device through which an end user connects to a network and uses network applications, such as a notebook computer, tablet computer, or mobile phone. Various data information can be collected from a terminal, including: running information of the terminal itself, such as network connection information and system process information; user operation information of the terminal, such as sensor information and microblog posts; and information received by the terminal from an external interface, such as call information, short messages, and GPS information. This data information includes text data, such as microblog posts and short messages, from which text information can be extracted directly. It also includes non-text data, such as network connection information, system process information, and sensor information, which is collected through interfaces and the like.
Step S202, detect the source of the at least one piece of data information.
The source of each piece of collected data information is detected. If the information is GPS information, its source is the GPS in the terminal; if it is sensor information, its source is a particular sensor in the terminal; if it is call information or application (APP) information such as a microblog post, its source can be identified from the software program.
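A minimal sketch of source detection in step S202, assuming each raw record arrives as a Python dict. The key names (`longitude`, `sensor_id`, `app_id`) and the returned labels are hypothetical and chosen only for illustration.

```python
def detect_source(record):
    """Return a source label for a raw record (a dict with hypothetical keys)."""
    if "longitude" in record and "latitude" in record:
        return "gps"                           # position fix from the GPS
    if "sensor_id" in record:
        return f"sensor:{record['sensor_id']}"  # a particular sensor
    if "app_id" in record:
        return f"app:{record['app_id']}"        # identified by the software program
    return "unknown"

print(detect_source({"longitude": 114.0, "latitude": 22.0}))
print(detect_source({"app_id": "microblog", "text": "hello"}))
```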
Step S203, format the at least one piece of data information according to the format corresponding to its source, to obtain at least one piece of formatted data information.
Data information collected from different sources needs to be organized in different formats for subsequent use.
For example:
1. Microblog information: for a microblog post published by the user at a given moment, each post can be represented after formatting as a triple: <time, microblog content, ID>.
2. GPS information: for the position information at a given moment, each piece of GPS information can be represented after formatting as a four-tuple:
<time, longitude, latitude, height>.
3. Acceleration information: for the acceleration information at a given moment, each piece of acceleration information can be represented after formatting as a four-tuple:
<time, x-axis acceleration, y-axis acceleration, z-axis acceleration>.
4. Call information: covers the usage of communication services such as calls and short messages, specifically:
Calls: call start time, call end time, call duration, caller, callee, number of missed calls.
Short messages: receive time, received message length, send time, sent message length.
Each piece of call information can be represented after formatting as a five-tuple:
<time, local phone state, peer phone state, local phone device state, peer phone ID>
For example, the local phone receiving an incoming call can be represented as:
<time, incoming call received, called, phone ringing, peer phone ID>
The collected data information can be formatted in many ways. The examples above list only tuple-based representations, and the present invention includes but is not limited to them.
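The tuple formats above map naturally onto named tuples. The following sketch assumes raw records are dicts and that the source label is already known; the class names, field names, and the `FORMATS` registry are hypothetical.

```python
from typing import NamedTuple

class Microblog(NamedTuple):      # <time, microblog content, ID>
    time: str
    content: str
    user_id: str

class GpsFix(NamedTuple):         # <time, longitude, latitude, height>
    time: str
    longitude: float
    latitude: float
    height: float

class Acceleration(NamedTuple):   # <time, x, y, z acceleration>
    time: str
    x: float
    y: float
    z: float

class CallRecord(NamedTuple):     # five-tuple for call information
    time: str
    local_state: str
    peer_state: str
    device_state: str
    peer_id: str

# Hypothetical registry: one tuple format per data source.
FORMATS = {
    "microblog": Microblog,
    "gps": GpsFix,
    "accelerometer": Acceleration,
    "call": CallRecord,
}

def format_item(source, raw):
    """Pick the tuple type for the source and keep only its fields."""
    cls = FORMATS[source]
    return cls(**{field: raw[field] for field in cls._fields})

fix = format_item("gps", {"time": "06:50", "longitude": 114.0,
                          "latitude": 22.0, "height": 10.0, "extra": 1})
print(fix)
```

Fields that the format does not name (such as `extra` here) are dropped, so every formatted item from one source has the same shape.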
Step S204, for each piece of formatted data information, search a database for at least one descriptive word that matches the formatted data information.
To make the generated sentences easy for the user to read, the collected data information needs to be described in language that is commonly used or that the user is accustomed to. The database stores one or more descriptive words corresponding to each piece of formatted data information, so at least one matching descriptive word can be found in the database for each piece of formatted data information.
For example:
1. The collected time information is 6:50 AM, and the set of descriptive words found is:
{morning, early morning, 6:50 AM Beijing time, 6:50 AM, early in the morning}.
2. The collected GPS information is {longitude=22.04, latitude=114.3}, and the set of descriptive words found is:
{Huawei base in Shenzhen, Bantian in Longgang District, Wuhe Avenue}
3. For the collected call record information <time, local phone state, peer phone state, local phone device state, peer phone ID>, the set of descriptive words for the call action is: {call, make a phone call, answer the phone}; the set of descriptive words for the call participants is: {I, John (contact)}.
4. For the collected acceleration information <time, x-axis acceleration, y-axis acceleration, z-axis acceleration>, the set of descriptive words may be:
{walking, taking a walk, jogging}.
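The lookup in step S204 can be sketched with an in-memory dictionary standing in for the database. The key scheme (a kind plus a bucket), the hour range used for "early morning", and the `WORD_DB` contents are assumptions made for this illustration.

```python
# Hypothetical in-memory stand-in for the descriptive-word database.
WORD_DB = {
    ("time", "early_morning"): ["morning", "early morning", "6:50 AM",
                                "early in the morning"],
    ("motion", "slow"): ["walking", "taking a walk", "jogging"],
    ("call", "action"): ["call", "make a phone call", "answer the phone"],
}

def time_key(hhmm):
    """Map a clock time such as '6:50' to a database key (assumed buckets)."""
    hour = int(hhmm.split(":")[0])
    bucket = "early_morning" if 5 <= hour < 9 else "other"
    return ("time", bucket)

def lookup(key):
    """Return every descriptive word stored for the key, or an empty list."""
    return WORD_DB.get(key, [])

print(lookup(time_key("6:50")))
```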
Step S205, determine the sentence element of each piece of formatted data information in the sentence to be formed according to the at least one descriptive word that matches the formatted data information.
After each piece of collected data information has been formatted and matched with descriptive words, the system can determine the sentence element of each descriptive word, either from the probability with which that word has previously served as a given sentence element or from usage habits. The types of sentence element include subject, predicate, object, attributive, complement, adverbial, predicative, and so on. For example, time information collected from the terminal can be determined to be an adverbial of time, and information collected from GPS can be determined to be an adverbial of place.
Step S206, for each piece of formatted data information, select one descriptive word from the at least one descriptive word that matches the formatted data information, according to the probability with which each of those descriptive words is used in the database.
Before a sentence is generated, generally only one descriptive word is used for each sentence element, so one descriptive word must be selected from the multiple descriptive words that match the formatted data information. The selection can be based on the probability with which these descriptive words are used in the database, that is, the probability with which each word is selected to generate sentences, or it can be based on the user's language habits.
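One way to read step S206 is to keep a usage count per descriptive word and pick the word with the highest relative frequency. The counts below are invented for illustration, and taking the maximum (rather than, say, sampling in proportion to probability) is an assumption of this sketch.

```python
# Hypothetical usage counts recorded in the database for each descriptive word.
USAGE_COUNTS = {"morning": 54, "early morning": 30, "early in the morning": 16}

def select_word(candidates, counts=USAGE_COUNTS):
    """Pick the candidate with the highest usage probability."""
    total = sum(counts.get(w, 0) for w in candidates)
    if total == 0:
        return candidates[0]  # no usage data yet: fall back to the first word
    return max(candidates, key=lambda w: counts.get(w, 0) / total)

print(select_word(["morning", "early morning", "early in the morning"]))
```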
Step S207, select, from a syntactic structure library, a sentence structure that contains the types of the sentence elements of the at least one piece of data information, according to the types of the determined sentence elements.
The syntactic structure library stores various sentence structures. Each sentence structure contains one or more sentence elements, and each sentence element has a corresponding position in the structure. A sentence structure containing the sentence elements corresponding to all of the collected data information is selected from the library.
For example, the syntactic structures in the library include:
[adverbial of time] [subject] [adverbial of place] [predicate] [object];
[subject] [predicate] [object]; and so on.
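The structure selection in step S207 amounts to a subset test: a structure qualifies if it contains every determined element type. In this sketch the library is a plain list, and preferring the shortest qualifying structure is an assumption, since the text does not say how ties are broken.

```python
# Hypothetical syntactic structure library, mirroring the two examples above.
STRUCTURES = [
    ["time_adverbial", "subject", "place_adverbial", "predicate", "object"],
    ["subject", "predicate", "object"],
]

def select_structure(element_types, structures=STRUCTURES):
    """Return the shortest structure containing every determined element type."""
    needed = set(element_types)
    fitting = [s for s in structures if needed <= set(s)]
    return min(fitting, key=len) if fitting else None

print(select_structure({"subject", "predicate", "object"}))
print(select_structure({"time_adverbial", "subject", "predicate"}))
```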
Step S208, form the selected descriptive words that match the at least one piece of formatted data information into a sentence, according to the positions of the sentence elements of the at least one piece of data information in the sentence structure.
After a sentence structure has been selected, the selected descriptive word that matches each piece of formatted data information is filled into the position of the corresponding sentence element in the structure. Filling the position of each sentence element one by one yields a sentence.
For example, following the examples above, the following sentences can be formed:
"Early in the morning, John and I had a call."
"In the morning, I was taking a walk on Wuhe Avenue, and John phoned me."
The sentence generation method provided by this embodiment of the present invention can generate sentences automatically from the various data information of a terminal, describing the activities or events that occur on the terminal in complete sentences, so that the user can have the terminal record these activities or events automatically.
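The slot filling of step S208 can be sketched as follows. Skipping slots that have no word, capitalizing the first letter, and appending a period are assumptions of this sketch; the slot names are hypothetical.

```python
def fill_structure(structure, words):
    """Place each selected word at its element's slot; skip slots with no word."""
    parts = [words[slot] for slot in structure if slot in words]
    if not parts:
        return ""
    text = " ".join(parts)
    return text[0].upper() + text[1:] + "."

structure = ["time_adverbial", "subject", "place_adverbial", "predicate", "object"]
words = {
    "time_adverbial": "early in the morning,",
    "subject": "John",
    "predicate": "phoned",
    "object": "me",
}
print(fill_structure(structure, words))
```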
Fig. 3 is a flowchart of yet another embodiment that further refines the sentence generation method shown in Fig. 1. As shown in Fig. 3, the method includes the following steps:
Step S301, acquire at least one piece of data information of a terminal.
Step S302, detect the source of the at least one piece of data information.
Step S303, format the at least one piece of data information according to the format corresponding to its source, to obtain at least one piece of formatted data information.
Step S304, for each piece of formatted data information, search a database for at least one descriptive word that matches the formatted data information.
Step S305, determine the sentence element of each piece of formatted data information in the sentence to be formed according to the at least one descriptive word that matches the formatted data information.
Step S306, for each piece of formatted data information, select one descriptive word from the at least one descriptive word that matches the formatted data information, according to the probability with which each of those descriptive words is used in the database.
Step S307, match the selected descriptive words that match the at least one piece of formatted data information against the sentences in a sentence model library, according to the determined sentence element of the at least one piece of data information in the sentence to be formed.
Step S308, obtain the matched sentence.
This embodiment differs from the previous embodiment in that steps S307 and S308 differ from steps S207 and S208 of the previous embodiment.
A language model is defined as follows: "A language model is usually constructed as a probability distribution P(s) over strings s, where P(s) attempts to reflect the probability that the string s occurs as a sentence."
In an n-gram language model, for a sentence s = W1, W2, …, Wn, the probability can be expressed as:
P(s) = P(W1) P(W2|W1) P(W3|W1 W2) … P(Wn|W1 … Wn-1)
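In practice an n-gram model approximates each conditional probability in the product above using only the previous n-1 words. The sketch below uses n = 2 (a bigram model) with maximum-likelihood counts from a tiny made-up corpus; the corpus, the `<s>` and `</s>` boundary tokens, and the function names are assumptions for illustration.

```python
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams, with <s> and </s> as sentence boundaries."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def sentence_probability(sentence, unigrams, bigrams):
    """P(s) as a product of P(Wi | Wi-1), estimated as count(Wi-1 Wi) / count(Wi-1)."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        if unigrams[prev] == 0:
            return 0.0  # unseen context word
        p *= bigrams[(prev, cur)] / unigrams[prev]
    return p

corpus = ["John phoned me", "John phoned Lily", "I phoned John"]
uni, bi = train_bigram(corpus)
p_seen = sentence_probability("John phoned me", uni, bi)
p_unseen = sentence_probability("me phoned John", uni, bi)
print(p_seen, p_unseen)
```

With this corpus, "John phoned me" scores 2/3 × 2/3 × 1/3 × 1 = 4/27, while a word order never observed scores 0 because no smoothing is applied.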
In this embodiment, the sentence model library stores various sentences. The descriptive words of the sentence to be generated, which match the formatted data information, are matched against the sentences in the sentence model library to obtain the matched sentence.
Specifically, for example, suppose sentence 1 is stored in the sentence model library: "In the morning, Lyn phoned me". The descriptive words and sentence elements of the sentence to be generated in the example above are considered to match sentence 1, and the matched sentence is "In the morning, John phoned me".
Sentence 2, "Early in the morning, Lily and I had a call", may also be stored in the sentence model library, and the descriptive words and sentence elements of the sentence to be generated in the example above may match sentence 2 as well. However, if the probability that sentences built on sentence 1 appear in the generated diary text is 54%, and the probability that sentences built on sentence 2 appear in the generated diary text is 30%, then sentence 1, which has the highest probability of appearing in the generated diary text, is selected for matching, and the matched sentence is obtained.
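The matching in steps S307 and S308 can be sketched by storing each model sentence as a template whose slots are named after sentence elements, together with its probability of appearing in the diary text. The template strings, slot names, and the `SENTENCE_MODELS` list are hypothetical; only the 54% and 30% figures come from the example above.

```python
from string import Formatter

# Hypothetical sentence model library: (template, probability in diary text).
SENTENCE_MODELS = [
    ("{time_adverbial}, {subject} phoned {object}", 0.54),
    ("{time_adverbial}, {subject} and {object} had a call", 0.30),
]

def slots(template):
    """Names of the sentence elements a template needs."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}

def best_match(words, models=SENTENCE_MODELS):
    """Fill the most probable template whose slots are all available."""
    usable = [(t, p) for t, p in models if slots(t) <= set(words)]
    if not usable:
        return None
    template, _ = max(usable, key=lambda tp: tp[1])
    return template.format(**words) + "."

words = {"time_adverbial": "In the morning", "subject": "John", "object": "me"}
print(best_match(words))
```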
The sentence generation method provided by this embodiment of the present invention can generate sentences automatically from the various data information of a terminal, describing the activities or events that occur on the terminal in complete sentences, so that the user can have the terminal record these activities or events automatically.
Fig. 4 is a schematic structural diagram of one embodiment of a sentence generating device of the present invention. As shown in Fig. 4, the device 1000 includes:
A collection unit 11, configured to collect at least one piece of data information of a terminal, where the data information includes at least one of running information of the terminal, user operation information of the terminal, and information received by the terminal from an external interface.
The terminal in the present invention refers to any device through which an end user connects to a network and uses network applications, such as a notebook computer, tablet computer, or mobile phone. Various data information can be collected from a terminal, including: running information of the terminal itself, such as network connection information and system process information; user operation information of the terminal, such as sensor information and microblog posts; and information received by the terminal from an external interface, such as call information, short messages, and GPS information. This data information includes text data, such as microblog posts and short messages, from which text information can be extracted directly. It also includes non-text data, such as network connection information, system process information, and sensor information, which is collected through interfaces and the like. The collection unit 11 of the present invention can collect and organize all of this data information of the terminal in a unified manner.
A determining unit 12, configured to determine the sentence element of each of the at least one data message in the sentence to be formed.
For each collected data message, the determining unit 12 determines a corresponding sentence element. Sentence element types include subject, predicate, object, attributive, complement, adverbial, predicative, and so on. For example, time information collected from the terminal can be determined to be a time adverbial, and information collected from GPS can be determined to be a place adverbial.
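The determination step described above can be pictured as a lookup from the origin of a data message to a sentence element type. The following is only a minimal sketch: the source names ("clock", "gps") and the table `ELEMENT_BY_SOURCE` are illustrative assumptions, and only the two rows shown follow the examples given in the text.

```python
from enum import Enum


class SentenceElement(Enum):
    SUBJECT = "subject"
    PREDICATE = "predicate"
    OBJECT = "object"
    TIME_ADVERBIAL = "time adverbial"
    PLACE_ADVERBIAL = "place adverbial"


# Hypothetical source names; only the time and GPS rows follow the text.
ELEMENT_BY_SOURCE = {
    "clock": SentenceElement.TIME_ADVERBIAL,
    "gps": SentenceElement.PLACE_ADVERBIAL,
}


def determine_element(source):
    """Return the sentence element for a source, or None if it is unknown."""
    return ELEMENT_BY_SOURCE.get(source)
```

A real device would presumably also weigh usage habits or earlier assignments, which this sketch leaves out.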
A composing unit 13, configured to compose the at least one data message into a sentence according to the determined sentence element of the at least one data message in the sentence to be formed.
After the sentence element of each collected data message has been identified, the composing unit 13 can, according to the sentence elements corresponding to these data messages, follow a certain sentence structure or match against a language model to obtain a sentence composed of these data messages, so that the content of the data messages is completely described by one or more sentences. The accumulated sentences form the automatic diary text.
The sentence generating device provided by the embodiments of the present invention can automatically generate sentences from the various data messages of a terminal, so that the activities or events occurring on the terminal are fully described in sentences and the user can conveniently have these activities or events recorded automatically by the terminal.
Fig. 5 is a schematic structural diagram of another embodiment that further refines the sentence generating device of the present invention shown in Fig. 4. As shown in Fig. 5, the device 2000 includes:
A collector unit 21, configured to collect at least one data message of a terminal, wherein the data message includes at least one of running information of the terminal, user operation information of the terminal, and information received by the terminal from an external interface.
In this embodiment, the collector unit 21 includes an acquisition subunit 211, a detection subunit 212, and a formatting subunit 213.
The acquisition subunit 211, configured to acquire at least one data message of the terminal.
The terminal in the present invention refers to any device through which an end user connects to a network in order to use network applications, such as a notebook computer, a tablet computer, or a mobile phone. The acquisition subunit 211 can acquire various data messages from a terminal, including: running information of the terminal itself, such as network connection information and system process information; user operation information of the terminal, such as sensor information and microblog posts; and information the terminal receives from external interfaces, such as call information, short messages, and GPS information. These data messages include text data, such as microblog posts and short messages, from which text can be extracted directly, and non-text data, such as network connection information, system process information, and sensor information, which are collected through interfaces and the like.
The detection subunit 212, configured to detect the source of the at least one data message.
The detection subunit 212 detects the source of each collected data message as follows: if the message is GPS information, the source is the GPS in the terminal; if it is sensor information, the source is a particular sensor in the terminal; if it is call information or application (APP) information such as a microblog post, the source can be identified from the software program identifier.
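The source detection rules above can be sketched as a small classifier. This is an illustration under stated assumptions, not the patented implementation: the raw message is assumed to be a dict, and the keys `interface`, `sensor_name`, and `app_id` are hypothetical names introduced here.

```python
def detect_source(raw):
    """Classify a raw data message by where it came from.

    `raw` is a dict with hypothetical keys: "interface" names the hardware
    or external interface, "app_id" names the reporting application.
    """
    interface = raw.get("interface")
    if interface == "gps":
        return "gps"
    if interface == "sensor":
        # Keep the concrete sensor name when it is available.
        return "sensor:" + raw.get("sensor_name", "unknown")
    if "app_id" in raw:
        # Calls, microblog posts, etc. are told apart by program identifier.
        return "app:" + raw["app_id"]
    return "unknown"
```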
The formatting subunit 213, configured to format the at least one data message according to the source of the at least one data message and in a format corresponding to the source, to obtain at least one formatted data message.
For data messages acquired from different sources, the formatting subunit 213 needs to organize them in different formats to facilitate subsequent use.
The collected data messages can be formatted in a variety of forms, for example as tuples; the present invention includes but is not limited to this example.
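As an illustration of source-specific formatting into tuples, the sketch below registers one formatter per source. The field names (`source`, `kind`, `value`), the raw-message keys, and the two formatters are assumptions chosen for the example; the text only states that a tuple representation is one possible format.

```python
from typing import NamedTuple


class FormattedMessage(NamedTuple):
    source: str
    kind: str
    value: str


def format_call(raw):
    # A call record becomes (source, kind, value) with the other party as value.
    return FormattedMessage("call", raw["direction"], raw["party"])


def format_gps(raw):
    # A GPS fix becomes a "lat,lon" string with 4 decimal places.
    return FormattedMessage("gps", "position", f"{raw['lat']:.4f},{raw['lon']:.4f}")


# One formatter per detected source (hypothetical registry).
FORMATTERS = {"call": format_call, "gps": format_gps}


def format_message(source, raw):
    """Format a raw message using the formatter registered for its source."""
    return FORMATTERS[source](raw)
```

Keeping the formatters in a registry keyed by source mirrors the idea that the format is chosen according to the detected source.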
A determining unit 22, configured to determine the sentence element of each of the at least one data message in the sentence to be formed.
In this embodiment, the determining unit 22 includes a lookup subunit 221 and a determination subunit 222.
The lookup subunit 221, configured to search a database, for each formatted data message, for at least one description word matching the formatted data message.
To make the generated sentences easy for the user to read, the collected data messages need to be described in commonly used language or in the language the user is accustomed to. The database stores one or more description words corresponding to each formatted data message. Therefore, for each formatted data message, the lookup subunit 221 can search the database for at least one description word matching that formatted data message.
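The lookup of description words can be sketched with an in-memory dictionary standing in for the database. The keys and the stored words below are invented for illustration; the text does not specify the database schema.

```python
# Hypothetical stand-in for the description-word database:
# formatted message (source, kind) -> candidate description words.
DESCRIPTION_DB = {
    ("call", "incoming"): ["phoned", "called", "rang"],
    ("clock", "06:30"): ["early morning", "morning"],
}


def find_description_words(formatted):
    """Return all description words stored for a formatted message."""
    # Copy the list so callers cannot modify the stored entries.
    return list(DESCRIPTION_DB.get(formatted, []))
```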
The determination subunit 222, configured to determine, according to the at least one description word matching the formatted data message, the sentence element of each formatted data message in the sentence to be formed.
After each collected data message has been formatted and matched with description words, the determination subunit 222 can assign each description word a corresponding sentence element, either according to the probabilities of the sentence elements previously determined for that description word or according to usage habits. Sentence element types include subject, predicate, object, attributive, complement, adverbial, predicative, and so on. For example, time information collected from the terminal can be determined to be a time adverbial, and information collected from GPS can be determined to be a place adverbial.
A selecting unit 23, configured to select, for each formatted data message, one description word from the at least one description word matching the formatted data message, according to the probability with which the at least one description word matching the formatted data message is used in the database.
Before a sentence is generated, generally only one description word is selected for each sentence element. The selecting unit 23 therefore needs to select one description word from the multiple description words matching a formatted data message. The selection can be based on the probability with which these description words are used in the database, that is, the probability of being selected to generate a sentence, or it can be based on the user's language habits.
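Selecting one description word by usage probability amounts to taking the maximum over the candidates. The sketch below assumes the probabilities are already available as a mapping from word to probability; the function name is hypothetical, and selection by the user's language habits is not modeled.

```python
def select_description_word(candidates):
    """Pick the candidate with the highest usage probability.

    `candidates` maps each description word to the probability with which
    it has been used in the database.
    """
    if not candidates:
        raise ValueError("no candidate description words")
    return max(candidates, key=candidates.get)
```

For example, `select_description_word({"phoned": 0.6, "called": 0.3, "rang": 0.1})` picks "phoned"; these figures are made up for the illustration.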
A composing unit 24, configured to compose the at least one data message into a sentence according to the determined sentence element of the at least one data message in the sentence to be formed.
In this embodiment, the composing unit 24 includes a selection subunit 241 and a composition subunit 242.
The selection subunit 241, configured to select from a syntactic structure library, according to the types of the sentence elements of the at least one data message, a sentence structure that contains the types of the sentence elements of the at least one data message.
The syntactic structure library stores various sentence structures. Each sentence structure contains one or more sentence elements, and each sentence element has a corresponding position in the sentence structure. The selection subunit 241 selects from the syntactic structure library a sentence structure that contains the sentence elements corresponding to all of the collected data messages.
The composition subunit 242, configured to compose the selected description word matching the at least one formatted data message into a sentence according to the position of the sentence element of the at least one data message in the sentence structure.
After the sentence structure has been selected, the composition subunit 242 fills each selected description word into the position that the sentence element of the corresponding formatted data message occupies in the sentence structure. Once the positions of all sentence elements have been filled one by one, a sentence is formed.
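Structure selection and slot filling can be sketched together as follows. The contents of `STRUCTURES` are invented for illustration, and the sketch requires a structure whose element types match the available ones exactly so that every position can be filled; the text itself only requires that the structure contain the collected element types.

```python
# Hypothetical syntactic structure library: each structure is an ordered
# list of sentence-element types.
STRUCTURES = [
    ["time adverbial", "subject", "predicate", "object"],
    ["subject", "predicate", "object", "place adverbial"],
    ["subject", "predicate"],
]


def select_structure(element_types):
    """Return the first structure made up of exactly the given element types."""
    wanted = set(element_types)
    for structure in STRUCTURES:
        if set(structure) == wanted:
            return structure
    return None


def compose(words_by_element):
    """Fill each position of the chosen structure with its description word."""
    structure = select_structure(words_by_element)
    if structure is None:
        raise LookupError("no matching sentence structure")
    return " ".join(words_by_element[element] for element in structure)
```

The order of the words in the result comes from the structure, not from the order in which the data messages were collected.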
The sentence generating device provided by the embodiments of the present invention can automatically generate sentences from the various data messages of a terminal, so that the activities or events occurring on the terminal are fully described in sentences and the user can conveniently have these activities or events recorded automatically by the terminal.
Fig. 6 is a schematic structural diagram of yet another embodiment that further refines the sentence generating device of the present invention shown in Fig. 4. As shown in Fig. 6, the device 3000 includes:
A collector unit 31, configured to collect at least one data message of a terminal, wherein the data message includes at least one of running information of the terminal, user operation information of the terminal, and information received by the terminal from an external interface.
In this embodiment, the collector unit 31 includes an acquisition subunit 311, a detection subunit 312, and a formatting subunit 313.
The acquisition subunit 311, configured to acquire at least one data message of the terminal.
The detection subunit 312, configured to detect the source of the at least one data message.
The formatting subunit 313, configured to format the at least one data message according to the source of the at least one data message and in a format corresponding to the source, to obtain at least one formatted data message.
A determining unit 32, configured to determine the sentence element of each of the at least one data message in the sentence to be formed.
In this embodiment, the determining unit 32 includes a lookup subunit 321 and a determination subunit 322.
The lookup subunit 321, configured to search a database, for each formatted data message, for at least one description word matching the formatted data message.
The determination subunit 322, configured to determine, according to the at least one description word matching the formatted data message, the sentence element of each formatted data message in the sentence to be formed.
A selecting unit 33, configured to select, for each formatted data message, one description word from the at least one description word matching the formatted data message, according to the probability with which the at least one description word matching the formatted data message is used in the database.
A composing unit 34, configured to compose the at least one data message into a sentence according to the determined sentence element of the at least one data message in the sentence to be formed.
In this embodiment, the composing unit 34 includes a matching subunit 341 and an obtaining subunit 342.
The matching subunit 341, configured to match the selected description word matching the at least one formatted data message against the sentences in a sentence model library, according to the determined sentence element of the at least one data message in the sentence to be formed.
The obtaining subunit 342, configured to obtain the matched sentence.
This embodiment differs from the preceding embodiment in that the composing unit 34 differs from the composing unit 24 of the preceding embodiment.
A language model is defined as follows: "A language model is usually constructed as a probability distribution P(s) over character strings s, where P(s) attempts to reflect the probability that the string s occurs as a sentence."
In an n-gram language model, for a sentence s = W1, W2, ..., Wn, the probability can be expressed as:
P(s)=P(W1)P(W2|W1)P(W3|W1W2)…P(Wn|W1…Wn-1)
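The formula above is the chain rule over the full history of each word; an n-gram model approximates each factor by conditioning only on the preceding n-1 words. As a hedged illustration, the sketch below uses a bigram model (n = 2) with unsmoothed maximum-likelihood estimates. The markers `<s>` and `</s>`, the function names, and the toy corpus in the usage note are assumptions, not part of the patent.

```python
from collections import Counter


def train_bigram(sentences):
    """Count contexts and bigrams over tokenized sentences with boundary markers."""
    contexts, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>", *tokens, "</s>"]
        contexts.update(padded[:-1])            # every token that has a successor
        bigrams.update(zip(padded, padded[1:]))
    return contexts, bigrams


def sentence_probability(tokens, contexts, bigrams):
    """P(s) as a product of P(W_i | W_{i-1}), estimated as count ratios."""
    prob = 1.0
    padded = ["<s>", *tokens, "</s>"]
    for prev, word in zip(padded, padded[1:]):
        if contexts[prev] == 0:
            return 0.0                          # unseen context: no estimate
        prob *= bigrams[(prev, word)] / contexts[prev]
    return prob
```

Trained on the two toy sentences "John phoned me" and "John called me", this model assigns "John phoned me" a probability of 0.5, since the only uncertain step is the word following "John". A practical model would add smoothing so that unseen word pairs do not force the probability to zero.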
In this embodiment, the sentence model library stores various sentences. The matching subunit 341 matches the description words of the sentence to be generated, that is, the description words matching the formatted data messages, against the sentences in the sentence model library, and the obtaining subunit 342 obtains the matched sentence.
Specifically, for example, if sentence 1, "Morning, Lyn phoned me", is stored in the sentence model library, the description words and sentence elements of the sentence to be generated in the example above are considered to match sentence 1, and the matched sentence is "Early morning, John phoned me".
The sentence model library may also store sentence 2, "Early morning, I talked with Lily". In that case the description words and sentence elements of the sentence to be generated in the example above also match sentence 2. However, sentence 1, composed of the description words of sentence 1, occurs in the generated diary text with a probability of 54%, whereas sentence 2, composed of the description words of sentence 2, occurs in the generated diary text with a probability of 30%. Sentence 1, which has the highest probability of occurring in the generated diary text, is therefore selected for matching, and the matched sentence is obtained.
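The choice between matching stored sentences can be sketched as picking the stored sentence with the highest probability and substituting the selected description words into it. Representing stored sentences as templates with named slots (`time`, `person`) is an assumption made for this illustration; the probabilities 0.54 and 0.30 are the ones given in the example above.

```python
# Hypothetical sentence model library: stored sentences rewritten as
# templates, each with its probability of occurring in the diary text.
SENTENCE_MODELS = [
    {"template": "{time}, {person} phoned me", "probability": 0.54},
    {"template": "{time}, I talked with {person}", "probability": 0.30},
]


def best_match(models, words):
    """Fill the most probable stored sentence with the selected description words."""
    best = max(models, key=lambda model: model["probability"])
    return best["template"].format(**words)
```

With the description words `{"time": "Early morning", "person": "John"}`, the sketch yields "Early morning, John phoned me", in line with the example in the text.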
The sentence generating device provided by the embodiments of the present invention can automatically generate sentences from the various data messages of a terminal, so that the activities or events occurring on the terminal are fully described in sentences and the user can conveniently have these activities or events recorded automatically by the terminal.
What is disclosed above is merely a preferred embodiment of the present invention and certainly cannot be used to limit the scope of rights of the present invention; equivalent changes made according to the claims of the present invention therefore still fall within the scope covered by the present invention.

Claims (6)

  1. A sentence generation method, characterized by comprising:
    collecting at least one data message of a terminal, wherein the data message includes at least one of running information of the terminal, user operation information of the terminal, and information received by the terminal from an external interface;
    determining the sentence element of each of the at least one data message in the sentence to be formed;
    wherein, specifically, the collecting at least one data message of a terminal comprises:
    acquiring at least one data message of the terminal;
    detecting the source of the at least one data message;
    formatting the at least one data message according to the source of the at least one data message and in a format corresponding to the source, to obtain at least one formatted data message;
    the determining the sentence element of each of the at least one data message in the sentence to be formed comprises:
    for each formatted data message, searching a database for at least one description word matching the formatted data message;
    determining, according to the at least one description word matching the formatted data message, the sentence element of each formatted data message in the sentence to be formed;
    the method further comprises:
    for each formatted data message, selecting one description word from the at least one description word matching the formatted data message, according to the probability with which the at least one description word matching the formatted data message is used in the database;
    composing the at least one data message into a sentence according to the determined sentence element of the at least one data message in the sentence to be formed.
  2. The method according to claim 1, characterized in that the composing the at least one data message into a sentence according to the determined sentence element of the at least one data message in the sentence to be formed comprises:
    selecting from a syntactic structure library, according to the type of the determined sentence element of the at least one data message in the sentence to be formed, a sentence structure that contains the type of the sentence element of the at least one data message;
    composing the selected description word matching the at least one formatted data message into a sentence according to the position of the sentence element of the at least one data message in the sentence structure.
  3. The method according to claim 1, characterized in that the composing the at least one data message into a sentence according to the determined sentence element of the at least one data message in the sentence to be formed comprises:
    matching the selected description word matching the at least one formatted data message against the sentences in a sentence model library, according to the determined sentence element of the at least one data message in the sentence to be formed;
    obtaining the matched sentence.
  4. A sentence generating device, characterized by comprising:
    a collector unit, configured to collect at least one data message of a terminal, wherein the data message includes at least one of running information of the terminal, user operation information of the terminal, and information received by the terminal from an external interface;
    a determining unit, configured to determine the sentence element of each of the at least one data message in the sentence to be formed;
    wherein, specifically, the collector unit comprises:
    an acquisition subunit, configured to acquire at least one data message of the terminal;
    a detection subunit, configured to detect the source of the at least one data message;
    a formatting subunit, configured to format the at least one data message according to the source of the at least one data message and in a format corresponding to the source, to obtain at least one formatted data message;
    the determining unit comprises:
    a lookup subunit, configured to search a database, for each formatted data message, for at least one description word matching the formatted data message;
    a determination subunit, configured to determine, according to the at least one description word matching the formatted data message, the sentence element of each formatted data message in the sentence to be formed;
    the device further comprises:
    a selecting unit, configured to select, for each formatted data message, one description word from the at least one description word matching the formatted data message, according to the probability with which the at least one description word matching the formatted data message is used in the database;
    a composing unit, configured to compose the at least one data message into a sentence according to the determined sentence element of the at least one data message in the sentence to be formed.
  5. The device according to claim 4, characterized in that the composing unit comprises:
    a selection subunit, configured to select from a syntactic structure library, according to the type of the sentence element of the at least one data message, a sentence structure that contains the type of the sentence element of the at least one data message;
    a composition subunit, configured to compose the selected description word matching the at least one formatted data message into a sentence according to the position of the sentence element of the at least one data message in the sentence structure.
  6. The device according to claim 4, characterized in that the composing unit comprises:
    a matching subunit, configured to match the selected description word matching the at least one formatted data message against the sentences in a sentence model library, according to the determined sentence element of the at least one data message in the sentence to be formed;
    an obtaining subunit, configured to obtain the matched sentence.
CN201310440040.3A 2013-09-24 2013-09-24 A kind of sentence generation method and device Active CN104462145B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310440040.3A CN104462145B (en) 2013-09-24 2013-09-24 A kind of sentence generation method and device

Publications (2)

Publication Number Publication Date
CN104462145A CN104462145A (en) 2015-03-25
CN104462145B true CN104462145B (en) 2018-04-10

Family

ID=52908200

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310440040.3A Active CN104462145B (en) 2013-09-24 2013-09-24 A kind of sentence generation method and device

Country Status (1)

Country Link
CN (1) CN104462145B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107484038A (en) * 2017-08-22 2017-12-15 北京奇艺世纪科技有限公司 A kind of generation method of video subject, device and electronic equipment
CN110399499B (en) * 2019-07-18 2022-02-18 珠海格力电器股份有限公司 Corpus generation method and device, electronic equipment and readable storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103118182A (en) * 2013-01-17 2013-05-22 广东欧珀移动通信有限公司 Method to record application diaries of movable terminal and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007172490A (en) * 2005-12-26 2007-07-05 Sony Computer Entertainment Inc Information processing method, information processing system, and server

Also Published As

Publication number Publication date
CN104462145A (en) 2015-03-25

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant