CN104462145B - Sentence generation method and device - Google Patents
- Publication number
- CN104462145B (application CN201310440040.3A)
- Authority
- CN
- China
- Prior art keywords
- data message
- sentence
- formatting
- words
- terminal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The invention discloses a sentence generation method and device. The method includes: collecting at least one data message of a terminal, where the data message includes at least one of running information of the terminal, user operation information of the terminal, and information the terminal receives from an external interface; determining the sentence element of each of the at least one data message in a sentence to be formed; and forming the at least one data message into a sentence according to the determined sentence element of the at least one data message in the sentence to be formed. A corresponding device is also disclosed. With the technical solution of the sentence generation method and device of the present invention, sentences can be generated automatically from the various data messages of a terminal, and the activities or events occurring on the terminal can be described completely in sentences, which makes it convenient for the user to have these activities or events recorded automatically by the terminal.
Description
Technical field
The present invention relates to the field of language technology, and in particular to a sentence generation method and device.
Background
An automatic diary on an intelligent terminal can save people the cost of recording events. Because it records the context in which an event occurs from multiple dimensions and perspectives, it can reproduce events objectively. Meanwhile, the spread of intelligent terminals provides information sources and a data basis across many dimensions for generating automatic diaries. However, in one prior-art method for generating an automatic diary, the source data is mainly text data, such as blog information, social network information, short messages, and contact information. Features are extracted from this text to generate the diary, but when the source data lacks a textual description, no diary can be generated. Another prior-art method analyzes mobile phone usage and sensor data, identifies user activities or events from the correspondence between phone operation events (such as powering on and off, or sending and receiving mail) and user activities, and finally organizes the events of a day in chronological order to generate the diary. A diary generated this way has very simple content in the form of a sequence of "time: event" entries. It carries little information and does not use complete sentences to describe user activities or events, so its readability is poor.
In summary, how to generate sentences automatically from the various data messages of a terminal, and to describe completely in sentences the activities or events occurring on the terminal, has become a problem that the industry urgently needs to solve.
Summary of the invention
In view of this, the present invention provides a sentence generation method and device, so as to generate sentences automatically from the various data messages of a terminal and describe completely in sentences the activities or events occurring on the terminal.
In a first aspect, a sentence generation method is provided, including:
collecting at least one data message of a terminal, where the data message includes at least one of running information of the terminal, user operation information of the terminal, and information the terminal receives from an external interface;
determining the sentence element of each of the at least one data message in a sentence to be formed;
forming the at least one data message into a sentence according to the determined sentence element of the at least one data message in the sentence to be formed.
In a first possible implementation, collecting at least one data message of the terminal includes:
acquiring at least one data message of the terminal;
detecting the source of the at least one data message;
formatting the at least one data message according to the format corresponding to its source, to obtain at least one formatted data message.
Determining the sentence element of each of the at least one data message in the sentence to be formed includes:
for each formatted data message, looking up in a database at least one descriptive word that matches the formatted data message;
determining the sentence element of each formatted data message in the sentence to be formed, according to the at least one descriptive word that matches the formatted data message.
With reference to the first possible implementation of the first aspect, in a second possible implementation, after determining the sentence element of each formatted data message in the sentence to be formed according to the at least one descriptive word that matches the formatted data message, and before forming the at least one data message into a sentence according to the determined sentence element of the at least one data message in the sentence to be formed, the method further includes:
for each formatted data message, selecting one descriptive word from the at least one descriptive word that matches the formatted data message, according to the probability with which each of those descriptive words is used in the database.
With reference to the second possible implementation of the first aspect, in a third possible implementation, forming the at least one data message into a sentence according to the determined sentence element of the at least one data message in the sentence to be formed includes:
selecting, from a syntactic structure library, a sentence structure that contains the types of the sentence elements of the at least one data message, according to the types of the determined sentence elements of the at least one data message in the sentence to be formed;
forming the selected descriptive words that match the at least one formatted data message into a sentence, according to the positions of the sentence elements of the at least one data message in the sentence structure.
With reference to the second possible implementation of the first aspect, in a fourth possible implementation, forming the at least one data message into a sentence according to the determined sentence element of the at least one data message in the sentence to be formed includes:
matching the selected descriptive words that match the at least one formatted data message against the sentences in a sentence model library, according to the determined sentence element of the at least one data message in the sentence to be formed;
obtaining the matched sentence.
In a second aspect, a sentence generating device is provided, including:
a collecting unit, configured to collect at least one data message of a terminal, where the data message includes at least one of running information of the terminal, user operation information of the terminal, and information the terminal receives from an external interface;
a determining unit, configured to determine the sentence element of each of the at least one data message in a sentence to be formed;
a composing unit, configured to form the at least one data message into a sentence according to the determined sentence element of the at least one data message in the sentence to be formed.
In a first possible implementation, the collecting unit includes:
an acquisition subunit, configured to acquire at least one data message of the terminal;
a detection subunit, configured to detect the source of the at least one data message;
a formatting subunit, configured to format the at least one data message according to the format corresponding to its source, to obtain at least one formatted data message.
The determining unit includes:
a lookup subunit, configured to look up in a database, for each formatted data message, at least one descriptive word that matches the formatted data message;
a determination subunit, configured to determine the sentence element of each formatted data message in the sentence to be formed, according to the at least one descriptive word that matches the formatted data message.
With reference to the first possible implementation of the second aspect, in a second possible implementation, the device further includes:
a selecting unit, configured to select, for each formatted data message, one descriptive word from the at least one descriptive word that matches the formatted data message, according to the probability with which each of those descriptive words is used in the database.
With reference to the second possible implementation of the second aspect, in a third possible implementation, the composing unit includes:
a selection subunit, configured to select, from a syntactic structure library, a sentence structure that contains the types of the sentence elements of the at least one data message, according to the types of the sentence elements of the at least one data message;
a composition subunit, configured to form the selected descriptive words that match the at least one formatted data message into a sentence, according to the positions of the sentence elements of the at least one data message in the sentence structure.
With reference to the second possible implementation of the second aspect, in a fourth possible implementation, the composing unit includes:
a matching subunit, configured to match the selected descriptive words that match the at least one formatted data message against the sentences in a sentence model library, according to the determined sentence element of the at least one data message in the sentence to be formed;
an obtaining subunit, configured to obtain the matched sentence.
With the technical solution of the sentence generation method and device of the present invention, sentences can be generated automatically from the various data messages of a terminal, and the activities or events occurring on the terminal can be described completely in sentences, which makes it convenient for the user to have these activities or events recorded automatically by the terminal.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of one embodiment of a sentence generation method of the present invention;
Fig. 2 is a flowchart of another embodiment that further refines the sentence generation method of the present invention shown in Fig. 1;
Fig. 3 is a flowchart of yet another embodiment that further refines the sentence generation method of the present invention shown in Fig. 1;
Fig. 4 is a schematic structural diagram of one embodiment of a sentence generating device of the present invention;
Fig. 5 is a schematic structural diagram of another embodiment that further refines the sentence generating device of the present invention shown in Fig. 4;
Fig. 6 is a schematic structural diagram of yet another embodiment that further refines the sentence generating device of the present invention shown in Fig. 4.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Evidently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Fig. 1 is a flowchart of one embodiment of a sentence generation method of the present invention. As shown in Fig. 1, the method includes the following steps:
Step S101: collect at least one data message of a terminal, where the data message includes at least one of running information of the terminal, user operation information of the terminal, and information the terminal receives from an external interface.
In the present invention, a terminal is any device through which an end user connects to a network to use network applications, such as a notebook computer, a tablet computer, or a mobile phone. Various data messages can be collected from a terminal, including: running information of the terminal itself, such as network connection information and system process information; user operation information of the terminal, such as sensor information and microblogs; and information the terminal receives from an external interface, such as call information, short messages, and GPS information. These data messages include text data, such as microblogs and short messages, from which text can be extracted directly, and non-text data, such as network connection information, system process information, and sensor information, which is collected through interfaces and the like.
The present invention can collect these data messages of the terminal in a unified way and organize them.
Step S102: determine the sentence element of each of the at least one data message in a sentence to be formed.
Each collected data message is assigned a corresponding sentence element. The types of sentence element include subject, predicate, object, attributive, complement, adverbial, predicative, and so on. For example, time information collected from the terminal can be determined to be a time adverbial, and information collected from GPS can be identified as a place adverbial.
Step S103: form the at least one data message into a sentence according to the determined sentence element of the at least one data message in the sentence to be formed.
After the sentence element of each collected data message has been identified, the sentence formed from these data messages can be obtained from the corresponding sentence elements, either by following a certain sentence structure or by matching against a language model. The content of these data messages is thus described completely by one or more sentences. The accumulated sentences form the automatic diary text.
With the sentence generation method provided by this embodiment of the present invention, sentences can be generated automatically from the various data messages of a terminal, and the activities or events occurring on the terminal can be described completely in sentences, which makes it convenient for the user to have these activities or events recorded automatically by the terminal.
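Steps S101 to S103 can be sketched as a small pipeline. This is only a minimal illustration under stated assumptions, not the implementation described in the patent: the function names, the hard-coded readings, the source-to-element table, and the fixed English word order are all hypothetical.

```python
# Hypothetical sketch of steps S101-S103; every name and sample value is illustrative.

def collect():
    # S101: gather data messages, each tagged with the source it came from.
    return [("clock", "6:50AM"), ("gps", "Wuhe Avenue"), ("activity", "take a walk")]

# S102: assumed mapping from the source of a data message to its sentence element.
ELEMENT_BY_SOURCE = {
    "clock": "time adverbial",
    "gps": "place adverbial",
    "activity": "predicate",
}

def determine_elements(messages):
    return {ELEMENT_BY_SOURCE[source]: text for source, text in messages}

def compose(elements, subject="I"):
    # S103: put each element into a fixed [time][subject][predicate][place] order.
    return "At {t}, {s} {p} on {pl}.".format(
        t=elements["time adverbial"],
        s=subject,
        p=elements["predicate"],
        pl=elements["place adverbial"],
    )

sentence = compose(determine_elements(collect()))
print(sentence)  # At 6:50AM, I take a walk on Wuhe Avenue.
```

The later embodiments replace the fixed table and the fixed word order with database lookups, probabilities, and a library of sentence structures.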
Fig. 2 is a flowchart of another embodiment that further refines the sentence generation method of the present invention shown in Fig. 1. As shown in Fig. 2, the method includes the following steps:
Step S201: acquire at least one data message of a terminal.
In the present invention, a terminal is any device through which an end user connects to a network to use network applications, such as a notebook computer, a tablet computer, or a mobile phone. Various data messages can be collected from a terminal, including: running information of the terminal itself, such as network connection information and system process information; user operation information of the terminal, such as sensor information and microblogs; and information the terminal receives from an external interface, such as call information, short messages, and GPS information. These data messages include text data, such as microblogs and short messages, from which text can be extracted directly, and non-text data, such as network connection information, system process information, and sensor information, which is collected through interfaces and the like.
Step S202: detect the source of the at least one data message.
The source of each collected data message is detected as follows: if the information is GPS information, the source is the GPS in the terminal; if it is sensor information, the source is a particular sensor in the terminal; if it is call information or application (APP) information such as a microblog, the source can be identified from the software program.
Step S203: format the at least one data message according to the format corresponding to its source, to obtain at least one formatted data message.
Data messages acquired from different sources need to be organized in different formats for later use.
For example:
1. Microblog information: for a microblog posted by the user at a given moment, each microblog can be represented after formatting as the triple <time, microblog content, ID>.
2. GPS information: for the position information at a given moment, each piece of GPS information can be represented after formatting as the four-tuple <time, longitude, latitude, height>.
3. Acceleration information: for the acceleration information at a given moment, each piece of acceleration information can be represented after formatting as the four-tuple <time, x-axis acceleration, y-axis acceleration, z-axis acceleration>.
4. Call information: usage of communication services such as calls and short messages, specifically including:
Calls: call start time, call end time, call duration, caller, called party, number of missed calls.
Short messages: message receiving time, received message length, message sending time, sent message length.
After formatting, each piece of call information can be represented as the five-tuple:
<time, local phone status, remote phone status, local phone setting status, remote phone ID>
For example, the local phone receiving an incoming call can be represented as:
<time, incoming call received, called, phone ringing, remote phone ID>
The collected data messages can be formatted in many ways. The examples above list only tuple-based representations, and the present invention includes but is not limited to them.
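The source-dependent formatting of steps S202 and S203 can be sketched as a table of formatters keyed by source. This is a minimal sketch, not the patented implementation: the source names, the raw dictionary keys, and the `format_message` helper are hypothetical, and the height value in the usage example is made up because the text gives none.

```python
from collections import namedtuple

# Tuple layouts follow the examples in the text; the field names are illustrative.
Microblog = namedtuple("Microblog", "time content user_id")
Gps = namedtuple("Gps", "time longitude latitude height")
Acceleration = namedtuple("Acceleration", "time x y z")

# Hypothetical registry: one formatter per detected source.
FORMATTERS = {
    "microblog": lambda raw: Microblog(raw["time"], raw["content"], raw["user_id"]),
    "gps": lambda raw: Gps(raw["time"], raw["lon"], raw["lat"], raw["height"]),
    "accelerometer": lambda raw: Acceleration(raw["time"], raw["x"], raw["y"], raw["z"]),
}

def format_message(source, raw):
    """Format a raw reading with the layout registered for its source."""
    if source not in FORMATTERS:
        raise ValueError("no format registered for source %r" % source)
    return FORMATTERS[source](raw)

# Coordinates as given in the text; the height is an invented placeholder.
record = format_message(
    "gps", {"time": "6:50AM", "lon": 22.04, "lat": 114.3, "height": 10.0}
)
print(record)
```

Keeping one formatter per source means a new source can be supported by adding a single registry entry without touching the later steps.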
Step S204: for each formatted data message, look up in a database at least one descriptive word that matches the formatted data message.
To make the generated sentences easy for the user to read, the collected data messages need to be described in common language or in language the user is accustomed to. The database stores one or more descriptive words corresponding to each formatted data message, so for each formatted data message, at least one matching descriptive word can be looked up in the database.
For example:
1. The collected time information is 6:50AM, and the set of descriptive words found is:
{morning, early morning, 6:50 a.m. Beijing time, 6:50AM, early in the morning}.
2. The collected GPS information is {longitude=22.04, latitude=114.3}, and the set of descriptive words found is:
{Shenzhen Huawei base, Bantian in Longgang District, Wuhe Avenue}
3. For the collected call record information <time, local phone status, remote phone status, local phone setting status, remote phone ID>, the set of descriptive words for the call action is {call, make a phone call, answer the phone}, and the set of descriptive words for the call parties is {I, John (contact)}.
4. For the collected acceleration information <time, x-axis acceleration, y-axis acceleration, z-axis acceleration>, the set of descriptive words may be:
{walk, take a walk, jog}.
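The lookup in step S204 can be sketched with an in-memory dictionary standing in for the database. This is a sketch under assumptions: the key scheme (a kind plus a value) and the function name are hypothetical, and a real system would presumably match coordinates or sensor readings by range rather than by exact value.

```python
# Hypothetical in-memory stand-in for the database of descriptive words.
DESCRIPTIVE_WORDS = {
    ("time", "6:50AM"): ["morning", "early morning", "6:50AM", "early in the morning"],
    ("gps", (22.04, 114.3)): ["Shenzhen Huawei base", "Wuhe Avenue"],
    ("acceleration", "slow periodic motion"): ["walk", "take a walk", "jog"],
}

def find_descriptive_words(kind, value):
    """Return every descriptive word stored for one formatted data message."""
    return list(DESCRIPTIVE_WORDS.get((kind, value), []))

words = find_descriptive_words("gps", (22.04, 114.3))
print(words)
```

An empty list signals that no descriptive word is stored for the message, so the caller can skip it when building the sentence.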
Step S205, according at least one words of description matched with the data message after the formatting, it is determined that each
Sentence element of the data message in sentence to be formed after the formatting.
By each data message collected be formatted and the matching of words of description after, system is to these descriptors
Language can determine the probability of the sentence element of the words of description before or is defined as corresponding sentence according to use habit
Composition, the type of sentence element include subject, predicate, object, attribute, complement, the adverbial modifier, predicative etc., such as collection terminal
Temporal information, the temporal information can be defined as time adverbial, for the information collected from GPS, the information can known
Wei not point adverbial etc..
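One way to read the probability-based determination in step S205 is to count how often a descriptive word has served as each sentence element and take the most frequent one. The sketch below assumes exactly that; the counts and names are invented for illustration, and the text does not say how the probabilities are actually obtained.

```python
# Hypothetical counts of how often each descriptive word served as each sentence element.
ELEMENT_COUNTS = {
    "early morning": {"time adverbial": 9, "subject": 1},
    "Wuhe Avenue": {"place adverbial": 7, "object": 3},
    "take a walk": {"predicate": 10},
}

def most_likely_element(word):
    """Return the most frequent sentence element for a word and its relative frequency."""
    counts = ELEMENT_COUNTS[word]
    total = sum(counts.values())
    element = max(counts, key=counts.get)
    return element, counts[element] / total

element, probability = most_likely_element("Wuhe Avenue")
print(element, probability)
```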
Step S206: for each formatted data message, select one descriptive word from the at least one descriptive word that matches the formatted data message, according to the probability with which each of those descriptive words is used in the database.
Before a sentence is generated, generally only one descriptive word is selected for each sentence element, so one descriptive word has to be selected from the multiple descriptive words that match a formatted data message. The selection can be based on the probability with which these descriptive words are used in the database, that is, the probability that a word is selected for generating sentences, or it can be based on the user's language habits.
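The text does not say whether step S206 always takes the most probable descriptive word or draws one at random in proportion to its probability, so the sketch below shows both readings. The probabilities and function names are hypothetical.

```python
import random

# Hypothetical usage probabilities of the candidate descriptive words in the database.
CANDIDATES = {"walk": 0.2, "take a walk": 0.5, "jog": 0.3}

def pick_most_probable(candidates):
    """Deterministic reading: always take the word with the highest probability."""
    return max(candidates, key=candidates.get)

def pick_sampled(candidates, rng=random):
    """Randomized reading: draw one word with chance proportional to its probability."""
    words = list(candidates)
    weights = [candidates[word] for word in words]
    return rng.choices(words, weights=weights, k=1)[0]

print(pick_most_probable(CANDIDATES))
print(pick_sampled(CANDIDATES, random.Random(0)))
```

Sampling would make the generated diary less repetitive, while the deterministic choice is easier to test.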
Step S207: select, from a syntactic structure library, a sentence structure that contains the types of the sentence elements of the at least one data message, according to the types of the determined sentence elements of the at least one data message in the sentence to be formed.
The syntactic structure library stores various sentence structures. Each sentence structure contains one or more sentence elements, and each sentence element has a corresponding position in the structure. A sentence structure that contains the sentence elements corresponding to all of the collected data messages is selected from the library.
For example, the syntactic structures in the library include:
[time adverbial] [subject] [place adverbial] [predicate] [object];
[subject] [predicate] [object]; and so on.
Step S208: form the selected descriptive words that match the at least one formatted data message into a sentence, according to the positions of the sentence elements of the at least one data message in the sentence structure.
After the sentence structure has been selected, the selected descriptive word for each formatted data message is filled into the position of its sentence element in the structure. Filling the positions of all sentence elements one by one yields a sentence.
For example, following the examples above, the following sentences can be formed:
"In the early morning, I talked with John on the phone."
"In the morning, I was taking a walk on Wuhe Avenue, and John phoned me."
With the sentence generation method provided by this embodiment of the present invention, sentences can be generated automatically from the various data messages of a terminal, and the activities or events occurring on the terminal can be described completely in sentences, which makes it convenient for the user to have these activities or events recorded automatically by the terminal.
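Steps S207 and S208 of this embodiment, selecting a sentence structure and filling its positions, can be sketched as follows. The sketch makes several assumptions that are not in the text: a structure is chosen only when its slots match the available sentence elements exactly, the second structure is added to suit English word order, and the function names are hypothetical.

```python
# Hypothetical syntactic structure library: each structure is an ordered list of slots.
STRUCTURES = [
    ["time adverbial", "subject", "place adverbial", "predicate", "object"],
    ["time adverbial", "subject", "predicate", "place adverbial"],  # assumed English order
    ["subject", "predicate", "object"],
]

def select_structure(element_types):
    """S207: pick the first structure whose slots are exactly the available elements."""
    wanted = set(element_types)
    for structure in STRUCTURES:
        if set(structure) == wanted:
            return structure
    return None

def fill(structure, words_by_element):
    """S208: put each selected descriptive word into the position of its element."""
    return " ".join(words_by_element[slot] for slot in structure)

selected_words = {
    "time adverbial": "In the morning,",
    "subject": "I",
    "predicate": "took a walk",
    "place adverbial": "on Wuhe Avenue",
}
structure = select_structure(selected_words)
print(fill(structure, selected_words))  # In the morning, I took a walk on Wuhe Avenue
```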
Fig. 3 is a flowchart of yet another embodiment that further refines the sentence generation method of the present invention shown in Fig. 1. As shown in Fig. 3, the method includes the following steps:
Step S301: acquire at least one data message of a terminal.
Step S302: detect the source of the at least one data message.
Step S303: format the at least one data message according to the format corresponding to its source, to obtain at least one formatted data message.
Step S304: for each formatted data message, look up in a database at least one descriptive word that matches the formatted data message.
Step S305: determine the sentence element of each formatted data message in the sentence to be formed, according to the at least one descriptive word that matches the formatted data message.
Step S306: for each formatted data message, select one descriptive word from the at least one descriptive word that matches the formatted data message, according to the probability with which each of those descriptive words is used in the database.
Step S307: match the selected descriptive words that match the at least one formatted data message against the sentences in a sentence model library, according to the determined sentence element of the at least one data message in the sentence to be formed.
Step S308: obtain the matched sentence.
This embodiment differs from the previous embodiment in that steps S307 and S308 differ from steps S207 and S208 of the previous embodiment.
A language model is defined as follows: "A language model is generally constructed as a probability distribution P(s) over strings s, where P(s) attempts to reflect the probability that the string s occurs as a sentence."
In an n-gram language model, for a sentence s = W1, W2, …, Wn, the probability can be expressed as:
P(s) = P(W1) P(W2|W1) P(W3|W1 W2) … P(Wn|W1 … Wn-1)
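The formula above is the full chain rule. An n-gram model approximates it by conditioning each word only on the previous n-1 words. A minimal bigram (n = 2) sketch follows, with invented probabilities and a hypothetical `sentence_probability` helper.

```python
# Hypothetical bigram probabilities P(next | previous); "<s>" marks the sentence start.
BIGRAM = {
    ("<s>", "John"): 0.5,
    ("John", "phoned"): 0.4,
    ("phoned", "me"): 0.5,
}

def sentence_probability(words, bigram, unseen=0.0):
    """Multiply P(W1 | <s>) * P(W2 | W1) * ... under the bigram approximation."""
    probability = 1.0
    previous = "<s>"
    for word in words:
        probability *= bigram.get((previous, word), unseen)
        previous = word
    return probability

p = sentence_probability(["John", "phoned", "me"], BIGRAM)
print(p)
```

Here any word pair missing from the table gets probability zero; practical language models smooth such cases instead.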
In this embodiment, various sentences are stored in the sentence model library. The descriptive words matched with the formatted data messages for the sentence to be generated are matched against the sentences in the library, to obtain the matched sentence.
Specifically, for example, if sentence 1, "In the morning, Lyn phoned me", is stored in the sentence model library, the descriptive words and sentence elements of the sentence to be generated in the example above are considered to match sentence 1, and the matched sentence is "In the morning, John phoned me".
The sentence model library may also store sentence 2, "In the early morning, I talked with Lily on the phone". The descriptive words and sentence elements of the sentence to be generated in the example above can also match sentence 2. However, if the sentence formed from sentence 1 has a 54% probability of appearing in the generated diary text, while the sentence formed from sentence 2 has a 30% probability, then sentence 1, which has the highest probability of appearing in the generated diary text, is selected for matching, and the matched sentence is obtained.
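The selection between sentence 1 and sentence 2 can be sketched by storing each library sentence as a template with slots and choosing the fillable template with the highest probability. This is only an illustration: representing stored sentences as `str.format` templates and the slot names `time` and `contact` are assumptions, while the 54% and 30% figures come from the example in the text.

```python
from string import Formatter

# Hypothetical sentence model library: (template, probability of appearing in the diary).
TEMPLATES = [
    ("{time}, {contact} phoned me.", 0.54),
    ("{time}, I talked with {contact} on the phone.", 0.30),
]

def slots(template):
    """Return the slot names that a template needs filled."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}

def best_sentence(words, templates):
    """Fill the most probable template whose slots are all covered by the given words."""
    candidates = [(t, p) for t, p in templates if slots(t) <= set(words)]
    if not candidates:
        return None
    template, _ = max(candidates, key=lambda item: item[1])
    return template.format(**words)

result = best_sentence({"time": "In the morning", "contact": "John"}, TEMPLATES)
print(result)  # In the morning, John phoned me.
```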
With the sentence generation method provided by this embodiment of the present invention, sentences can be generated automatically from the various data messages of a terminal, and the activities or events occurring on the terminal can be described completely in sentences, which makes it convenient for the user to have these activities or events recorded automatically by the terminal.
Fig. 4 is a schematic structural diagram of one embodiment of a sentence generating device of the present invention. As shown in Fig. 4, the device 1000 includes:
A collecting unit 11, configured to collect at least one data message of a terminal, where the data message includes at least one of running information of the terminal, user operation information of the terminal, and information the terminal receives from an external interface.
In the present invention, a terminal is any device through which an end user connects to a network to use network applications, such as a notebook computer, a tablet computer, or a mobile phone. Various data messages can be collected from a terminal, including: running information of the terminal itself, such as network connection information and system process information; user operation information of the terminal, such as sensor information and microblogs; and information the terminal receives from an external interface, such as call information, short messages, and GPS information. These data messages include text data, such as microblogs and short messages, from which text can be extracted directly, and non-text data, such as network connection information, system process information, and sensor information, which is collected through interfaces and the like.
The collecting unit 11 of the present invention can collect these data messages of the terminal in a unified way and organize them.
A determining unit 12, configured to determine the sentence element of each of the at least one data message in a sentence to be formed.
The determining unit 12 assigns each collected data message a corresponding sentence element. The types of sentence element include subject, predicate, object, attributive, complement, adverbial, predicative, and so on. For example, time information collected from the terminal can be determined to be a time adverbial, and information collected from GPS can be determined to be a place adverbial.
Component unit 13, configured to form the at least one data message into a sentence according to the determined sentence element of the at least one data message in the sentence to be formed.
After the sentence element of each collected data message has been identified, the component unit 13 can, according to the sentence elements corresponding to these data messages, combine the data messages into a sentence by following a certain sentence structure or by matching against a language model, so that the content of these data messages is described completely in one or more sentences. The accumulated sentences form an automatic diary text.
The sentence generating device provided according to this embodiment of the present invention can automatically generate sentences from the various data messages of a terminal, so that the activities or events occurring on the terminal are fully described in sentences, making it convenient for the user to have these activities or events recorded automatically by the terminal.
Fig. 5 is a structural schematic diagram of another embodiment that further refines the sentence generating device of the present invention shown in Fig. 4. As shown in Fig. 5, the device 2000 includes:
Collector unit 21, configured to collect at least one data message of a terminal, wherein the data message includes at least one of: running information of the terminal, user operation information of the terminal, and information that the terminal receives from an external interface.
In this embodiment, the collector unit 21 includes a collection subelement 211, a detection sub-unit 212, and a formatting subelement 213.
Collection subelement 211, configured to collect at least one data message of the terminal.
The terminal of the present invention refers to any device through which an end user connects to a network and uses network applications, such as a notebook computer, a tablet computer, or a mobile phone. The collection subelement 211 can collect various data messages from a terminal, including: the running information of the terminal itself, such as network connection information and system process information; the user operation information of the terminal, such as sensor information and microblog posts; and the information that the terminal receives from an external interface, such as call information, short messages, and GPS information. These data messages include text data, such as microblog posts and short messages, from which text can be extracted directly. They also include non-text data, such as network connection information, system process information, and sensor information, which are collected through interfaces and the like.
Detection sub-unit 212, configured to detect the source of the at least one data message.
The detection sub-unit 212 detects the source of each collected data message. For example, if the information is GPS information, the source is the GPS in the terminal; if it is sensor information, the source is a particular sensor in the terminal; if it is call information or application (APP) information such as a microblog post, the source can be identified from the software program identifier.
Formatting subelement 213, configured to format the at least one data message according to its source, using the format corresponding to that source, to obtain at least one formatted data message.
Because the collected data messages come from different sources, the formatting subelement 213 needs to organize them in their respective formats so that they can be used in subsequent steps.
The collected data messages can be formatted in a variety of ways, for example as tuples; the present invention includes but is not limited to this example.
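A minimal sketch of source detection followed by source-specific formatting into tuples is given below. The raw message layout (a dict with a `device` or `app_id` key) and the field names `lat`, `lon`, and `text` are assumptions made for the example; the patent does not specify them.

```python
from typing import Any, NamedTuple


class FormattedMessage(NamedTuple):
    """Tuple representation of a formatted data message (assumed layout)."""
    source: str
    kind: str
    value: Any


def detect_source(raw: dict) -> str:
    """Hypothetical source detection: hardware feeds carry a 'device' key,
    application data carries an 'app_id' (a software program identifier)."""
    if "device" in raw:
        return raw["device"]
    if "app_id" in raw:
        return "app:" + raw["app_id"]
    return "unknown"


def format_message(raw: dict) -> FormattedMessage:
    """Format a raw message using the format that corresponds to its source."""
    source = detect_source(raw)
    if source == "gps":
        return FormattedMessage(source, "location", (raw["lat"], raw["lon"]))
    if source.startswith("app:"):
        return FormattedMessage(source, "text", raw["text"])
    # Unknown sources are passed through unchanged.
    return FormattedMessage(source, "raw", raw)
```

Keeping every formatted message in the same three-field tuple lets later steps handle GPS fixes and microblog text uniformly while still knowing where each one came from.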
Determining unit 22, configured to determine the sentence element of each of the at least one data message in the sentence to be formed.
In this embodiment, the determining unit 22 includes a search subelement 221 and a determination subelement 222.
Search subelement 221, configured to search a database, for each formatted data message, for at least one description word matching the formatted data message.
To make the generated sentence easy for the user to read, the collected data messages need to be described in conventional language or in the language the user is accustomed to. The database stores one or more description words corresponding to each formatted data message. Therefore, for each formatted data message, the search subelement 221 can search the database for at least one description word matching that formatted data message.
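The lookup step can be illustrated with an in-memory SQLite database. The table name `description_words`, its columns, and the sample rows are hypothetical; the patent only states that description words are stored in a database and matched against formatted data messages.

```python
import sqlite3
from typing import List


def build_demo_db() -> sqlite3.Connection:
    """Create a small in-memory database of description words (sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE description_words "
        "(kind TEXT, value TEXT, word TEXT, use_count INTEGER)"
    )
    conn.executemany(
        "INSERT INTO description_words VALUES (?, ?, ?, ?)",
        [
            ("call", "incoming", "phoned", 54),
            ("call", "incoming", "called", 30),
            ("location", "office", "at the office", 12),
        ],
    )
    return conn


def find_description_words(conn: sqlite3.Connection, kind: str, value: str) -> List[str]:
    """Return every description word matching a formatted data message,
    most frequently used first."""
    rows = conn.execute(
        "SELECT word FROM description_words "
        "WHERE kind = ? AND value = ? ORDER BY use_count DESC",
        (kind, value),
    ).fetchall()
    return [word for (word,) in rows]
```

With the sample rows, looking up an incoming call returns both candidate words, and a message with no stored description returns an empty list.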
Determination subelement 222, configured to determine, according to the at least one description word matching the formatted data message, the sentence element of each formatted data message in the sentence to be formed.
After each collected data message has been formatted and matched with description words, the determination subelement 222 can determine the sentence element of each description word, either according to the probability with which that description word has previously served as each sentence element or according to usage habits. The types of sentence element include subject, predicate, object, attribute, complement, adverbial, predicative, and so on. For example, time information collected from the terminal can be determined to be a time adverbial, and information collected from GPS can be determined to be a place adverbial.
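One way to read "the probability with which a description word has previously served as each sentence element" is as a frequency count per word and role. The sketch below assumes such counts are available; the `ROLE_COUNTS` table and its numbers are invented for illustration.

```python
from typing import Dict, Optional

# Hypothetical history: how often each description word has served as each
# sentence element in previously generated sentences.
ROLE_COUNTS: Dict[str, Dict[str, int]] = {
    "phoned": {"predicate": 40, "attribute": 2},
    "this morning": {"time adverbial": 25},
    "at the office": {"place adverbial": 18, "predicative": 6},
}


def most_likely_element(word: str) -> Optional[str]:
    """Pick the sentence element the word has most often served as."""
    counts = ROLE_COUNTS.get(word)
    if not counts:
        return None  # no history for this word
    return max(counts, key=counts.get)
```

Since the total count for a word is the same for every role, comparing raw counts gives the same result as comparing probabilities.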
Selecting unit 23, configured to select, for each formatted data message, one description word from the at least one description word matching the formatted data message, according to the probability with which those description words are used in the database.
Before a sentence is generated, generally only one description word is chosen for each sentence element. The selecting unit 23 therefore needs to select one description word from the multiple description words matching the formatted data message. The selection can be based on the probability with which these description words are used in the database, that is, the probability with which each is selected for generating sentences, or it can be based on the user's language habits.
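The selection rule can be sketched as follows, assuming the database records a usage count for each candidate description word. The counts are made up for the example, and the function name `select_word` is an assumption.

```python
from typing import Dict, Tuple


def select_word(usage_counts: Dict[str, int]) -> Tuple[str, float]:
    """Return the candidate with the highest usage probability, together with
    that probability (its count divided by the total of all counts)."""
    total = sum(usage_counts.values())
    if total == 0:
        raise ValueError("no usage data for any candidate")
    best = max(usage_counts, key=usage_counts.get)
    return best, usage_counts[best] / total
```

A selection based on the user's language habits, which the text also allows, could reuse the same function with counts taken from that user's own history instead of the shared database.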
Component unit 24, configured to form the at least one data message into a sentence according to the determined sentence element of the at least one data message in the sentence to be formed.
In this embodiment, the component unit 24 includes a selection subelement 241 and a composition subelement 242.
Selection subelement 241, configured to select from a syntactic structure library, according to the type of the sentence element of the at least one data message, a sentence structure that contains the type of the sentence element of the at least one data message.
Various sentence structures are stored in the syntactic structure library. Each sentence structure contains one or more sentence elements, and each sentence element has a corresponding position in the sentence structure. The selection subelement 241 selects from the syntactic structure library a sentence structure that contains the sentence elements corresponding to all of the collected data messages.
Composition subelement 242, configured to form the selected description words matching the at least one formatted data message into a sentence according to the position of the sentence element of the at least one data message in the sentence structure.
After the sentence structure has been selected, the composition subelement 242 fills each selected description word into the position that the sentence element of its formatted data message occupies in the sentence structure. Once the position of every sentence element has been filled in this way, a sentence has been formed.
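Structure selection and slot filling can be sketched as below. The contents of `STRUCTURES` are hypothetical, and the sketch makes a simplifying assumption: a structure is chosen only when its set of sentence elements equals the set determined for the data messages, so no slot is left empty.

```python
from typing import Dict, List, Optional

# Hypothetical syntactic structure library: each structure is an ordered list
# of sentence elements, and the list index is the element's position.
STRUCTURES: List[List[str]] = [
    ["time adverbial", "subject", "predicate", "object"],
    ["subject", "predicate", "object"],
    ["time adverbial", "place adverbial", "subject", "predicate"],
]


def select_structure(elements) -> Optional[List[str]]:
    """Select the first structure whose elements are exactly those determined."""
    needed = set(elements)
    for structure in STRUCTURES:
        if set(structure) == needed:
            return structure
    return None


def compose(words_by_element: Dict[str, str]) -> Optional[str]:
    """Fill each selected description word into its element's position."""
    structure = select_structure(words_by_element)
    if structure is None:
        return None
    return " ".join(words_by_element[element] for element in structure)
```

For example, passing the words "This morning,", "John", "phoned", and "me" keyed by time adverbial, subject, predicate, and object selects the first structure and joins them in that order.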
The sentence generating device provided according to this embodiment of the present invention can automatically generate sentences from the various data messages of a terminal, so that the activities or events occurring on the terminal are fully described in sentences, making it convenient for the user to have these activities or events recorded automatically by the terminal.
Fig. 6 is a structural schematic diagram of yet another embodiment that further refines the sentence generating device of the present invention shown in Fig. 4. As shown in Fig. 6, the device 3000 includes:
Collector unit 31, configured to collect at least one data message of a terminal, wherein the data message includes at least one of: running information of the terminal, user operation information of the terminal, and information that the terminal receives from an external interface.
In this embodiment, the collector unit 31 includes a collection subelement 311, a detection sub-unit 312, and a formatting subelement 313.
Collection subelement 311, configured to collect at least one data message of the terminal.
Detection sub-unit 312, configured to detect the source of the at least one data message.
Formatting subelement 313, configured to format the at least one data message according to its source, using the format corresponding to that source, to obtain at least one formatted data message.
Determining unit 32, configured to determine the sentence element of each of the at least one data message in the sentence to be formed.
In this embodiment, the determining unit 32 includes a search subelement 321 and a determination subelement 322.
Search subelement 321, configured to search a database, for each formatted data message, for at least one description word matching the formatted data message.
Determination subelement 322, configured to determine, according to the at least one description word matching the formatted data message, the sentence element of each formatted data message in the sentence to be formed.
Selecting unit 33, configured to select, for each formatted data message, one description word from the at least one description word matching the formatted data message, according to the probability with which those description words are used in the database.
Component unit 34, configured to form the at least one data message into a sentence according to the determined sentence element of the at least one data message in the sentence to be formed.
In this embodiment, the component unit 34 includes a coupling subelement 341 and an obtaining subelement 342.
Coupling subelement 341, configured to match, according to the determined sentence element of the at least one data message in the sentence to be formed, the selected description words matching the at least one formatted data message against the sentences in a sentence model library.
Obtaining subelement 342, configured to obtain the matched sentence.
This embodiment differs from the preceding embodiment in that the component unit 34 is different from the component unit 24 of the preceding embodiment.
A language model is defined as follows: "A language model is usually constructed as a probability distribution P(s) over strings s, where P(s) attempts to reflect the probability that the string s occurs as a sentence."
In an n-gram language model, for a sentence $s = W_1 W_2 \ldots W_n$, the probability can be expressed as:
$P(s) = P(W_1)\,P(W_2 \mid W_1)\,P(W_3 \mid W_1 W_2) \cdots P(W_n \mid W_1 \cdots W_{n-1})$
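The formula above is the chain-rule factorization of P(s). In practice an n-gram model approximates each factor by conditioning only on the previous n-1 words. The sketch below uses n = 2 (a bigram model) with unsmoothed relative-frequency estimates; the toy corpus and the function names are assumptions for illustration and do not come from the patent.

```python
from collections import Counter
from typing import Iterable, Tuple


def train_bigram(corpus: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count histories and bigrams; '<s>' marks the start of a sentence."""
    history = Counter()
    bigrams = Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split()
        history.update(tokens[:-1])              # words that have a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent word pairs
    return history, bigrams


def sentence_probability(sentence: str, history: Counter, bigrams: Counter) -> float:
    """P(s) approximated as the product of P(W_i | W_{i-1})."""
    tokens = ["<s>"] + sentence.split()
    p = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        if history[prev] == 0:
            return 0.0  # unseen history: the unsmoothed model assigns zero
        p *= bigrams[(prev, cur)] / history[prev]
    return p
```

Trained on the three sentences "John phoned me", "John phoned Lily", and "Lily phoned me", this model scores "John phoned me" at (2/3) × (2/2) × (2/3) = 4/9, while a word order never seen in the corpus scores zero. A practical model would add smoothing so that unseen word pairs do not zero out the whole sentence.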
In this embodiment, various sentences are stored in the sentence model library. The coupling subelement 341 matches the description words of the sentence to be generated, which match the formatted data messages, against the sentences in the sentence model library, and the obtaining subelement 342 obtains the matched sentence.
Specifically, suppose for example that the sentence model library stores sentence 1: "In the morning, Lyn phoned me". If the description words and sentence elements of the sentence to be generated in the example above are matched with sentence 1, the matched sentence is "In the morning, John phoned me".
The sentence model library may also store sentence 2: "In the early morning, I talked with Lily". The description words and sentence elements of the sentence to be generated in the example above can also be matched with sentence 2. However, the probability that sentence 1, formed from the description words of sentence 1, occurs in the generated diary text is 54%, whereas the probability that sentence 2, formed from the description words of sentence 2, occurs in the generated diary text is 30%. Sentence 1, which has the highest probability of occurring in the generated diary text, is therefore selected for matching, and the matched sentence is obtained.
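The choice between sentence 1 and sentence 2 can be sketched as picking the matching stored sentence with the highest occurrence probability and substituting the selected description words. Representing each stored sentence as a template with named slots is an assumption made here; the 54% and 30% figures are the ones given in the example above.

```python
from typing import Dict, Optional

# Hypothetical sentence model library: each entry keeps a template, the slots
# it needs, and its probability of occurring in the generated diary text.
TEMPLATES = [
    {"text": "In the morning, {subject} phoned me",
     "slots": {"subject"}, "probability": 0.54},
    {"text": "In the early morning, I talked with {subject}",
     "slots": {"subject"}, "probability": 0.30},
]


def best_sentence(words: Dict[str, str]) -> Optional[str]:
    """Fill the most probable template whose slots are all covered by `words`."""
    matching = [t for t in TEMPLATES if t["slots"] <= set(words)]
    if not matching:
        return None
    best = max(matching, key=lambda t: t["probability"])
    return best["text"].format(**words)
```

With "John" as the subject, both templates match and the first is chosen because 0.54 exceeds 0.30.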
The sentence generating device provided according to this embodiment of the present invention can automatically generate sentences from the various data messages of a terminal, so that the activities or events occurring on the terminal are fully described in sentences, making it convenient for the user to have these activities or events recorded automatically by the terminal.
The above disclosure is only a preferred embodiment of the present invention and certainly cannot be used to limit the scope of the rights of the present invention; equivalent variations made according to the claims of the present invention therefore still fall within the scope covered by the present invention.
Claims (6)
- 1. A sentence generation method, characterized by comprising: collecting at least one data message of a terminal, wherein the data message includes at least one of running information of the terminal, user operation information of the terminal, and information that the terminal receives from an external interface; determining the sentence element of each of the at least one data message in the sentence to be formed; wherein the collecting at least one data message of a terminal comprises: acquiring at least one data message of the terminal; detecting the source of the at least one data message; and formatting the at least one data message according to its source, using the format corresponding to that source, to obtain at least one formatted data message; the determining the sentence element of each of the at least one data message in the sentence to be formed comprises: for each formatted data message, searching a database for at least one description word matching the formatted data message; and determining, according to the at least one description word matching the formatted data message, the sentence element of each formatted data message in the sentence to be formed; the method further comprises: for each formatted data message, selecting one description word from the at least one description word matching the formatted data message, according to the probability with which the at least one description word matching the formatted data message is used in the database; and forming the at least one data message into a sentence according to the determined sentence element of the at least one data message in the sentence to be formed.
- 2. The method according to claim 1, characterized in that the forming the at least one data message into a sentence according to the determined sentence element of the at least one data message in the sentence to be formed comprises: selecting from a syntactic structure library, according to the type of the determined sentence element of the at least one data message in the sentence to be formed, a sentence structure that contains the type of the sentence element of the at least one data message; and forming the selected description words matching the at least one formatted data message into a sentence according to the position of the sentence element of the at least one data message in the sentence structure.
- 3. The method according to claim 1, characterized in that the forming the at least one data message into a sentence according to the determined sentence element of the at least one data message in the sentence to be formed comprises: matching, according to the determined sentence element of the at least one data message in the sentence to be formed, the selected description words matching the at least one formatted data message against the sentences in a sentence model library; and obtaining the matched sentence.
- 4. A sentence generating device, characterized by comprising: a collector unit, configured to collect at least one data message of a terminal, wherein the data message includes at least one of running information of the terminal, user operation information of the terminal, and information that the terminal receives from an external interface; a determining unit, configured to determine the sentence element of each of the at least one data message in the sentence to be formed; wherein the collector unit comprises: a collection subelement, configured to collect at least one data message of the terminal; a detection sub-unit, configured to detect the source of the at least one data message; and a formatting subelement, configured to format the at least one data message according to its source, using the format corresponding to that source, to obtain at least one formatted data message; the determining unit comprises: a search subelement, configured to search a database, for each formatted data message, for at least one description word matching the formatted data message; and a determination subelement, configured to determine, according to the at least one description word matching the formatted data message, the sentence element of each formatted data message in the sentence to be formed; the device further comprises: a selecting unit, configured to select, for each formatted data message, one description word from the at least one description word matching the formatted data message, according to the probability with which the at least one description word matching the formatted data message is used in the database; and a component unit, configured to form the at least one data message into a sentence according to the determined sentence element of the at least one data message in the sentence to be formed.
- 5. The device according to claim 4, characterized in that the component unit comprises: a selection subelement, configured to select from a syntactic structure library, according to the type of the sentence element of the at least one data message, a sentence structure that contains the type of the sentence element of the at least one data message; and a composition subelement, configured to form the selected description words matching the at least one formatted data message into a sentence according to the position of the sentence element of the at least one data message in the sentence structure.
- 6. The device according to claim 4, characterized in that the component unit comprises: a coupling subelement, configured to match, according to the determined sentence element of the at least one data message in the sentence to be formed, the selected description words matching the at least one formatted data message against the sentences in a sentence model library; and an obtaining subelement, configured to obtain the matched sentence.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310440040.3A CN104462145B (en) | 2013-09-24 | 2013-09-24 | A kind of sentence generation method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310440040.3A CN104462145B (en) | 2013-09-24 | 2013-09-24 | A kind of sentence generation method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104462145A CN104462145A (en) | 2015-03-25 |
CN104462145B true CN104462145B (en) | 2018-04-10 |
Family
ID=52908200
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310440040.3A Active CN104462145B (en) | 2013-09-24 | 2013-09-24 | A kind of sentence generation method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104462145B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107484038A (en) * | 2017-08-22 | 2017-12-15 | 北京奇艺世纪科技有限公司 | A kind of generation method of video subject, device and electronic equipment |
CN110399499B (en) * | 2019-07-18 | 2022-02-18 | 珠海格力电器股份有限公司 | Corpus generation method and device, electronic equipment and readable storage medium |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103118182A (en) * | 2013-01-17 | 2013-05-22 | 广东欧珀移动通信有限公司 | Method to record application diaries of movable terminal and device |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2007172490A (en) * | 2005-12-26 | 2007-07-05 | Sony Computer Entertainment Inc | Information processing method, information processing system, and server |
Also Published As
Publication number | Publication date |
---|---|
CN104462145A (en) | 2015-03-25 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |