CN109858973A - A kind of analysis method of regional tourism industry development - Google Patents
- Publication number
- CN109858973A (application number CN201910123321.3A)
- Authority
- CN
- China
- Prior art keywords
- comment
- dimension
- analysis
- data
- emotion
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method for analyzing regional tourism industry development. The method comprises: Step 1: determining the region to be analyzed; Step 2: collecting data related to the region to be analyzed and preprocessing it; Step 3: building and training a support vector machine model; Step 4: analyzing the tourism industry health of the region to be analyzed, based on the preprocessed regional tourism data and the trained support vector machine model; Step 5: analyzing the tourism industry of the region to be analyzed along multiple subdivided dimensions; Step 6: obtaining the tourism industry development analysis result for the region from the results of Step 4 and Step 5. The method achieves the technical effect of analyzing regional tourism industry development comprehensively and accurately.
Description
Technical field
The present invention relates to the technical field of computer data processing and analysis, and in particular to a method for analyzing regional tourism industry development.
Background art
In recent years domestic tourism has kept growing rapidly, and national demand for tourism consumption is strong. All-region tourism has become a national strategy and is the general direction of future tourism development. It involves every relevant department in a region working together and all residents taking part, making full use of every element so that tourists have a good experience throughout the whole trip, at every time and place. Within this, "tourism + Internet" is a key link in realizing all-region tourism, and using data mining to provide services to government, scenic spots, enterprises and tourists is a major trend for the future of tourism.
Although the tourism industry is developing rapidly and public enthusiasm for tourism consumption is high, culture and tourism regulators lack complete data and professional industry analysis covering the industry as a whole. Existing analysis of the industry's development is mostly confined to narrow scopes such as individual scenic spots, hotels and travel agencies.
In summary, in the course of implementing the technical solution of the present application, the inventors found that the above technology has at least the following technical problem:
In the prior art, existing methods for analyzing tourism industry development are insufficiently comprehensive and have poor accuracy.
Summary of the invention
The present invention provides a method for analyzing regional tourism industry development, which achieves the technical effect of analyzing regional tourism industry development comprehensively and accurately.
The method takes a given region as its research object and crawls, integrates and analyzes data related to tourism industry development. Through multi-dimensional analysis of the data, it digs into the deeper problems behind what the data shows. It is aimed primarily at monitoring tourism industry development and secondarily at serving tourists. It gives culture and tourism regulators a reliable reference for understanding regional tourism industry development and provides data support for the formulation of relevant policies.
To achieve the above object, the present application provides a method for analyzing regional tourism industry development, comprising the following steps:
Step 1: determine the region whose tourism industry health is to be analyzed;
Step 2: data acquisition and preprocessing;
1. Crawl the tourism industry data for the region from OTA websites and store it;
The crawl targets are OTA websites such as Tuniu, Tongcheng, Ctrip, Mafengwo, Lvmama, eLong and Dianping. For each of the six categories (food and drink, lodging, traffic, tourism, shopping, entertainment), the fields crawled are the goods or service list, the goods or service details, and the user comments.
The crawled data is stored locally as text files, organized by category. Each time a crawl task completes, the local files are pushed to the designated HDFS server and a backup is retained.
2. Preprocess the stored data
(1) Missing data handling
The acquired data falls into three classes: goods or service lists, goods or service details, and user comments. The three are linked by the ID of the goods or service. If a goods or service record cannot be linked to a merchant, that data is filtered out.
The most valuable field is the comment content. If the comment field is empty, the comment record is filtered out.
(2) Abnormal data handling
If the comment field contains placeholder text such as "system default positive review" or "this user did not fill in any review content", the comment record is deleted.
(3) Data standardization
The data comes from multiple OTAs whose data standards are inconsistent, so the data must be standardized.
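The missing-data and abnormal-data rules can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the `Comment` record, the `item_to_merchant` lookup and the English placeholder strings are assumptions introduced here as stand-ins for the crawled fields and the original placeholder texts.

```python
from dataclasses import dataclass
from typing import Optional

# English stand-ins (assumed) for the placeholder review texts that carry no opinion.
PLACEHOLDER_COMMENTS = {
    "system default positive review",
    "this user did not fill in any review content",
}

@dataclass
class Comment:
    comment_id: str
    item_id: str              # ID of the goods or service the comment refers to
    content: Optional[str]    # comment text; may be missing

def clean_comments(comments, item_to_merchant):
    """Keep only comments that link to a merchant and carry real text."""
    kept = []
    for c in comments:
        if c.item_id not in item_to_merchant:      # cannot be linked to a merchant
            continue
        text = (c.content or "").strip()
        if not text:                               # missing comment content
            continue
        if text.lower() in PLACEHOLDER_COMMENTS:   # abnormal placeholder content
            continue
        kept.append(c)
    return kept

sample = [
    Comment("c1", "i1", "Tasty and cheap"),
    Comment("c2", "i1", ""),
    Comment("c3", "i1", "System default positive review"),
    Comment("c4", "i9", "Great view"),
]
cleaned = clean_comments(sample, {"i1": "merchant-1"})
```

Only `c1` survives: `c2` is empty, `c3` is a placeholder, and `c4` refers to an item with no known merchant.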
Step 3: model training
1. Construct the training set and test set
(1) Label sentiment polarity
For each of the six categories (food and drink, lodging, traffic, tourism, shopping, entertainment), randomly select sample comments that have been processed in Step 2 and label the sentiment polarity of each comment: negative reviews are labeled -1 and positive reviews are labeled 1. The training set and test set are built from the labeled sample comments.
(2) Corpus processing
Segment the sample comments from (1) into words using a word segmentation tool.
(3) Build word vectors for the sample comments
Call the Word2vec.Word2Vec method to vectorize each word, forming word vectors.
2. Train the support vector machine model on the training-set word vectors
3. Optimize the trained support vector machine model by appropriately increasing the number of negative reviews in the training set.
Step 4: regional tourism industry health analysis
Segment the regional tourism data into words and build vectors; feed the vectors into the model trained in Step 3 to compute the sentiment polarity of each comment.
Step 5: analysis of the regional tourism industry along subdivided dimensions
1. Build a lexicon for each tourism industry category
(1) For each of the six categories (food and drink, lodging, traffic, tourism, shopping and entertainment), extract the sentiment words for each subdivided dimension under the category, the words modified by those sentiment words, the negation words that modify the sentiment words, and so on, and build a lexicon for each subdivided dimension of each category.
(2) Lexicon consolidation
Aggregate and deduplicate the sentiment words that belong to the same subdivided dimension across the six categories;
Aggregate the sentiment words of all dimensions and give each a polarity label, by convention positive: 1, negative: -1, forming the sentiment dictionary.
Aggregate and deduplicate the modified words from the food and drink, lodging, traffic, tourism, shopping and entertainment comments and attach a dimension label to each, forming a document.
Extract the sentiment words specific to each dimension and label them with that dimension, forming the dimension identification lexicon.
Aggregate and deduplicate the negation words extracted in (1), forming the negation lexicon.
2. Word segmentation and clause splitting
Segment each comment, producing the words and their parts of speech, then split the segmented comment into clauses according to part of speech, so that a single comment yields multiple clauses.
3. Extract the dimension of each clause
For each clause, extract the evaluated dimension by reference to the dimension identification lexicon. If a word in the clause is in the lexicon, the dimension the clause covers is identified directly from it; if not, the clause's dimension is not among those this analysis is concerned with.
4. Initialize the dimension sentiment values
By convention, every comment is given a default sentiment value for each of its dimensions.
5. Compute the sentiment polarity
For each clause, check whether any of its segmented words appears in the sentiment dictionary. If so, the sentiment dictionary gives the polarity of the sentiment word; if not, the clause is considered to contain no sentiment word for the dimensions this analysis is concerned with.
Take the set difference of the clause and the sentiment dictionary, then intersect the result with the negation lexicon. The number of elements in the intersection gives the number of negation words, which is combined with the polarity of the sentiment word to give the final sentiment polarity.
The one or more technical solutions provided by the present application have at least the following technical effects or advantages:
Regional tourism industry development can be analyzed comprehensively and accurately. The data analysis methods and accumulated research and development behind this method can be applied in the same way to analyzing and presenting culture and tourism industry development in other parts of the country.
Description of the drawings
The drawings described here provide further understanding of the embodiments of the invention and form part of the application; they do not limit the embodiments of the invention;
Fig. 1 is a flow diagram of the method for analyzing regional tourism industry development in the application;
Fig. 2 is a schematic diagram of the composition of the system for analyzing regional tourism industry development in the application.
Specific embodiments
For a clearer understanding of the above objects, features and advantages of the invention, the invention is described in further detail below with reference to the drawings and specific embodiments. Provided they do not conflict, the embodiments of the application and the features in them may be combined with one another.
Many specific details are set out in the following description to aid a full understanding of the invention. However, the invention may also be implemented in ways other than those described here, so its scope of protection is not limited by the specific embodiments disclosed below.
Specific embodiment one:
Referring to Fig. 1, Step 1: determine the region to be analyzed. In this embodiment the research object is the tourism industry of the Chengdu region;
Step 2: data acquisition and preprocessing;
1. Crawl the tourism industry data for the region from OTA websites and store it;
The crawl targets are OTA websites such as Tuniu, Tongcheng, Ctrip, Mafengwo, Lvmama, eLong and Dianping. For each of the six categories (food and drink, lodging, traffic, tourism, shopping, entertainment), the fields crawled are the goods or service list, the goods or service details, and the user comments.
The crawl proceeds as follows:
Crawl the city list of the OTA website;
Construct URLs from the city list and crawl the goods or service lists;
Crawl the goods or service details according to the goods or service list;
Crawl the goods or service comments according to the goods or service list.
The crawled data is stored locally as text files, organized by category. When a crawl task completes, the local files are pushed to the designated HDFS server and a backup is retained.
2. Preprocess the stored data
(1) Missing data handling;
The acquired data falls into three classes: goods or service lists, goods or service details, and user comments. The three are linked by the ID of the goods or service. If a goods or service record cannot be linked to a merchant, that data is filtered out.
The most valuable field is the comment content. If the comment field is empty, the comment record is filtered out.
(2) Abnormal data handling;
If the comment field contains placeholder text such as "system default positive review" or "this user did not fill in any review content", the comment record is deleted.
(3) Data standardization;
The data comes from multiple OTAs whose data standards are inconsistent, so the data must be standardized.
For example: the OTAs categorize food and drink merchants inconsistently. The categories are consolidated into 9 major classes: Chinese restaurant, casual light meals, foreign cuisine, hot pot, buffet, drinks, food street/night market, seafood, and other. The mapping is as follows:
Chinese restaurant --- Tuniu: Chinese restaurant; Tongcheng: restaurants, farmhouse restaurants; Mafengwo: Sichuan cuisine, state dish, Hunan cuisine, Zhejiang cuisine, Yunnan cuisine, Guangdong cuisine, Xinjiang cuisine; Dianping: Sichuan cuisine, Guangdong cuisine, Beijing cuisine, Northeastern cuisine, Xinjiang cuisine, confidential dish, farmhouse dishes, home-style dishes, rabbit head/diced rabbit, grilled fish, vegetarian dishes; Qunar: Zigong salt-merchant cuisine, Northeastern cuisine, Beijing cuisine, Xinjiang cuisine, Hunan cuisine, Sichuan cuisine, private kitchen dishes.
Casual light meals --- Tuniu: snacks; Tongcheng: snacks and fast food, noodle shop, bread and desserts, teahouse; Mafengwo: fast food, takeaway/window, snacks, congee shops and wheaten food; Dianping: snacks and fast food, rice-noodle shop, bread and desserts; Qunar: snacks and fast food, chaoshou wontons, pork-intestine rice noodles, rice noodles, steamed buns, tofu pudding, bread and desserts.
Foreign cuisine --- Tuniu: foreign cuisine, Western restaurant; Tongcheng: Western restaurant, cuisine; Mafengwo: Southeast Asian cuisine, Western food, Japanese cuisine, Korean cuisine; Dianping: Western food, Korean cuisine, Japanese cuisine, Southeast Asian cuisine; Qunar: Western food, Korean cuisine, Japanese cuisine.
Hot pot --- Tuniu: hot pot; Tongcheng: hot pot; Mafengwo: hot pot; Dianping: hot pot, chuanchuan skewers; Qunar: hot pot, Sichuan hot pot, fish hot pot, maocai.
Buffet --- Tuniu: buffet restaurant; Tongcheng: buffet; Dianping: buffet; Qunar: buffet.
Drinks --- Tuniu: cafe, desserts/beverages; Tongcheng: drinks, coffee house, teahouse, bar; Mafengwo: coffee and drinks, bar, teahouse, afternoon tea; Dianping: coffee shop, teahouse, afternoon tea; Qunar: coffee shop, teahouse.
Food street/night market --- Tuniu: food street/night market, barbecue/grilled meat shop; Tongcheng: barbecue skewers, cooked and braised food; Mafengwo: barbecue; Dianping: barbecue; Qunar: food street HOT, barbecue.
Seafood --- Tuniu: seafood/river food restaurant; Tongcheng: seafood; Mafengwo: seafood; Dianping: crayfish, seafood; Qunar: seafood.
Other --- Tuniu: local specialties; Tongcheng: other cuisines, business district, local specialties/specialty products; Mafengwo: chain/franchise, themed, dining hall, tavern, specialty, Hong Kong style; Dianping: other cuisines, health food, popular restaurants, creative cuisine, fruit and fresh produce; Qunar: dry pot, Muslim cuisine.
For example: lodging merchants do not use uniform room-type categories. These are consolidated into single room, big-bed room, twin room, family room, suite, multi-person room, parent-child room and featured room.
Single room --- fuzzy match on room types containing strings such as "single room" and "separate room".
Big-bed room --- fuzzy match on room types containing strings such as "big bed", "guest room", "deluxe" and "boutique room".
Twin room --- fuzzy match on room types containing strings such as "double bed", "double", "standard room" and "standard".
Family room --- fuzzy match on room types containing strings such as "family".
Suite --- fuzzy match on room types containing strings such as "suite".
Multi-person room --- fuzzy match on room types containing strings such as "four people", "4 people", "6 people", "six people", "8 people", "ten" and "more people".
Parent-child room --- fuzzy match on room types containing strings such as "parent-child".
Featured room --- fuzzy match on room types containing strings such as "landscape", "tatami", "Nordic", "modern", "special price", "couples", "river view", "starry sky", "view", "sunlight", "garden", "forest", "wave", "dream", "film", "projection" and "mountain view".
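A keyword-based fuzzy match of this kind might look like the sketch below. The rule table holds a subset of the keywords as English stand-ins for the original strings, and the first-match-wins ordering and the `unclassified` fallback are assumptions; the source does not say how a room name that matches several categories is resolved.

```python
# Ordered (standard type, keywords) rules. Keywords are English stand-ins
# for the original strings; only a subset is listed here.
ROOM_TYPE_RULES = [
    ("single", ("single room", "separate room")),
    ("big_bed", ("big bed", "guest room", "deluxe")),
    ("twin", ("double bed", "double", "standard room")),
    ("family", ("family",)),
    ("suite", ("suite",)),
    ("multi_person", ("four people", "4 people", "6 people", "more people")),
    ("parent_child", ("parent-child",)),
    ("featured", ("landscape", "tatami", "starry sky", "garden")),
]

def normalize_room_type(raw_name, rules=ROOM_TYPE_RULES, default="unclassified"):
    """Return the first standard room type whose keyword occurs in raw_name."""
    name = raw_name.lower()
    for standard, keywords in rules:
        if any(keyword in name for keyword in keywords):
            return standard
    return default  # assumed fallback; not specified in the source

print(normalize_room_type("Business Big Bed Room"))
print(normalize_room_type("Tatami Garden Loft"))
```

Because the rules are checked in order, a name such as "deluxe suite" would land in `big_bed` under this ordering; a production rule set would need an explicit precedence.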
For example: the OTAs write scenic-spot grades inconsistently and the crawled formats differ, so grades are consolidated into six levels: AAAAA, AAAA, AAA, AA, A and unrated:
Table 1: Scenic-spot grade data standardization
The data comes from different platforms, and some of it needs deduplication.
When counting the merchants in an industry, the same merchant may publish its business information on several platforms, so merchant counts must be deduplicated; comment data, by contrast, only needs to be aggregated.
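The difference between deduplicating merchants and aggregating comments can be illustrated as follows. The source does not state how two listings are recognized as the same merchant; matching on normalized name plus address is an assumption made for this sketch, as are the field names and the platform labels "A" and "B".

```python
from collections import defaultdict

def merchant_key(record):
    # Assumed identity rule: same normalized name and address means same merchant.
    return (record["name"].strip().lower(), record["address"].strip().lower())

def count_unique_merchants(listings):
    """Count merchants once even if they are listed on several platforms."""
    return len({merchant_key(r) for r in listings})

def aggregate_comments(listings, comments):
    """Pool comments from every platform under the deduplicated merchant key."""
    key_by_listing = {(r["platform"], r["listing_id"]): merchant_key(r) for r in listings}
    pooled = defaultdict(list)
    for c in comments:
        pooled[key_by_listing[(c["platform"], c["listing_id"])]].append(c["text"])
    return dict(pooled)

listings = [
    {"platform": "A", "listing_id": "1", "name": "Lao Ma Hot Pot", "address": "1 Main St"},
    {"platform": "B", "listing_id": "x9", "name": " lao ma hot pot ", "address": "1 Main St"},
    {"platform": "A", "listing_id": "2", "name": "Tea House", "address": "2 Side Rd"},
]
comments = [
    {"platform": "A", "listing_id": "1", "text": "spicy"},
    {"platform": "B", "listing_id": "x9", "text": "great broth"},
    {"platform": "A", "listing_id": "2", "text": "quiet"},
]
unique_count = count_unique_merchants(listings)
pooled = aggregate_comments(listings, comments)
```

Three listings collapse to two merchants, while all three comments are kept and grouped by merchant.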
Step 3: model training
1. Construct the training set and test set
(1) Manually label sentiment polarity
For each of the six categories (food and drink, lodging, traffic, tourism, shopping, entertainment), randomly select sample comments that have been processed in Step 2 and manually label the sentiment polarity of each comment: negative reviews are labeled -1 and positive reviews are labeled 1. The training set and test set are built from the labeled sample comments.
The training set contains 10,000 records and the test set contains 5,000.
During manual labeling, a comment is judged negative as soon as a negative sentiment word appears in it, and positive otherwise; comments with no obvious sentiment default to positive.
Examples:
-1: To be honest, the shop means well, but there is too little variety, the taste is average, the portions are small and the prices are on the high side; it is a one-off deal. Honest advice is hard to hear, but I still hope the owner works on improving!
-1: The service is terrible. A portion of goose intestines fell on the floor and the waiter just picked it up and threw it in the bin, with no compensation offered. Never going again;
1: Delicious~ really great~ the pork belly was stir-fried perfectly~ will come again next time;
1: Nice environment, the food is pretty good, and the main point is that the owner's wife is very pretty;
(2) Corpus processing
Segment the sample comments from (1) into words using the Jieba word segmentation tool.
(3) Build word vectors for the sample comments
Call the Word2vec.Word2Vec method to vectorize each word, forming word vectors.
After processing, the sentiment training set is $T = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$, $y_i \in \{-1, +1\}$, $i = 1, 2, \dots, N$, where $x_i$ is a sample, $y_i$ is the sentiment label of that sample ($-1$ for negative, $+1$ for positive), and $N$ is the number of training samples.
2. Train the support vector machine model on the training-set word vectors:
$f(x) = \operatorname{sign}(w^{*} \cdot x + b^{*})$, where $w^{*}$ is the weight vector, $b^{*}$ is the bias, and $T = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$, $y_i \in \{-1, +1\}$, $i = 1, 2, \dots, N$, is the sentiment training set.
$w^{*}$ and $b^{*}$ are computed as follows: $w^{*}$ and $b^{*}$ are the optimal solution satisfying $y_i (w^{T} \cdot x_i + b) \ge 1$, and the separating hyperplane is $w^{*} \cdot x + b^{*} = 0$.
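The text gives only the constraint that the optimal solution must satisfy. For reference, the standard hard-margin support vector machine problem with that constraint is written out below; that the patent uses exactly this objective is an assumption, since the objective function is not stated in the source.

```latex
\min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \,(w^{T} \cdot x_i + b) \ge 1, \qquad i = 1, 2, \dots, N
```

With $(w^{*}, b^{*})$ the minimizer of this problem, a new sample $x$ is classified by $f(x) = \operatorname{sign}(w^{*} \cdot x + b^{*})$, so $f(x) = +1$ is read as a positive review and $f(x) = -1$ as a negative review.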
5-fold cross-validation was run on the test set. Example result, k-fold=5: [0.964 0.953 0.965 0.986 0.982]; mean of k-fold=5: 0.97. The classification accuracy is high, which shows that the model is usable for classification in this domain.
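The reported mean can be checked directly from the five fold accuracies, and the fold split itself is easy to sketch. The helper `k_fold_indices` is a hypothetical illustration of contiguous 5-fold splitting, not the routine used in the source.

```python
from statistics import mean

fold_accuracies = [0.964, 0.953, 0.965, 0.986, 0.982]  # values quoted in the text
mean_accuracy = mean(fold_accuracies)

def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) index lists for k contiguous folds."""
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(12, k=5))
print(round(mean_accuracy, 2))
```

Each sample appears in exactly one test fold, and the mean of the quoted accuracies rounds to 0.97 as stated.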
3. Optimize the trained support vector machine model by appropriately increasing the number of negative reviews in the training set.
With reference to the model's in-sample and out-of-sample predictions, increase the training set size appropriately to make predictions more accurate.
The misclassifications were found to be concentrated in negative reviews being classified as positive. Adding negative reviews to the training set improves classification accuracy considerably, so the training set is extended to raise the share of negative reviews, and the model is optimized in this way.
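To make the rebalancing concrete, the sketch below computes how many negative reviews would have to be added to reach a chosen share of the training set. The target share is a hypothetical parameter; the source gives no figure for it, and the counts in the example are invented for illustration.

```python
def negatives_to_add(n_positive, n_negative, target_percent):
    """Smallest k such that (n_negative + k) / (total + k) >= target_percent / 100."""
    if not 0 < target_percent < 100:
        raise ValueError("target_percent must be between 0 and 100 (exclusive)")
    total = n_positive + n_negative
    # From (neg + k) / (total + k) >= p / 100:  k >= (p * total - 100 * neg) / (100 - p)
    numerator = target_percent * total - 100 * n_negative
    if numerator <= 0:
        return 0  # the share is already at or above the target
    denominator = 100 - target_percent
    return -(-numerator // denominator)  # integer ceiling division

# Hypothetical counts: 8,000 positive and 2,000 negative reviews, target share 40%.
extra = negatives_to_add(8000, 2000, 40)
```

Integer arithmetic is used so that the ceiling is exact and not affected by floating-point rounding.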
Step 4: regional tourism industry health analysis
Segment the regional tourism data into words and build vectors; feed the vectors into the model trained in Step 3 to compute the sentiment polarity of each comment.
The data is read page by page, sorted in ascending order by the unique comment record identifier (commentid). After each read, the largest identifier in that batch is saved and used as the lower bound (exclusive) for the next read. Each batch that is read goes through the following analysis.
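This reading scheme is a form of keyset pagination. A minimal sketch is shown below; `fetch_page` and the in-memory `ROWS` list are hypothetical stand-ins for the real data store, and the identifiers are small integers rather than the long numeric strings used in the source.

```python
def read_pages(fetch_page, page_size=1000):
    """Yield batches in ascending commentid order using an exclusive lower bound."""
    last_id = None
    while True:
        batch = fetch_page(after_id=last_id, limit=page_size)
        if not batch:
            break
        yield batch
        # Save the largest identifier read; the next page starts strictly after it.
        last_id = max(row["commentid"] for row in batch)

# In-memory stand-in for the real comment store (assumption for illustration).
ROWS = [{"commentid": i, "text": f"comment {i}"} for i in (3, 1, 7, 5, 9)]

def fetch_page(after_id, limit):
    ordered = sorted(ROWS, key=lambda r: r["commentid"])
    if after_id is not None:
        ordered = [r for r in ordered if r["commentid"] > after_id]
    return ordered[:limit]

pages = [[r["commentid"] for r in batch] for batch in read_pages(fetch_page, page_size=2)]
```

Because each page is bounded by the last identifier seen rather than by an offset, no record is read twice and none is skipped as long as identifiers are unique.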
The following steps are illustrated on example comments. The original comments are:
commentid=1514907434899311616747074: To be honest, the shop means well, but there is too little variety, the taste is average, the portions are small and the prices are on the high side; it is a one-off deal. Honest advice is hard to hear, but I still hope the owner works on improving!
commentid=1507795037609384448903723: Cheap, plenty of parking spaces, Northeastern flavor, not bad.
Segment the original comments into words:
Commentid=1514907434899311616747074 [[saying, 0,1], [tangible, 1,3], [, 3,4], [,
4,5], [shop, 5,6], [, 6,7], [original intention, 7,9], [being 9,10], [good, 10,11], [, 11,12], [, 12,13],
[however, 13,15], [type, 15,17], [very little, 17,19], [, 19,20], [taste, 20,22], [general, 22,24],
[, 24,25], [deal, 25,27], [less, 27,29], [, 29,30], [price, 30,32], [partially expensive, 32,34], [,
34,35], [hammer, 35,37], [hammer, 36,38], [dealing, 38,40], ["once-for-all" deal, 35,40], [.,40,41],
[sincere advice, 41,43], [unpleasant to the ear, 43,45], [good advice is not always pleasing to the ear, 41,45], [, 45,46], [but 46,47], [still, 47,49],
[it is desirable that 49,51], [storekeeper, 51,53], [effort, 53,55], [improving, 55,57], [!,57,58]]
Commentid=1507795037609384448903723 [[price, 0,2], [cheap, 2,4], [it is cheap, 0,4], [, 4,5], [parking, 5,7], [parking lot, 6,8], [parking lot, 5,8], [parking stall, 8,10], [more, 10,11], [, 11,12], [northeast, 12,14], [taste, 14,16], [, 16,17], [good, 17,19], [.,19,20]]
Look up the word vector for each word in the segmentation result, compute the sentence vector, and feed it into the model trained in Step 3 to obtain the sentiment polarity of each comment.
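The source does not say how the sentence vector is computed from the word vectors. A common choice, assumed here, is to average the vectors of the words that have one. The two-dimensional toy vectors are invented for illustration.

```python
def sentence_vector(tokens, word_vectors, dim):
    """Average the vectors of known tokens; tokens without a vector are skipped."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim  # assumed fallback when no token has a vector
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

# Toy word vectors (assumption); real vectors would come from the trained word2vec model.
toy_vectors = {"price": [1.0, 0.0], "cheap": [0.0, 1.0], "good": [1.0, 1.0]}
vec = sentence_vector(["price", "cheap", "parking", "good"], toy_vectors, dim=2)
```

The resulting fixed-length vector is what would be passed to the trained classifier; "parking" has no vector in the toy table, so it is ignored.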
The model trained in Step 3 gives the following predictions for the comments above:
{commentid='1514907434899311616747074', resourcename=' elder sister baby characteristic snack ', classfy='eat', hcp=-1.0} -1.0 (negative review)
{commentid='1507795037609384448903723', resourcename=' good fortune elder sister food still coarse food grain (north all the way Wanda shop) ', classfy='eat', hcp=1.0} 1.0 (positive review)
Here commentid is the unique identifier of the comment record, resourcename is the merchant the comment record belongs to, classfy is the category of the comment record (one of the six categories: eat, stay, travel, tour, shop, entertainment), and hcp is the sentiment polarity of the comment record as computed by the model: -1.0 means a negative review and 1.0 means a positive review.
Specific embodiment two:
To further improve the accuracy of the method for analyzing regional tourism industry development, this embodiment provides a method for analyzing the regional tourism industry along subdivided dimensions under the six categories of food and drink, lodging, traffic, tourism, shopping and entertainment.
1. Build a lexicon for each tourism industry category
(1) For each of the six categories (food and drink, lodging, traffic, tourism, shopping and entertainment), extract the sentiment words for each subdivided dimension under the category, the words modified by those sentiment words, the negation words that modify the sentiment words, and so on, and build a lexicon for each subdivided dimension of each category.
Specifically, the analysis dimensions under each of the six categories can be summarized as in Table 2 below:
Table 2
Taking the taste dimension in the food and drink category as an example:
(1) Sentiment words
good, very good, nice, delicious, rubbish, poor, hard, expensive, bad, sweet, too much, fine, little, bitter, salty, average, not good, good value ...;
(2) Modified words --- the nouns and gerunds that the sentiment words modify;
barbecue, fish head, pineapple, noodles, cooked-on, chicken feet, dishes, taste, flavor, octopus, sushi, chaoshou wontons, breakfast, cola, mandarin-duck hot pot, plain pepper noodles, refreshments ...
(3) Negation words --- adverbs that modify sentiment words, and other words that reverse the sentiment polarity.
no, not, is not, may not, do not have, hard to, seldom, few, lack, cannot, not enough, can not, difficult, dare not, not so, have no ...
(2) Lexicon consolidation
Aggregate and deduplicate the sentiment words that belong to the same subdivided dimension across the six categories of food and drink, lodging, traffic, tourism, shopping and entertainment;
Aggregate the sentiment words of all dimensions and give each a polarity label, by convention positive: 1, negative: -1, forming the sentiment dictionary, which is stored in sentiment_dict.
Aggregate and deduplicate the modified words from the food and drink, lodging, traffic, tourism, shopping and entertainment comments and attach a dimension label to each, forming a document.
Extract the sentiment words specific to each dimension, label them with that dimension, and merge them with the document above to form the dimension identification lexicon, which is stored in dims_dict.
Aggregate and deduplicate the negation words extracted from the food and drink, lodging, traffic, tourism, shopping and entertainment comments to form the negation lexicon, which is stored in negative_word.
2. Word segmentation and clause splitting
Segment each comment, producing the words and their parts of speech, then split the segmented comment into clauses according to part of speech, so that a single comment yields multiple clauses.
The tool used for segmentation and clause splitting is the Ansj segmenter, and clauses are split on adjectives.
Reasons for splitting into clauses:
1. A comment may involve several dimensions and several sentiment words; without clause splitting, sentiment words cannot be matched accurately to dimension words;
2. After splitting, it is easier to locate the negation words within a clause;
3. Cases where several dimensions share one sentiment word, such as "environment service good" or "environment, service good", can be recognized accurately (an advantage over splitting clauses on punctuation).
Two sentences are used as an illustration below:
The original comment records are:
commentid=1507795023668517632174915: The taste is very good, the portions are very large, the service is average.
commentid=1507795023675111936040195: Not as good as before, the portions are very small. Reception is not timely;
Segment using the Ansj segmenter: (supplement part of speech table)
[taste/n, very/d, good/a, ,/w, portion/nw, very/d, large/a, ,/w, service/vn, average/a, ./w]
[not/d, before/f, good/a, ,/w, portion/nw, very/d, small/a, ./w, reception/v, not/d, timely/ad]
Split into clauses using the Ansj segmenter:
[/taste/very/good, //portion/very/large, //service/average]
[/not/before/good, //portion/very/small, //reception/not/timely]
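Splitting on adjectives can be sketched as closing a clause whenever an adjective tag is reached. The sketch below works on (word, part-of-speech) pairs like those in the example; treating every tag that starts with "a" (including "ad") as an adjective boundary and dropping tokens tagged "w" are assumptions, and the function does not call the Ansj segmenter itself.

```python
def split_clauses(tagged_tokens, punct_tag="w", adj_prefix="a"):
    """Split (word, pos) pairs into clauses, closing a clause after each adjective."""
    clauses, current = [], []
    for word, pos in tagged_tokens:
        if pos == punct_tag:
            continue  # punctuation is dropped and is not used as a boundary
        current.append(word)
        if pos.startswith(adj_prefix):
            clauses.append(current)
            current = []
    if current:  # trailing words with no closing adjective
        clauses.append(current)
    return clauses

first = [("taste", "n"), ("very", "d"), ("good", "a"), (",", "w"),
         ("portion", "nw"), ("very", "d"), ("large", "a"), (",", "w"),
         ("service", "vn"), ("average", "a"), (".", "w")]
second = [("not", "d"), ("before", "f"), ("good", "a"), (",", "w"),
          ("portion", "nw"), ("very", "d"), ("small", "a"), (".", "w"),
          ("reception", "v"), ("not", "d"), ("timely", "ad")]
shared = [("environment", "n"), (",", "w"), ("service", "vn"), ("good", "a")]

first_clauses = split_clauses(first)
second_clauses = split_clauses(second)
shared_clauses = split_clauses(shared)
```

Because punctuation is not a boundary, "environment, service good" stays in one clause, so both dimensions can share the single sentiment word.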
3. Extract the dimension of each clause
For each clause, extract the evaluated dimension by reference to the dimension identification lexicon. If a word in the clause is in the lexicon, the dimension the clause covers is identified directly from it; if not, the clause's dimension is not among those this analysis is concerned with.
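Dimension extraction is then a dictionary lookup per clause. In the sketch below the contents of `DIMS_DICT` are a small invented sample; only the idea of mapping words to dimension labels comes from the text.

```python
# Word -> dimension label. The entries are an invented sample for illustration.
DIMS_DICT = {
    "taste": "taste",
    "flavor": "taste",
    "delicious": "taste",   # a sentiment word specific to one dimension
    "portion": "weight",
    "service": "service",
    "reception": "service",
}

def clause_dimensions(clause, dims_dict=DIMS_DICT):
    """Return the dimensions mentioned in a clause, in order of first appearance."""
    found = []
    for word in clause:
        dim = dims_dict.get(word)
        if dim is not None and dim not in found:
            found.append(dim)
    return found

print(clause_dimensions(["taste", "very", "good"]))
print(clause_dimensions(["parking", "many"]))
```

An empty result corresponds to the case where the clause does not touch any dimension the analysis is concerned with.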
4. Compute the sentiment polarity
For each clause, check whether any of its segmented words appears in the sentiment dictionary. If so, the sentiment dictionary gives the polarity of the sentiment word; if not, the clause is considered to contain no sentiment word for the dimensions this analysis is concerned with.
Take the set difference of the clause and the sentiment dictionary, then intersect the result with the negation lexicon. The number of elements in the intersection gives the number of negation words, which is combined with the polarity of the sentiment word to give the final sentiment polarity.
By convention, the sentiment value of every dimension of any comment defaults to 0.
As soon as a negative sentiment word occurs for a dimension, the sentiment of that dimension is judged negative, that is, the dimension receives a negative review.
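The set operations and the per-dimension rule can be put together as in the sketch below. Two points are assumptions rather than statements from the source: an odd number of distinct negation words flips the polarity, and a clause containing any negative sentiment word starts from a negative base polarity. The dictionaries are small invented samples.

```python
SENTIMENT_DICT = {"good": 1, "large": 1, "small": -1, "timely": 1}  # invented sample
NEGATIVE_WORDS = {"not", "no", "never"}                              # invented sample

def clause_sentiment(clause, sentiment_dict=SENTIMENT_DICT, negative_words=NEGATIVE_WORDS):
    """Return 1, -1, or 0 (no sentiment word of interest) for one clause."""
    words = set(clause)
    polarities = [sentiment_dict[w] for w in words if w in sentiment_dict]
    if not polarities:
        return 0
    base = -1 if -1 in polarities else 1
    # (clause minus sentiment dictionary) intersected with the negation lexicon
    negations = (words - set(sentiment_dict)) & negative_words
    # Assumed rule: an odd number of distinct negation words flips the polarity.
    return -base if len(negations) % 2 == 1 else base

def comment_dimension_scores(clauses_with_dims, all_dims):
    """Default every dimension to 0; any negative clause makes its dimension -1."""
    scores = {d: 0 for d in all_dims}
    for clause, dims in clauses_with_dims:
        s = clause_sentiment(clause)
        for d in dims:
            if s == -1 or scores[d] == -1:
                scores[d] = -1
            elif s == 1:
                scores[d] = 1
    return scores

example = [
    (["not", "before", "good"], []),                  # no dimension word in this clause
    (["portion", "very", "small"], ["weight"]),
    (["reception", "not", "timely"], ["service"]),
]
scores = comment_dimension_scores(example, ["taste", "weight", "service"])
```

Since sets are used, a negation word repeated within one clause is counted once, which follows the set-based description in the text.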
The results of the dimension sentiment calculation for the example sentences above are:
id='1507795023676931072650665', taste (taste)=1, weight (portion size)=1, dishes (plating)=0, price (price)=0, service (service)=1, hygiene (hygiene)=0, position (location)=0, room (room)=0, safe (safety)=0, facilities (facilities)=0, admin (management)=0, traffic (transport)=0, tourist (visitor flow)=0, scenic (scenery)=0, diet (dishes)=0, commodity (goods)=0, classfy='food'
id='1507795023675111936040195', taste (taste)=0, weight (portion size)=-1, dishes (plating)=0, price (price)=0, service (service)=-1, hygiene (hygiene)=0, position (location)=0, room (room)=0, safe (safety)=0, facilities (facilities)=0, admin (management)=0, traffic (transport)=0, tourist (visitor flow)=0, scenic (scenery)=0, diet (dishes)=0, commodity (goods)=0, classfy='food'
A sentiment orientation value of 0 for a dimension means that the review record does not involve that dimension; a value of 1 means that the record rates the dimension positively; a value of -1 means that the record rates the dimension negatively. The value of classfy corresponds to one of the tourism categories (food, lodging, transport, sightseeing, shopping, entertainment).
Specific embodiment three:
Building on the results of specific embodiments one and two, the development of the region-wide tourism industry can be analyzed and visualized; see Fig. 2.
1. Catering category
(1) Analysis of industry business-format composition
Business-format composition analysis
Analysis: number and share of merchants in each category, and number and share of merchants at each rating level;
Data source: basic food and catering data from Meituan, Dianping, Baidu Travel, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: count catering merchants by category and by rating level, and calculate each count as a proportion of the total number of merchants;
Note: 1. Categories: Chinese restaurants, casual light meals, fast food, snacks, Western restaurants, Japanese cuisine, Korean cuisine, other;
2. Rating levels: five stars, four and a half stars, four stars, three and a half stars, three stars, two and a half stars, two stars, one and a half stars, one star, half a star, no stars;
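The count-and-share statistic used here (and reused for the other categories) can be sketched with the standard library. The record layout and field names ("category", "grade") are assumptions for illustration only, and the sample merchants are invented.

```python
from collections import Counter


def count_and_share(merchants, key):
    """Count merchants per value of `key` and give each count's share of the total."""
    counts = Counter(m[key] for m in merchants)
    total = sum(counts.values())
    return {value: (n, n / total) for value, n in counts.items()}


# Invented sample records; a real run would use the preprocessed platform data.
merchants = [
    {"name": "A", "category": "Chinese restaurant", "grade": 4.5},
    {"name": "B", "category": "fast food", "grade": 4.0},
    {"name": "C", "category": "Chinese restaurant", "grade": 4.0},
    {"name": "D", "category": "snacks", "grade": 3.5},
]
by_category = count_and_share(merchants, "category")
by_grade = count_and_share(merchants, "grade")
```

Each result maps a category or rating level to a (count, share) pair, which is the form needed for a composition chart.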
Analysis: number and share of agritainment (nongjiale) venues at each star level;
Data source: Chengdu Public Data Open Platform - star-rated agritainment summary information;
Implementation: count agritainment venues by star level, and calculate each count as a proportion of all star-rated agritainment venues;
Analysis: number and share of star-rated hotels at each star level;
Data source: Chengdu Tourism E-Government Network - Industry Management - Hotels - 2018 star-rated hotel register;
Implementation: count hotels by star level, and calculate each count as a proportion of all star-rated hotels;
Note: 1. Star levels: five stars, four stars, three stars, two stars, one star;
(2) Analysis of industry economic operation
Number of merchants
Analysis: number of catering merchants;
Data source: basic food and catering data from Meituan, Dianping, Baidu Travel, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: sum the number of catering merchants (after preprocessing to reconcile differing names of the same merchant across platforms);
Price range analysis, etc.;
Analysis: number of merchants whose per-capita consumption falls in each price range;
Data source: basic food data from Meituan, Dianping, Baidu Travel, Qunar and Tongcheng;
Implementation: count merchants by per-capita consumption within each defined price range;
Note: price ranges: below 20 yuan, 20-40 yuan, 40-60 yuan, 60-80 yuan, 80-100 yuan, 100-120 yuan, above 120 yuan;
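Counting merchants per price range can be sketched with a sorted list of range edges. The edges follow the ranges in the note; treating each lower bound as inclusive (so exactly 20 yuan falls in 20-40) is an assumption, since the text does not say how boundary values are assigned. The sample prices are invented.

```python
from bisect import bisect_right
from collections import Counter

# Range edges in yuan, taken from the note above.
EDGES = [20, 40, 60, 80, 100, 120]
LABELS = ["below 20", "20-40", "40-60", "60-80", "80-100", "100-120", "120 and above"]


def price_band(price):
    """Map a per-capita price to its range label (lower bound inclusive)."""
    return LABELS[bisect_right(EDGES, price)]


prices = [15, 20, 35, 59.5, 120, 300]  # invented per-capita prices
band_counts = Counter(price_band(p) for p in prices)
```

The same approach applies to the starting-price ranges used for accommodation and tour routes by swapping in a different edge list.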
Popular dish analysis;
Analysis: dishes with many reviews and many recommendations;
Data source: Dianping basic food data and review data;
Implementation: model popularity from the number of reviews and the number of recommendations, and rank dishes by popularity;
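The text does not specify the popularity model. One possible sketch, purely as an assumption, is a weighted sum of log-scaled review and recommendation counts; the weights, the function name popularity and the sample dishes are all illustrative.

```python
import math


def popularity(review_count, recommend_count, w_review=0.5, w_recommend=0.5):
    """Assumed popularity score: weighted sum of log-scaled counts."""
    return w_review * math.log1p(review_count) + w_recommend * math.log1p(recommend_count)


# Invented sample: dish -> (number of reviews, number of recommendations).
dishes = {
    "mapo tofu": (1200, 300),
    "hot pot": (5000, 900),
    "dan dan noodles": (80, 10),
}
ranked = sorted(dishes, key=lambda d: popularity(*dishes[d]), reverse=True)
```

Log scaling keeps a single very heavily reviewed dish from dominating the ranking; any monotone combination of the two counts would fit the description equally well.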
Popular merchant analysis;
Analysis: merchants with many reviews;
Data source: food and catering review data from Meituan, Dianping, Baidu Travel, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: calculate each merchant's number of reviews across all platforms;
Recommendation analysis;
Analysis: merchants that perform well on dimensions such as taste, portion size and plating;
Data source: food and catering review data from Meituan, Dianping, Baidu Travel, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: extract the sentiment words in each review and the nouns/pronouns they modify, and implement in code a positive/negative review analysis on dimensions such as taste, portion size and plating; calculate each merchant's positive-review rate on each dimension, build a model from these rates, and obtain the merchant's recommendation score;
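Given per-review dimension values of 1, 0 or -1, a per-dimension positive-review rate and a simple recommendation score could be computed as below. Taking the rate over reviews that mention the dimension, and using an unweighted mean as the "model", are assumptions; the text does not define either. The sample reviews are invented.

```python
def positive_rates(reviews, dimensions):
    """Positive-review rate per dimension, over reviews that mention the dimension."""
    rates = {}
    for dim in dimensions:
        mentioned = [r[dim] for r in reviews if r.get(dim, 0) != 0]
        if mentioned:
            rates[dim] = sum(1 for v in mentioned if v == 1) / len(mentioned)
    return rates


def recommendation(rates):
    """Assumed model: unweighted mean of the available per-dimension rates."""
    return sum(rates.values()) / len(rates) if rates else None


# Invented reviews for one merchant.
reviews = [
    {"taste": 1, "weight": 1, "dishes": 0},
    {"taste": 1, "weight": -1, "dishes": 0},
    {"taste": -1, "weight": 0, "dishes": 0},
    {"taste": 1, "weight": 0, "dishes": 0},
]
rates = positive_rates(reviews, ["taste", "weight", "dishes"])
score = recommendation(rates)
```

Dimensions that no review mentions are left out rather than scored as zero, so an unreviewed dimension does not drag the score down.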
(3) Analysis of industry health
Overall health
Analysis: per-dimension analysis of online reviews (taste, portion size, plating, etc.), yielding an overall health score;
Data source: food and catering review data from Meituan, Dianping, Baidu Travel, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: extract the sentiment words in each review and the nouns/pronouns they modify, and implement in code a positive/negative review analysis on dimensions such as hygiene, taste, portion size and plating; calculate the positive-review rate of each dimension, build a model from these rates, and obtain the overall health score;
Health of each business format
Analysis: per-dimension analysis (taste, portion size, plating, etc.) of the reviews for each category, yielding a health score for each category;
Data source: food and catering review data from Meituan, Dianping, Baidu Travel, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: calculate the health score of each category using the overall health method;
Note: 1. Categories: Chinese restaurants, casual light meals, fast food, snacks, Western restaurants, Japanese cuisine, Korean cuisine, other;
Evaluation dimension analysis
Analysis: share of positive reviews on dimensions such as taste, portion size and plating;
Data source: food and catering review data from Meituan, Dianping, Baidu Travel, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: extract the sentiment words in each review and the nouns/pronouns they modify, and implement in code a positive/negative review analysis on dimensions such as taste, portion size and plating; calculate the positive-review rate of each dimension;
Positive-review enterprise ranking
Analysis: top 10 enterprises by positive reviews;
Data source: food and catering review data from Meituan, Dianping, Baidu Travel, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: measure the segmented words with natural language processing methods and combine them with a classification method to classify each review as positive or negative; then calculate each enterprise's number of reviews and positive-review rate, and rank enterprises by number of positive reviews and positive-review rate;
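Once each review carries a positive or negative label, the ranking step can be sketched as follows. The input format (enterprise, label) and the sort order (number of positive reviews first, positive-review rate as tie-breaker) are assumptions for illustration; the classifier that produces the labels is outside this sketch.

```python
from collections import defaultdict


def rank_enterprises(labeled_reviews, top_n=10):
    """Rank enterprises from (enterprise, label) pairs, label 1 = positive, -1 = negative.

    Returns rows of (enterprise, review_count, positive_count, positive_rate).
    """
    stats = defaultdict(lambda: [0, 0])  # enterprise -> [total, positive]
    for enterprise, label in labeled_reviews:
        stats[enterprise][0] += 1
        if label == 1:
            stats[enterprise][1] += 1
    rows = [(e, total, pos, pos / total) for e, (total, pos) in stats.items()]
    # Assumed order: more positive reviews first, then higher positive rate.
    rows.sort(key=lambda r: (r[2], r[3]), reverse=True)
    return rows[:top_n]


# Invented labeled reviews.
labeled = [("A", 1), ("A", 1), ("A", 1),
           ("B", 1), ("B", 1), ("B", 1), ("B", -1),
           ("C", -1)]
ranking = rank_enterprises(labeled)
```

A negative-review ranking follows the same pattern with the negative count and negative rate as the sort key.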
Negative-review enterprise ranking
Analysis: top 10 enterprises by negative reviews;
Data source: food and catering review data from Meituan, Dianping, Baidu Travel, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: analogous to the positive-review enterprise ranking;
Hot review words
Analysis: hot words in positive and negative reviews;
Data source: food and catering review data from Meituan, Dianping, Baidu Travel, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: extract the nouns, adjectives, etc. that occur frequently in positive and negative reviews, and calculate their frequencies.
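Hot-word extraction can be sketched as a frequency count over part-of-speech-filtered tokens. The sketch assumes reviews are already segmented into (word, tag) pairs in the style of the earlier example (n, nw, vn, a, ad); the set of tags kept is an assumption, and the sample reviews are invented.

```python
from collections import Counter

# Assumed tags to keep: nouns, new words, verbal nouns, adjectives, adverbial adjectives.
KEEP_TAGS = {"n", "nw", "vn", "a", "ad"}


def hot_words(tagged_reviews, top_n=10):
    """Return the most frequent kept words as (word, count) pairs."""
    counter = Counter(
        word
        for review in tagged_reviews
        for word, tag in review
        if tag in KEEP_TAGS
    )
    return counter.most_common(top_n)


tagged_reviews = [
    [("taste", "n"), ("very", "d"), ("good", "a")],
    [("service", "vn"), ("good", "a")],
    [("taste", "n"), ("good", "a")],
]
top = hot_words(tagged_reviews, top_n=2)
```

Running the function separately on positive and on negative reviews gives the two hot-word lists the analysis calls for.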
2. Accommodation category
(1) Analysis of industry business-format composition and growth
Business-format composition and growth analysis
Analysis: number and share of merchants at each star level and at each rating level, and the year-by-year change in these numbers and shares;
Data source: hotel basic data from JD.com, eLong, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: count accommodation merchants year by year by star level and by rating level, and calculate each count as a proportion of the total number of merchants;
Analysis: number and share of rural hotels at each star level;
Data source: Chengdu Public Data Open Platform - star-rated rural hotel summary information;
Implementation: count rural hotels by star level;
Note: 1. Star levels: five stars, four stars, three stars, two stars, one star;
2. Rating levels: five stars, four and a half stars, four stars, three and a half stars, three stars, two and a half stars, two stars, one and a half stars, one star, half a star, no stars;
(2) Analysis of industry economic operation
Merchant count analysis
Analysis: number of accommodation merchants;
Data source: hotel basic data from JD.com, eLong, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: sum the number of accommodation merchants (after preprocessing to reconcile differing names of the same merchant across platforms);
Room type richness analysis
Analysis: number of merchants offering each room type;
Data source: hotel basic data from JD.com, eLong, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: count the merchants offering each room type;
Note: room types: double rooms, twin rooms, suites, etc.;
Basic service facility analysis;
Analysis: number of merchants offering each basic service facility;
Data source: hotel basic data from JD.com, eLong, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: count the merchants offering each basic service facility;
Note: basic service facilities: Wi-Fi, wake-up service, luggage storage, elevator, electronic checkout system, 24-hour hot water, etc.;
Price range analysis
Analysis: number of merchants in each starting-price range for accommodation;
Data source: hotel basic data from JD.com, eLong, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: count merchants by starting price within each defined price range;
Note: 1. Starting price: below 50 yuan, 50-100 yuan, 100-150 yuan, 150-200 yuan, 200-250 yuan, 250-300 yuan, 350-400 yuan, 450-500 yuan, above 500 yuan;
Popular room type analysis
Analysis: room types with many reviews;
Data source: hotel review data from JD.com, eLong, Lvmama, Ctrip and Tongcheng;
Implementation: calculate each room type's number of reviews across all platforms;
Popular hotel analysis
Analysis: hotels with many reviews;
Data source: hotel review data from JD.com, eLong, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: calculate each merchant's number of reviews across all platforms;
Recommendation analysis
Analysis: hotels, inns, etc. that perform well on dimensions such as location, facilities, service and hygiene;
Data source: hotel review data from JD.com, eLong, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: extract the sentiment words in each review and the nouns/pronouns they modify, and implement in code a positive/negative review analysis on dimensions such as location, facilities, service and hygiene; calculate each merchant's positive-review rate on each dimension, build a model from these rates, and obtain the merchant's recommendation score;
(3) Analysis of industry health
Overall health
Analysis: per-dimension analysis of online reviews (location, facilities, service, hygiene, etc.), yielding an overall health score;
Data source: hotel review data from JD.com, eLong, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: extract the sentiment words in each review and the nouns/pronouns they modify, and implement in code a positive/negative review analysis on dimensions such as location, facilities, service and hygiene; calculate the positive-review rate of each dimension, build a model from these rates, and obtain the overall health score;
Health of each business format
Analysis: per-dimension analysis (location, facilities, service, hygiene, etc.) of the reviews for each star level and rating level, yielding a health score for each business format;
Data source: hotel review data from JD.com, eLong, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: calculate the health score for each star level and rating level using the overall health method;
Note: 1. Star levels: five stars, four stars, three stars, two stars, one star;
2. Rating levels: five stars, four and a half stars, four stars, three and a half stars, three stars, two and a half stars, two stars, one and a half stars, one star, half a star, no stars;
Evaluation dimension analysis
Analysis: share of positive reviews on dimensions such as location, facilities, service and hygiene;
Data source: hotel review data from JD.com, eLong, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: extract the sentiment words in each review and the nouns/pronouns they modify, and implement in code a positive/negative review analysis on dimensions such as location, facilities, service and hygiene; calculate the positive-review rate of each dimension;
Positive-review enterprise ranking
Analysis: top 10 enterprises by positive reviews;
Data source: hotel review data from JD.com, eLong, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: measure the segmented words with natural language processing methods and combine them with a classification method to classify each review as positive or negative; then calculate each enterprise's number of reviews and positive-review rate, and rank enterprises by number of positive reviews and positive-review rate;
Negative-review enterprise ranking
Analysis: top 10 enterprises by negative reviews;
Data source: hotel review data from JD.com, eLong, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: analogous to the positive-review enterprise ranking;
Hot review words
Analysis: hot words in positive and negative reviews;
Data source: hotel review data from JD.com, eLong, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: extract the nouns, adjectives, etc. that occur frequently in positive and negative reviews, and calculate their frequencies.
3. Transport category
(1) Road richness analysis
Road richness analysis
Analysis: mileage of each type of road;
Data source: Chengdu Public Data Open Platform - highway mileage information;
Implementation: query the mileage of national, provincial, county, township, dedicated and village roads;
(2) Scheduled service analysis
Public transport
Analysis: number of bus routes;
Data source: Chengdu Public Data Open Platform - bus route information;
Implementation: count the routes by route name;
Analysis: the bus stops served by the most bus routes;
Data source: Chengdu Public Data Open Platform - bus route information;
Implementation: count, for each stop, the bus routes that serve it;
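Both bus statistics can be derived from a mapping of route names to stop lists. The input layout and the names busiest_stops and route_stops are assumptions; the open-platform data format is not described in the text, and the sample routes are invented.

```python
from collections import defaultdict


def busiest_stops(route_stops, top_n=1):
    """Return (stop, number_of_routes) for the stops served by the most routes."""
    routes_by_stop = defaultdict(set)
    for route, stops in route_stops.items():
        for stop in stops:
            routes_by_stop[stop].add(route)  # a set avoids double-counting loops
    ranked = sorted(routes_by_stop.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(stop, len(routes)) for stop, routes in ranked[:top_n]]


# Invented sample: route name -> ordered list of stops.
route_stops = {
    "Route 1": ["Stop A", "Stop B", "Stop C"],
    "Route 2": ["Stop B", "Stop C"],
    "Route 3": ["Stop C", "Stop D"],
}
route_count = len(route_stops)  # number of bus routes, counted by route name
top_stops = busiest_stops(route_stops, top_n=2)
```

Ties are broken alphabetically by stop name so the output is deterministic.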
(3) Analysis of industry health
Overall health
Analysis: per-dimension analysis of online reviews (price, service, hygiene, facilities, etc.), yielding an overall health score;
Data source: Dianping transport review data;
Implementation: extract the sentiment words in each review and the nouns/pronouns they modify, and implement in code a positive/negative review analysis on dimensions such as price, service, hygiene and facilities; calculate the positive-review rate of each dimension, build a model from these rates, and obtain the overall health score;
Evaluation dimension analysis
Analysis: share of positive reviews on dimensions such as scenery, management, transport, price, service and facilities;
Data source: Dianping transport review data;
Implementation: extract the sentiment words in each review and the nouns/pronouns they modify, and implement in code a positive/negative review analysis on dimensions such as price, service, hygiene and facilities; calculate the positive-review rate of each dimension;
Hot review words
Analysis: hot words in positive and negative reviews;
Data source: Dianping transport review data;
Implementation: extract the nouns, adjectives, etc. that occur frequently in positive and negative reviews, and calculate their frequencies.
4. Tourism category
(1) Analysis of industry business-format composition and growth
Growth analysis of Chengdu travel agencies;
Analysis: share of travel agencies in each district, and number of travel agencies registered year by year;
Data source: Chengdu Tourism E-Government Network - Industry Management - Travel Agencies - Chengdu travel agency register;
Implementation: count travel agencies by district and calculate each district's share; count travel agencies by registration date;
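The district share and year-by-year registration counts can be sketched as below. The record fields ("district", "registered") and the ISO date strings are assumptions about how the register might be stored, and the sample rows are invented.

```python
from collections import Counter


def agency_statistics(agencies):
    """Return (share per district, registrations per year) for a list of agencies."""
    by_district = Counter(a["district"] for a in agencies)
    total = sum(by_district.values())
    district_share = {d: n / total for d, n in by_district.items()}
    # Assumes registration dates are stored as 'YYYY-MM-DD' strings.
    by_year = Counter(a["registered"][:4] for a in agencies)
    return district_share, dict(sorted(by_year.items()))


# Invented sample rows.
agencies = [
    {"name": "Agency A", "district": "Jinjiang", "registered": "2015-03-01"},
    {"name": "Agency B", "district": "Jinjiang", "registered": "2016-07-15"},
    {"name": "Agency C", "district": "Wuhou", "registered": "2016-11-30"},
    {"name": "Agency D", "district": "Qingyang", "registered": "2017-01-20"},
]
district_share, by_year = agency_statistics(agencies)
```

The per-year counts are returned in chronological order so they can be plotted directly as a growth curve.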
Growth analysis of travel agencies from other provinces and cities operating in Chengdu;
Analysis: share of travel agencies in each district, and number of travel agencies registered year by year;
Data source: Chengdu Tourism E-Government Network - Industry Management - Travel Agencies - register of Chengdu branches of travel agencies from other provinces and cities;
Implementation: count travel agencies by district and calculate each district's share; count travel agencies by registration date;
Service outlet count analysis;
Analysis: number of service outlets of each travel agency in Chengdu, and number of travel agency service outlets in each district;
Data source: Chengdu Tourism E-Government Network - Industry Management - Travel Agencies - Chengdu travel agency service outlet register;
Implementation: count service outlets by travel agency, and count travel agency service outlets by district;
Business-format composition analysis;
Analysis: number and share of scenic spots at each star level;
Data source: ticket data from Baidu Travel, Lvmama, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: count scenic spots by star level, and calculate each count as a proportion of the total;
Note: 1. Star levels: AAAAA, AAAA, AAA, AA, A and unrated;
(2) Analysis of industry economic operation
Scenic spot count analysis
Analysis: number of tourist attractions;
Data source: basic data on tickets, attractions and local activities from JD Travel, Baidu Travel, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: sum the number of scenic spots;
Tour route analysis
Route composition
Analysis: top 10 routes by number of travelers;
Data source: basic data on nearby tours and local activities from Lvmama, Mafengwo, Qunar, Ctrip and Tuniu;
Implementation: count travelers by route;
Route supply analysis;
Analysis: number of routes;
Data source: basic data on nearby tours and local activities from Lvmama, Mafengwo, Qunar, Ctrip and Tuniu;
Implementation: calculate the number of routes to the destination;
Route price analysis (price range, trend analysis, growth analysis);
Analysis: number of tour routes in each price range;
Data source: basic data on nearby tours and local activities from Lvmama and Tuniu;
Implementation: count routes by starting price within each defined price range;
Note: 1. Starting price: below 50 yuan, 50-100 yuan, 100-150 yuan, 150-200 yuan, 200-250 yuan, 250-300 yuan, 350-400 yuan, 450-500 yuan, above 500 yuan;
Low-price route monitoring (low-price route early warning, low-price route suppliers, route departure-place analysis);
Analysis: top 10 low-price routes, and the suppliers and departure places of the low-price routes;
Data source: basic data on nearby tours and local activities from Lvmama and Tuniu;
Implementation: find the routes with the lowest starting prices;
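Finding the lowest-priced routes together with their suppliers and departure places can be sketched with heapq.nsmallest from the standard library. The record fields ("starting_price", "supplier", "departure") are assumed names, and the sample routes are invented.

```python
import heapq


def cheapest_routes(routes, n=10):
    """Return the n routes with the lowest starting price, cheapest first."""
    return heapq.nsmallest(n, routes, key=lambda r: r["starting_price"])


# Invented sample routes.
routes = [
    {"name": "Route R1", "starting_price": 199, "supplier": "Supplier S1", "departure": "Chengdu"},
    {"name": "Route R2", "starting_price": 49, "supplier": "Supplier S2", "departure": "Chengdu"},
    {"name": "Route R3", "starting_price": 520, "supplier": "Supplier S1", "departure": "Chongqing"},
    {"name": "Route R4", "starting_price": 88, "supplier": "Supplier S3", "departure": "Chengdu"},
]
low = cheapest_routes(routes, n=2)
low_suppliers = {r["supplier"] for r in low}
low_departures = {r["departure"] for r in low}
```

An early-warning rule (for example, flagging routes below a price threshold) could be layered on top, but the text does not define one, so none is shown.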
Popular scenic spot analysis
Analysis: scenic spots with many reviews;
Data source: review data on tickets, attractions and local activities from JD Travel, Baidu Travel, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: calculate the number of reviews of each scenic spot;
Recommendation analysis
Analysis: scenic spots, etc. that perform well on dimensions such as scenery, management, transport, price, service and facilities;
Data source: review data on tickets, attractions and local activities from JD Travel, Baidu Travel, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: extract the sentiment words in each review and the nouns/pronouns they modify, and implement in code a positive/negative review analysis on dimensions such as scenery, management, transport and price; calculate each merchant's positive-review rate on each dimension, build a model from these rates, and obtain the merchant's recommendation score;
(3) Analysis of industry health
Overall health
Analysis: per-dimension analysis of online reviews (scenery, management, transport, price, service, facilities, etc.), yielding an overall health score;
Data source: review data on tickets, attractions and local activities from JD Travel, Baidu Travel, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: extract the sentiment words in each review and the nouns/pronouns they modify, and implement in code a positive/negative review analysis on dimensions such as scenery, management, transport, price, service and facilities; calculate the positive-review rate of each dimension, build a model from these rates, and obtain the overall health score;
Health of each business format
Analysis: per-dimension analysis (scenery, management, transport, price, service, facilities, etc.) of the reviews for each star level, yielding a health score for each business format;
Data source: review data on tickets, attractions and local activities from JD Travel, Baidu Travel, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: calculate the health score for each star level using the overall health method;
Note: 1. Star levels: AAAAA, AAAA, AAA, AA, A;
Evaluation dimension analysis;
Analysis: share of positive reviews on dimensions such as scenery, management, transport, price, service and facilities;
Data source: review data on tickets, attractions and local activities from JD Travel, Baidu Travel, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: extract the sentiment words in each review and the nouns/pronouns they modify, and implement in code a positive/negative review analysis on dimensions such as scenery, management, transport, price, service and facilities; calculate the positive-review rate of each dimension;
Positive-review enterprise ranking
Analysis: top 10 enterprises by positive reviews;
Data source: review data on tickets, attractions and local activities from JD Travel, Baidu Travel, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: measure the segmented words with natural language processing methods and combine them with a classification method to classify each review as positive or negative; then calculate each enterprise's number of reviews and positive-review rate, and rank enterprises by number of positive reviews and positive-review rate;
Negative-review enterprise ranking
Analysis: top 10 enterprises by negative reviews;
Data source: review data on tickets, attractions and local activities from JD Travel, Baidu Travel, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: analogous to the positive-review enterprise ranking;
Hot review words
Analysis: hot words in positive and negative reviews;
Data source: review data on tickets, attractions and local activities from JD Travel, Baidu Travel, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: extract the nouns, adjectives, etc. that occur frequently in positive and negative reviews, and calculate their frequencies;
5. Shopping category
(1) Analysis of industry business-format composition and growth;
Business-format composition and growth analysis;
Analysis: number and share of merchants of each type;
Data source: shopping basic data from Dianping, Lvmama, Mafengwo, Tongcheng and Tuniu;
Implementation: count merchants by type, and calculate each count as a proportion of the total;
Note: 1. Types: five stars, four stars, three stars, two stars, one star;
(2) Analysis of industry economic operation;
Merchant count analysis
Analysis: number of shopping merchants;
Data source: shopping basic data from Dianping, Lvmama, Mafengwo, Tongcheng and Tuniu;
Implementation: sum the number of shopping merchants;
Popular merchant analysis;
Analysis: merchants with many reviews;
Data source: shopping review data from Dianping, Lvmama, Mafengwo, Tongcheng and Tuniu;
Implementation: calculate each merchant's number of reviews across all platforms;
Recommendation analysis
Analysis: shopping venues with fair prices, excellent service and a good environment;
Data source: shopping basic data from Dianping, Lvmama, Mafengwo, Tongcheng and Tuniu;
Implementation: extract the sentiment words in each review and the nouns/pronouns they modify, and implement in code a positive/negative review analysis on dimensions such as price, service and goods; calculate each merchant's positive-review rate on each dimension, build a model from these rates, and obtain the merchant's recommendation score;
(3) Analysis of industry health
Overall health;
Analysis: per-dimension analysis of online reviews (price, service, goods, etc.), yielding an overall health score;
Data source: shopping basic data from Dianping, Lvmama, Mafengwo, Tongcheng and Tuniu;
Implementation: extract the sentiment words in each review and the nouns/pronouns they modify, and implement in code a positive/negative review analysis on dimensions such as price, service and goods; calculate the positive-review rate of each dimension, build a model from these rates, and obtain the overall health score;
Health of each business format (by type)
Analysis: per-dimension analysis (price, service, goods, etc.) of the reviews for each type, yielding a health score for each business format;
Data source: shopping basic data from Dianping, Lvmama, Mafengwo, Tongcheng and Tuniu;
Implementation: calculate the health score for each type using the overall health method;
Evaluation dimension analysis;
Analysis: share of positive reviews on dimensions such as price, service and goods;
Data source: shopping basic data from Dianping, Lvmama, Mafengwo, Tongcheng and Tuniu;
Implementation: extract the sentiment words in each review and the nouns/pronouns they modify, and implement in code a positive/negative review analysis on dimensions such as price, service and goods; calculate the positive-review rate of each dimension;
Positive-review enterprise ranking
Analysis: top 10 enterprises by positive reviews;
Data source: shopping basic data from Dianping, Lvmama, Mafengwo, Tongcheng and Tuniu;
Implementation: measure the segmented words with natural language processing methods and combine them with a classification method to classify each review as positive or negative; then calculate each enterprise's number of reviews and positive-review rate, and rank enterprises by number of positive reviews and positive-review rate;
Negative-review enterprise ranking
Analysis: top 10 enterprises by negative reviews;
Data source: shopping basic data from Dianping, Lvmama, Mafengwo, Tongcheng and Tuniu;
Implementation: analogous to the positive-review enterprise ranking;
Hot review words
Analysis: hot words in positive and negative reviews;
Data source: shopping basic data from Dianping, Lvmama, Mafengwo, Tongcheng and Tuniu;
Implementation: extract the nouns, adjectives, etc. that occur frequently in positive and negative reviews, and calculate their frequencies.
6, classification is entertained
(1) industry industry situation is constituted and growth pattern is analyzed;
Industry industry situation constitutes situation analysis;
Analysis: all types of businessman's quantity, accounting;
Data source: Meituan, public comment, donkey mother, hornet's nest, with journey and way ox amusement, entertain point, basic data of playing
It realizes: counting businessman's quantity by type, calculate respective numbers and account for total quantity ratio;
Note: 1. type: five-pointed star, four stars, Samsung, two stars, a star;
(2) Industry economic operation analysis;
Merchant quantity analysis
Analysis: the number of entertainment merchants;
Data source: basic data on entertainment, entertainment venues, and recreation from Meituan, Dianping, Lvmama, Mafengwo, Tongcheng, and Tuniu;
Implementation: the number of entertainment merchants is summed;
Price analysis (price range, trend analysis, growth analysis);
Analysis: the number of merchants whose per-capita consumption falls in each price range;
Data source: basic data on entertainment, entertainment venues, and recreation from Meituan, Dianping, Lvmama, Tongcheng, and Tuniu;
Implementation: merchants are counted per price range according to their starting price;
Note: price ranges: below 50 yuan, 50-100 yuan, 100-150 yuan, 150-200 yuan, 200-250 yuan, 250-300 yuan, 300-350 yuan, 350-400 yuan, and 400 yuan or more;
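The price-range statistics can be sketched as a simple binning step. The text does not say which range a boundary value such as exactly 50 yuan belongs to, so the sketch assumes each range includes its lower bound and excludes its upper bound; the names `price_range` and `count_by_range` are hypothetical.

```python
from collections import Counter

# Range boundaries in yuan, taken from the note above.
EDGES = [50, 100, 150, 200, 250, 300, 350, 400]

def price_range(price):
    """Map a starting price to its range label (assumed: lower bound inclusive)."""
    if price < EDGES[0]:
        return "below 50"
    for low, high in zip(EDGES, EDGES[1:]):
        if low <= price < high:
            return f"{low}-{high}"
    return "400 and above"

def count_by_range(prices):
    """Count merchants per price range."""
    return Counter(price_range(p) for p in prices)

histogram = count_by_range([30, 50, 99.5, 400, 1200])
```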
Popular merchant analysis
Analysis: the merchants with the most reviews;
Data source: review data on entertainment, entertainment venues, and recreation from Meituan, Dianping, Lvmama, Mafengwo, Tongcheng, and Tuniu;
Implementation: the number of reviews of each merchant across all platforms is calculated;
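A minimal sketch of the cross-platform count follows. It assumes that the same merchant carries the same identifier on every platform, which real data would need a matching step to guarantee; the platform keys and merchant identifiers are placeholders.

```python
from collections import Counter

def total_review_counts(platform_counts):
    """platform_counts: {platform: {merchant: review count}} -> merchants sorted by total."""
    totals = Counter()
    for counts in platform_counts.values():
        totals.update(counts)  # adds the per-merchant counts of this platform
    return totals.most_common()

popular = total_review_counts({
    "platform_a": {"M1": 10, "M2": 3},
    "platform_b": {"M1": 5, "M3": 7},
})
```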
Recommendation degree analysis
Analysis: entertainment venues with reasonable prices, excellent service, and a good environment;
Data source: review data on entertainment, entertainment venues, and recreation from Meituan, Dianping, Lvmama, Mafengwo, Tongcheng, and Tuniu;
Implementation: by extracting the sentiment words in the reviews and the nouns/pronouns they modify, code is written to classify reviews as positive or negative along dimensions such as price, service, and environment; the positive review rate of each dimension is calculated for each merchant, and a model built on these rates yields the merchant recommendation degree;
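The text does not specify how the dimension positive rates are combined into a recommendation degree. Purely as an assumption, the sketch below uses a weighted average with equal weights by default; `recommendation_score` and the weights are hypothetical, and the actual model may differ.

```python
def recommendation_score(dimension_rates, weights=None):
    """Combine per-dimension positive rates into one score (assumed: weighted average)."""
    if weights is None:
        weights = {dim: 1.0 for dim in dimension_rates}
    total_weight = sum(weights[dim] for dim in dimension_rates)
    weighted_sum = sum(dimension_rates[dim] * weights[dim] for dim in dimension_rates)
    return weighted_sum / total_weight

rates = {"price": 0.8, "service": 0.6, "environment": 0.7}
equal_score = recommendation_score(rates)
price_heavy = recommendation_score(rates, {"price": 2.0, "service": 1.0, "environment": 1.0})
```

The same combination could be applied at the industry level to turn dimension positive rates into an overall health degree score, again under the assumption that a weighted average is an acceptable stand-in for the unspecified model.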
(3) Industry health degree analysis
Overall health degree
Analysis: the results of analyzing online reviews by dimensions such as price, service, and environment, yielding the overall health degree score;
Data source: review data on entertainment, entertainment venues, and recreation from Meituan, Dianping, Lvmama, Mafengwo, Tongcheng, and Tuniu;
Implementation: by extracting the sentiment words in the reviews and the nouns/pronouns they modify, code is written to classify reviews as positive or negative along dimensions such as price, service, and environment; the positive review rate of each dimension is calculated, and a model built on these rates yields the overall health degree score;
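The dimension-level judgement can be sketched with the three dictionaries named in the claims: a dimension identification library, a sentiment dictionary, and a negation dictionary. The dictionary contents below are made up, and the rule that an odd number of negation words flips the polarity is an assumption; the source only states that the result depends on the negation count and the sentiment word orientation.

```python
def clause_dimension(tokens, dimension_library):
    """Return the dimension of the first marker word found in the clause, else None."""
    for token in tokens:
        if token in dimension_library:
            return dimension_library[token]
    return None

def clause_sentiment(tokens, sentiment_dict, negation_words):
    """Return 1 (positive), -1 (negative), or None if no sentiment word is present."""
    words = set(tokens)
    matched = words & set(sentiment_dict)
    if not matched:
        return None  # no sentiment word for the analysed dimensions
    polarity = 1 if sum(sentiment_dict[w] for w in matched) >= 0 else -1
    # Set difference with the sentiment dictionary, then intersection with negations.
    negations = (words - set(sentiment_dict)) & negation_words
    # Assumption: an odd number of distinct negation words flips the polarity.
    return -polarity if len(negations) % 2 == 1 else polarity

# Hypothetical dictionaries for illustration only.
dimension_library = {"price": "price", "waiter": "service", "decor": "environment"}
sentiment_dict = {"good": 1, "cheap": 1, "rude": -1, "dirty": -1}
negation_words = {"not", "never"}

clause = ["waiter", "not", "good"]
result = (
    clause_dimension(clause, dimension_library),
    clause_sentiment(clause, sentiment_dict, negation_words),
)
```

Because the negation count comes from a set intersection, a negation word repeated within one clause is counted once, which mirrors the set-based wording of the claims.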
Health degree of each business format (by type)
Analysis: the results of analyzing the reviews of each type by dimensions such as price, service, and environment, yielding the health degree score of each business format;
Data source: review data on entertainment, entertainment venues, and recreation from Meituan, Dianping, Lvmama, Mafengwo, Tongcheng, and Tuniu;
Implementation: the health degree of each type is calculated following the overall health degree method;
Evaluation dimension analysis;
Analysis: the share of positive reviews for dimensions such as price, service, and environment;
Data source: review data on entertainment, entertainment venues, and recreation from Meituan, Dianping, Lvmama, Mafengwo, Tongcheng, and Tuniu;
Implementation: by extracting the sentiment words in the reviews and the nouns/pronouns they modify, code is written to classify reviews as positive or negative along dimensions such as price, service, and environment, and the positive review rate of each dimension is calculated;
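Once each clause has a dimension and a positive or negative label, the positive review rate per dimension is a simple ratio. The sketch below assumes the input is a list of (dimension, label) pairs with label 1 or -1; the function name is hypothetical.

```python
from collections import defaultdict

def dimension_positive_rates(judgements):
    """judgements: iterable of (dimension, label) pairs -> {dimension: positive rate}."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for dimension, label in judgements:
        totals[dimension] += 1
        if label == 1:
            positives[dimension] += 1
    return {dim: positives[dim] / totals[dim] for dim in totals}

dim_rates = dimension_positive_rates([
    ("price", 1), ("price", -1),
    ("service", 1), ("service", 1), ("service", -1), ("service", 1),
    ("environment", -1),
])
```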
Positive-review enterprise ranking
Analysis: the top 10 enterprises by positive reviews;
Data source: review data on entertainment, entertainment venues, and recreation from Meituan, Dianping, Lvmama, Mafengwo, Tongcheng, and Tuniu;
Implementation: the segmented words are quantified using natural language processing methods, and a classification method is applied to classify each review as positive or negative; the number of reviews, the number of positive reviews, and the positive review rate are then calculated for each enterprise, and enterprises are ranked by positive review rate;
Negative-review enterprise ranking;
Analysis: the ten enterprises with the worst reviews;
Data source: review data on entertainment, entertainment venues, and recreation from Meituan, Dianping, Lvmama, Mafengwo, Tongcheng, and Tuniu;
Implementation: similar to the process used for the positive-review enterprise ranking;
Hot review words
Analysis: hot words in positive and negative reviews;
Data source: review data on entertainment, entertainment venues, and recreation from Meituan, Dianping, Lvmama, Mafengwo, Tongcheng, and Tuniu;
Implementation: the nouns, adjectives, and other words that occur frequently in positive and negative reviews are extracted, and their occurrence counts are calculated.
Although preferred embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the invention.
Obviously, those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.
Claims (10)
1. An analysis method for regional tourism industry development, characterized in that the method comprises:
Step 1: determining the region to be analyzed;
Step 2: collecting data related to the region to be analyzed and preprocessing the data;
Step 3: building and training a support vector machine model;
Step 4: analyzing the tourism industry health degree of the region to be analyzed based on the preprocessed tourism data of the region and the trained support vector machine model;
Step 5: analyzing the tourism industry of the region to be analyzed along multiple dimensions;
Step 6: obtaining the tourism industry development analysis result for the region to be analyzed based on the analysis results of Step 4 and Step 5.
2. The analysis method for regional tourism industry development according to claim 1, characterized in that Step 4 specifically comprises: performing word segmentation and vector construction on the preprocessed tourism data of the region to be analyzed; and inputting the constructed vectors into the trained support vector machine model to calculate the sentiment orientation of each review.
3. The analysis method for regional tourism industry development according to claim 1, characterized in that Step 2 specifically comprises: crawling tourism industry data for the region to be analyzed from OTA websites, and preprocessing the data, the preprocessing comprising: missing data handling, abnormal data handling, and data normalization.
4. The analysis method for regional tourism industry development according to claim 1, characterized in that building and training the support vector machine model comprises: constructing a training set and a test set; training the support vector machine model based on the word vectors of the training set; and adding negative reviews to the training set to optimize the trained support vector machine model.
5. The analysis method for regional tourism industry development according to claim 4, characterized in that constructing the training set and the test set comprises:
(1) Labeling sentiment orientation:
For each of the 6 categories of catering, accommodation, transportation, sightseeing, shopping, and entertainment, preprocessed sample review data are randomly selected and the sentiment orientation of each review is labeled, with negative reviews labeled -1 and positive reviews labeled 1; the training set and the test set are constructed from the labeled sample review data;
(2) Corpus processing:
Word segmentation is performed on the labeled sample review data using a word segmentation tool;
(3) Constructing the word vectors of the sample review data.
6. The analysis method for regional tourism industry development according to claim 1, characterized in that analyzing the tourism industry of the region to be analyzed along multiple dimensions specifically comprises:
building lexicons for the multiple dimensions under each category of the tourism industry; organizing the lexicons; segmenting each review to obtain the segmentation result and the corresponding parts of speech; splitting the segmented review into clauses according to part of speech, forming multiple clauses for a single review; and, according to the multi-dimension lexicons, determining the category and sub-dimension that each clause relates to and the sentiment orientation for that dimension.
7. The analysis method for regional tourism industry development according to claim 6, characterized in that building the lexicons for the multiple dimensions under each category of the tourism industry specifically comprises: for the 6 categories of catering, accommodation, transportation, sightseeing, shopping, and entertainment, extracting the sentiment words of the sub-dimensions under each category, the words modified by the sentiment words, and the negation words that modify the sentiment words, and building the lexicon for the sub-dimensions under each category of the tourism industry;
organizing the lexicons specifically comprises: aggregating and deduplicating the sentiment words of the same dimension across the 6 categories of catering, accommodation, transportation, sightseeing, shopping, and entertainment; aggregating the sentiment words of all dimensions and assigning each sentiment word a sentiment polarity label, by convention positive: 1 and negative: -1, to form a sentiment dictionary; aggregating and deduplicating the modified words from catering, accommodation, transportation, sightseeing, shopping, and entertainment reviews and assigning them dimension labels to form a document; extracting the sentiment words specific to each dimension and labeling them with the corresponding dimension to form a dimension identification library; and aggregating and deduplicating the extracted negation words to form a negation dictionary.
8. The analysis method for regional tourism industry development according to claim 6, characterized in that extracting the dimension of each clause comprises: extracting the evaluation dimension of the clause with reference to the dimension identification library; if a word in the clause is in the dimension identification library, the dimension contained in the clause is identified with reference to the library; if not, the dimension of the clause is not among the dimensions covered by this dimension analysis;
calculating the sentiment orientation of a review comprises: checking, clause by clause, whether the segmented words of the clause appear in the sentiment dictionary; if they do, the orientation of the sentiment word is identified from the sentiment dictionary; if they do not, the clause is considered to contain no sentiment word for the dimensions covered by this analysis; taking the set difference of the clause and the sentiment dictionary to obtain a result set; intersecting the result set with the negation dictionary; determining the number of negation word occurrences from the number of elements in the intersection; and obtaining the final sentiment orientation of the review based on the number of negation word occurrences and the orientation of the sentiment word.
9. The analysis method for regional tourism industry development according to claim 4, characterized in that the sentiment orientation training set is:
$T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$, $y_i \in \{-1, +1\}$, $i = 1, 2, \ldots, N$; where $x_i$ denotes a sample, $y_i$ denotes the sentiment orientation of that sample, -1 represents negative, +1 represents positive, and $N$ is the number of training samples.
10. The analysis method for regional tourism industry development according to claim 4, characterized in that the trained support vector machine model is:
$f(x) = \operatorname{sign}(w^{*} \cdot x + b^{*})$; where $w^{*}$ and $b^{*}$ are calculated as follows: $w^{*}$ and $b^{*}$ are the optimal solution satisfying $y_i(w^{T} \cdot x_i + b) \geq 1$ (the superscript $T$ on $w$ denotes the transpose), and the separating hyperplane is $w^{*} \cdot x + b^{*} = 0$; $w^{*}$ is the weight vector, $b^{*}$ is the bias, and $T$ is the sentiment orientation training set.
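To make the decision function of claims 9 and 10 concrete, the following sketch trains a linear classifier and applies f(x) = sign(w·x + b). It is not the patent's implementation: it swaps in sub-gradient descent on the regularised hinge loss (a soft-margin variant) for the hard-margin problem stated in the claim, and the bag-of-words vectors, hyperparameters, and function names are all illustrative assumptions.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_linear_svm(samples, epochs=200, learning_rate=0.1, reg=0.01):
    """Sub-gradient descent on the L2-regularised hinge loss (soft-margin SVM)."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in samples:
            violated = y * (dot(w, x) + b) < 1
            # Regularisation step: shrink the weights slightly.
            w = [wi * (1 - learning_rate * reg) for wi in w]
            if violated:
                # Margin violated: move the hyperplane towards labelling x as y.
                w = [wi + learning_rate * y * xi for wi, xi in zip(w, x)]
                b += learning_rate * y
    return w, b

def predict(w, b, x):
    """f(x) = sign(w . x + b); returns +1 (positive) or -1 (negative)."""
    return 1 if dot(w, x) + b >= 0 else -1

# Hypothetical bag-of-words vectors over the vocabulary [good, service, bad, food].
training_set = [
    ([1, 1, 0, 0], 1),   # "good service"
    ([0, 0, 1, 1], -1),  # "bad food"
    ([1, 0, 0, 1], 1),   # "good food"
    ([0, 1, 1, 0], -1),  # "bad service"
]
w, b = train_linear_svm(training_set)
predictions = [predict(w, b, x) for x, _ in training_set]
```

On this toy data the learned weight for "good" becomes clearly positive and the weight for "bad" clearly negative, so every training sample is classified with its label. A production system would more likely rely on an established SVM library and on word vectors rather than raw counts.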
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910123321.3A CN109858973A (en) | 2019-02-18 | 2019-02-18 | A kind of analysis method of regional tourism industry development |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109858973A true CN109858973A (en) | 2019-06-07 |
Family
ID=66898226
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190607 |