CN109858973A - A kind of analysis method of regional tourism industry development - Google Patents

A method for analyzing regional tourism industry development

Info

Publication number
CN109858973A
Authority
CN
China
Prior art keywords
comment
dimension
analysis
data
emotion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910123321.3A
Other languages
Chinese (zh)
Inventor
周道华
古鹏飞
李柏椿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Flag Softcom Ltd Of Chengdu Chinese University Of Science And Technology
Original Assignee
Flag Softcom Ltd Of Chengdu Chinese University Of Science And Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Flag Softcom Ltd Of Chengdu Chinese University Of Science And Technology filed Critical Flag Softcom Ltd Of Chengdu Chinese University Of Science And Technology
Priority to CN201910123321.3A priority Critical patent/CN109858973A/en
Publication of CN109858973A publication Critical patent/CN109858973A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for analyzing regional tourism industry development, comprising: Step 1: determining the region to be analyzed; Step 2: collecting data related to the region to be analyzed and preprocessing it; Step 3: constructing and training a support vector machine model; Step 4: analyzing the health of the tourism industry in the region to be analyzed, based on the preprocessed regional tourism data and the trained support vector machine model; Step 5: analyzing the tourism industry of the region to be analyzed along multiple dimensions; Step 6: obtaining the tourism industry development analysis result for the region to be analyzed from the results of Steps 4 and 5. The method achieves the technical effect of analyzing regional tourism industry development comprehensively and accurately.

Description

A method for analyzing regional tourism industry development
Technical field
The present invention relates to the technical field of computer data processing and analysis, and in particular to a method for analyzing regional tourism industry development.
Background art
In recent years, domestic tourism has continued to boom, and demand for tourism consumption nationwide is strong. Region-wide ("all-for-one") tourism has become a national strategy and is the general direction of future tourism development. It involves all relevant departments in a region working together and all residents taking part, making full use of every resource so that visitors get a tourism experience covering the whole trip, at all times and places. "Tourism + Internet" is a key link in achieving region-wide tourism, and using data mining to serve governments, scenic areas, enterprises and tourists is a major trend for the future of tourism.
Although the tourism industry is developing rapidly and public enthusiasm for tourism consumption is high, cultural tourism regulators lack complete data and professional industry analysis covering the industry as a whole. Existing analyses of industry development are mostly confined to narrow scopes such as individual scenic areas, hotels or travel agencies.
In summary, in the course of implementing the technical solution of the present application, the inventors found that the above technology has at least the following technical problem:
In the prior art, existing methods for analyzing tourism industry development are insufficiently comprehensive and have poor accuracy.
Summary of the invention
The present invention provides a method for analyzing regional tourism industry development that achieves the technical effect of analyzing regional tourism industry development comprehensively and accurately.
The method takes a given region as its research object and crawls, integrates and analyzes data related to tourism industry development. Through multi-dimensional analysis of the data, it uncovers the deeper problems behind what the data show. With industry development monitoring as its main purpose and service to tourists as a secondary one, it gives cultural tourism regulators a reliable reference for understanding regional tourism industry development and provides data support for formulating related policies.
To achieve the above object, the present application provides a method for analyzing regional tourism industry development, comprising the following steps:
Step 1: determine the region whose tourism industry health is to be analyzed;
Step 2: data collection and preprocessing;
1. Crawl tourism industry data for the region from OTA websites and store it;
The crawl targets are OTA websites such as Tuniu, Tongcheng, Ctrip, Mafengwo, Lvmama, eLong and Dianping. For each of the six categories of dining, lodging, transportation, sightseeing, shopping and entertainment, the crawled data fields are the product or service list, the product or service details, and the user comments.
The crawled data is stored locally as text files, organized by category. Each time a crawl task completes, the local files are pushed to a designated HDFS server and a backup is retained.
2. Preprocess the stored data
(1) Missing data handling
The collected data falls into three types: product or service lists, product or service details, and user comments. The three are linked by product or service ID; if a product or service cannot be linked to a merchant, that data is filtered out.
The most valuable field is the comment content; if the comment content field is empty, the comment record is filtered out.
(2) Abnormal data handling
If the comment content is a placeholder such as "system default positive review" or "this user did not fill in a review", the comment record is deleted.
(3) Data normalization
The data comes from multiple OTAs, and the data conventions of the different platforms are inconsistent, so the data must be normalized.
Step 3: model training
1. Construct the training set and test set
(1) Label sentiment polarity
For each of the six categories of dining, lodging, transportation, sightseeing, shopping and entertainment, randomly select sample comments that have been processed in Step 2 and label the sentiment polarity of each comment: negative reviews are labeled -1 and positive reviews are labeled 1. The training set and test set are built from the labeled sample comments.
(2) Corpus processing
Segment the sample comments from (1) into words using a word segmentation tool.
(3) Build word vectors for the sample comments
Call the word2vec.Word2Vec method to compute a vector for each word, forming the word vectors.
2. Train the support vector machine model on the training-set word vectors
3. Optimize the support vector machine model by appropriately increasing the number of negative reviews in the training set.
Step 4: regional tourism industry health analysis
Segment the regional tourism data into words and build vectors; feed the vectors into the model trained in Step 3 to compute the sentiment polarity of each comment.
Step 5: fine-grained dimension analysis of the regional tourism industry
1. Build a feature lexicon for each tourism category
(1) For each of the six categories of dining, lodging, transportation, sightseeing, shopping and entertainment, extract the sentiment words of each fine-grained dimension under the category, the words modified by those sentiment words, and the negation words that modify sentiment words, and build a feature lexicon of the fine-grained dimensions under each tourism category.
(2) Organize the feature words
Aggregate and deduplicate the sentiment words of each fine-grained dimension under each of the six categories of dining, lodging, transportation, sightseeing, shopping and entertainment;
Aggregate the sentiment words of all dimensions and assign each a polarity label, by convention positive: 1, negative: -1, forming the sentiment dictionary.
Aggregate and deduplicate the modified words from dining, lodging, transportation, sightseeing, shopping and entertainment comments and assign dimension labels, forming a document.
Extract the sentiment words specific to each dimension and label them with that dimension, forming the dimension identification library.
Aggregate and deduplicate the negation words extracted in (1), forming the negation lexicon.
2. Word segmentation and clause splitting
Segment each comment, producing the segmentation result and the part of speech of each word, then split the segmented comment into clauses according to part of speech, so that a single comment yields multiple clauses.
3. Extract the dimension of each clause
Extract the evaluated dimension of each clause by reference to the dimension identification library. If a word in the clause appears in the library, the dimension of the clause can be identified directly from the library; if not, the clause's dimension is not among those this analysis is concerned with.
4. Calculate dimension sentiment polarity
By convention, every dimension of any comment is given a default sentiment value.
5. Calculate sentiment polarity
For each clause, check whether any of its segmented words appears in the sentiment dictionary. If so, the polarity of the sentiment word can be read from the dictionary; if not, the clause is considered to contain no sentiment word for the dimensions this analysis is concerned with.
Take the set difference of the clause and the sentiment dictionary, then intersect the result with the negation lexicon. The number of elements in the intersection gives the number of negation words, which is combined with the polarity of the sentiment word to give the final sentiment polarity.
One or more technical solutions provided by the present application have at least the following technical effects or advantages:
Regional tourism industry development can be analyzed comprehensively and accurately. The data analysis methods and accumulated development experience behind this method can similarly be applied to analyzing and presenting the development of the culture and tourism industries elsewhere in the country.
Brief description of the drawings
The drawings described here provide a further understanding of the embodiments of the present invention and form part of the present application; they do not limit the embodiments of the present invention;
Fig. 1 is a flow diagram of the method for analyzing regional tourism industry development in the present application;
Fig. 2 is a schematic diagram of the composition of the system for analyzing regional tourism industry development in the present application.
Detailed description of embodiments
So that the above objects, features and advantages of the present invention can be understood more clearly, the invention is described in further detail below with reference to the drawings and specific embodiments. It should be noted that, provided they do not conflict, the embodiments of the present application and the features in the embodiments may be combined with one another.
Many specific details are set out in the following description to facilitate a full understanding of the present invention. However, the present invention may also be implemented in ways other than those described here, and the scope of protection of the present invention is therefore not limited by the specific embodiments disclosed below.
Embodiment 1:
Referring to Fig. 1, Step 1: determine the region to be analyzed; in this embodiment the tourism industry of the Chengdu region is the research object;
Step 2: data collection and preprocessing;
1. Crawl tourism industry data for the region from OTA websites and store it;
The crawl targets are OTA websites such as Tuniu, Tongcheng, Ctrip, Mafengwo, Lvmama, eLong and Dianping. For each of the six categories of dining, lodging, transportation, sightseeing, shopping and entertainment, the crawled data fields are the product or service list, the product or service details, and the user comments.
The crawl proceeds as follows:
Crawl the city list of the OTA website;
Construct URLs from the city list and crawl the product or service lists;
Crawl the product or service details according to the product or service list;
Crawl the product or service comments according to the product or service list.
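The URL construction stage of this crawl can be sketched as follows. This is only an illustration: the base URL, the path layout and the `page` query parameter are hypothetical, since each OTA site uses its own URL scheme, and the English category names stand in for the six categories named in the text.

```python
from urllib.parse import urlencode

# Hypothetical base URL and path layout; real OTA sites each differ.
BASE_URL = "https://ota.example.com"
CATEGORIES = ["dining", "lodging", "transportation",
              "sightseeing", "shopping", "entertainment"]


def build_list_urls(cities, categories=CATEGORIES, pages=1):
    """Build one listing URL per (city, category, page) combination."""
    urls = []
    for city in cities:
        for category in categories:
            for page in range(1, pages + 1):
                query = urlencode({"page": page})
                urls.append(f"{BASE_URL}/{city}/{category}/list?{query}")
    return urls
```

The detail and comment stages would then issue one request per product or service ID found in the crawled lists.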
The crawled data is stored locally as text files, organized by category. When a crawl task completes, the local files are pushed to a designated HDFS server and a backup is retained.
2. Preprocess the stored data
(1) Missing data handling;
The collected data falls into three types: product or service lists, product or service details, and user comments. The three are linked by product or service ID; if a product or service cannot be linked to a merchant, that data is filtered out.
The most valuable field is the comment content; if the comment content field is empty, the comment record is filtered out.
(2) Abnormal data handling;
If the comment content is a placeholder such as "system default positive review" or "this user did not fill in a review", the comment record is deleted.
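A minimal sketch of the missing-data and abnormal-data filters might look like this. The field names `content` and `item_id`, the `merchant_by_item` lookup and the English placeholder strings are assumptions made for illustration; the real records and placeholder texts come from the crawled platforms.

```python
# Placeholder texts that carry no opinion (English stand-ins for the originals).
PLACEHOLDER_COMMENTS = {
    "system default positive review",
    "this user did not fill in a review",
}


def clean_comments(comments, merchant_by_item):
    """Keep comments that have real content and link to a known merchant.

    comments: list of dicts with assumed keys "item_id" and "content".
    merchant_by_item: dict mapping item ID -> merchant (assumed lookup).
    """
    kept = []
    for comment in comments:
        text = (comment.get("content") or "").strip()
        if not text:
            continue  # missing data: empty comment content
        if text in PLACEHOLDER_COMMENTS:
            continue  # abnormal data: platform-generated placeholder
        if comment.get("item_id") not in merchant_by_item:
            continue  # cannot be linked to a merchant
        kept.append(comment)
    return kept
```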
(3) Data normalization;
The data comes from multiple OTAs, and the data conventions of the different platforms are inconsistent, so the data must be normalized.
For example: the OTAs classify dining merchants inconsistently. They are unified into nine major classes (Chinese restaurant, casual light meals, foreign cuisine, hot pot, buffet, drinks, food street/night market, seafood, and other) according to the following standard:
Chinese restaurant --- Tuniu: Chinese restaurant; Tongcheng: restaurants, farmhouse dining; Mafengwo: Sichuan cuisine, state dish, Hunan cuisine, Zhejiang cuisine, Yunnan cuisine, Cantonese cuisine, Xinjiang cuisine; Dianping: Sichuan cuisine, Cantonese cuisine, Beijing cuisine, Northeastern cuisine, Xinjiang cuisine, private-kitchen cuisine, farmhouse cuisine, home-style cuisine, rabbit head/diced rabbit, grilled fish, vegetarian dishes; Qunar: Zigong Yanbang cuisine, Northeastern cuisine, Beijing cuisine, Xinjiang cuisine, Hunan cuisine, Sichuan cuisine, private-kitchen cuisine.
Casual light meals --- Tuniu: snacks; Tongcheng: snacks and fast food, noodle shops, bakery and desserts, teahouses; Mafengwo: fast food, takeaway/window, snacks, congee shops and noodle dishes; Dianping: snacks and fast food, rice-noodle shops, bakery and desserts; Qunar: snacks and fast food, wontons, pork-intestine rice noodles, rice noodles, steamed buns, tofu pudding, bakery and desserts.
Foreign cuisine --- Tuniu: foreign cuisine, Western restaurants; Tongcheng: Western food, Japanese and Korean cuisine; Mafengwo: Southeast Asian cuisine, Western food, Japanese cuisine, Korean cuisine; Dianping: Western food, Korean cuisine, Japanese cuisine, Southeast Asian cuisine; Qunar: Western food, Korean cuisine, Japanese cuisine.
Hot pot --- Tuniu: hot pot; Tongcheng: hot pot; Mafengwo: hot pot; Dianping: hot pot, chuanchuanxiang (skewer hot pot); Qunar: hot pot, Sichuan hot pot, fish hot pot, maocai.
Buffet --- Tuniu: buffet restaurant; Tongcheng: buffet; Dianping: buffet; Qunar: buffet.
Drinks --- Tuniu: cafes, desserts/beverages; Tongcheng: drinks, coffee houses, teahouses, bars; Mafengwo: coffee and drinks, bars, teahouses, afternoon tea; Dianping: coffee shops, teahouses, afternoon tea; Qunar: coffee shops, teahouses.
Food street/night market --- Tuniu: food street/night market, barbecue/grill shops; Tongcheng: barbecue and skewers, cooked and braised foods; Mafengwo: barbecue; Dianping: barbecue; Qunar: food street HOT, barbecue.
Seafood --- Tuniu: seafood/river-fish restaurants; Tongcheng: seafood; Mafengwo: seafood; Dianping: crayfish, seafood; Qunar: seafood.
Other --- Tuniu: local specialties; Tongcheng: other cuisines, business districts, local specialties/specialty products; Mafengwo: chain/franchise, themed, restaurants, taverns, specialty, Hong Kong style; Dianping: other cuisines, health food, popular restaurants, creative cuisine, fruit and fresh produce; Qunar: dry pot, Muslim cuisine.
For example: lodging merchants classify room types inconsistently. They are unified into single room, large-bed room, twin room, family room, suite, multi-person room, parent-child room and themed room.
Single room --- fuzzy-matches room types containing wording such as "single room" and "separate room".
Large-bed room --- fuzzy-matches room types containing wording such as "big bed", "guest room", "deluxe" and "boutique room".
Twin room --- fuzzy-matches room types containing wording such as "double bed", "twin", "standard room" and its abbreviated form.
Family room --- fuzzy-matches room types containing wording such as "family".
Suite --- fuzzy-matches room types containing wording such as "suite".
Multi-person room --- fuzzy-matches room types containing wording such as "four people", "4 people", "6 people", "six people", "8 people", "ten" and "more people".
Parent-child room --- fuzzy-matches room types containing wording such as "parent-child".
Themed room --- fuzzy-matches room types containing wording such as "landscape", "tatami", "Nordic", "modern", "special offer", "couples", "river view", "starry sky", "view", "sunshine", "garden", "forest", "wave", "dream", "film", "projection" and "mountain view".
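The keyword-based fuzzy matching above can be sketched as a substring lookup. The English keywords below are stand-ins for the original Chinese wording, only a subset is shown, and the rule order (first match wins) is an assumption: the text does not say how to resolve a name that matches several classes.

```python
# (unified class, keywords) pairs; checked in order, first match wins.
# Keywords are illustrative English stand-ins, not the original wording.
ROOM_TYPE_RULES = [
    ("parent-child room", ["parent-child"]),
    ("family room", ["family"]),
    ("suite", ["suite"]),
    ("multi-person room", ["four people", "4 people", "6 people",
                           "six people", "8 people", "more people"]),
    ("twin room", ["double bed", "twin", "standard room"]),
    ("single room", ["single room", "separate room"]),
    ("large-bed room", ["big bed", "deluxe", "boutique"]),
    ("themed room", ["landscape", "tatami", "river view",
                     "starry sky", "garden"]),
]


def normalize_room_type(raw_name, rules=ROOM_TYPE_RULES, default="unclassified"):
    """Map a platform-specific room name to a unified room class."""
    name = raw_name.lower()
    for unified, keywords in rules:
        if any(keyword in name for keyword in keywords):
            return unified
    return default
```

Putting the narrower classes first keeps a name such as "River View Tatami Suite" in the suite class instead of the themed class; a different priority would be equally defensible.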
For example: the OTAs write scenic-area ratings inconsistently and the crawlers follow inconsistent conventions. Ratings are unified into six levels: AAAAA, AAAA, AAA, AA, A and unrated:
Table 1: Scenic-area rating data normalization
Because the data comes from different platforms, some of it needs deduplication.
When counting the number of merchants in an industry, the same merchant may publish its information on several platforms, so the merchant count must be deduplicated. Comment data, by contrast, only needs to be aggregated.
Step 3: model training
1. Construct the training set and test set
(1) Manually label sentiment polarity
For each of the six categories of dining, lodging, transportation, sightseeing, shopping and entertainment, randomly select sample comments that have been processed in Step 2 and manually label the sentiment polarity of each comment: negative reviews are labeled -1 and positive reviews are labeled 1. The training set and test set are built from the labeled sample comments.
The training set contains 10,000 records and the test set contains 5,000.
During manual labeling, a comment is judged negative as soon as any negative sentiment word appears in it; otherwise it is positive. A comment with no clear sentiment polarity defaults to positive.
Examples:
-1, To be frank, the shop means well, but there is too little variety, the taste is average, the portions are small and the price is on the high side; a one-off deal. Honest advice is hard to hear, but I still hope the owner works to improve!
-1, The service is terrible. A serving of goose intestines fell on the floor and the waiter just picked it up and threw it in the trash, with no compensation at all. Never going again;
1, Delicious~ really great~ the pork belly was perfect~ will come again next time;
1, The environment is lovely and the food is good; the main point is that the proprietress is very pretty;
(2) Corpus processing
Segment the sample comments from (1) into words using the Jieba word segmentation tool.
(3) Build word vectors for the sample comments
Call the word2vec.Word2Vec method to compute a vector for each word, forming the word vectors.
After processing, the sentiment training set $T=\{(x_1,y_1),(x_2,y_2),\ldots,(x_N,y_N)\}$ is obtained, with $y_i\in\{-1,+1\}$, $i=1,2,\ldots,N$, where $x_i$ is a sample, $y_i$ is the sentiment label of that sample (-1 for negative, +1 for positive) and $N$ is the number of training samples.
2. Train the support vector machine model on the training-set word vectors:
$f(x)=\operatorname{sign}(w^{*}\cdot x+b^{*})$, where $w^{*}$ is the weight vector, $b^{*}$ is the bias and $T$ is the sentiment training set defined above.
$w^{*}$ and $b^{*}$ are computed as follows: they are the optimal solution subject to the constraints $y_i(w^{T}\cdot x_i+b)\ge 1$, $i=1,2,\ldots,N$, and the separating hyperplane is $w^{*}\cdot x+b^{*}=0$.
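Once $w^{*}$ and $b^{*}$ are known, applying the classifier is a dot product followed by a sign. The sketch below shows the decision function and a check of the margin constraint on a toy two-dimensional example; the weights and samples are invented for illustration, and training itself (finding the optimal $w^{*}$, $b^{*}$) is left to an SVM library.

```python
def dot(w, x):
    """Inner product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def svm_predict(w, b, x):
    """f(x) = sign(w . x + b); a value of exactly 0 is mapped to +1 here."""
    return 1 if dot(w, x) + b >= 0 else -1


def satisfies_margin(w, b, samples):
    """Check y_i * (w . x_i + b) >= 1 for every (x_i, y_i) in samples."""
    return all(y * (dot(w, x) + b) >= 1 for x, y in samples)


# Toy data: invented numbers, not taken from the source.
toy_w, toy_b = [1.0, -1.0], 0.0
toy_samples = [([2.0, 0.0], 1), ([0.0, 2.0], -1)]
```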
The test set is evaluated with 5-fold cross-validation. Example result for k-fold=5: [0.964 0.953 0.965 0.986 0.982], with a mean of 0.97. The classification accuracy is high, showing that the model is usable for classification in this domain.
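As a quick check of the reported figure, the mean of the five fold accuracies can be recomputed, together with a minimal index splitter showing how 5-fold partitions are formed. The splitter is a generic sketch (contiguous folds, no shuffling), not the procedure used by the inventors.

```python
from statistics import mean


def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs; fold sizes differ by at most one."""
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# Fold accuracies as reported in the text.
fold_accuracies = [0.964, 0.953, 0.965, 0.986, 0.982]
mean_accuracy = round(mean(fold_accuracies), 2)
```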
3. Optimize the support vector machine model by appropriately increasing the number of negative reviews in the training set.
With reference to the model's in-sample and out-of-sample predictions, the training set is enlarged as appropriate to make predictions more accurate.
Inspection of the misclassifications shows that they are mainly negative reviews classified as positive. Adding negative reviews to the training set can greatly improve classification accuracy, so the training set is extended to increase the share of negative reviews, and the model is optimized in this way.
Step 4: regional tourism industry health analysis
Segment the regional tourism data into words and build vectors; feed the vectors into the model trained in Step 3 to compute the sentiment polarity of each comment.
Data is read page by page in ascending order of the comment record's unique identifier (commentid). After each read, the maximum identifier in that batch is saved and used as the exclusive lower bound for the next read. Each batch that is read goes through the following analysis procedure.
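This reading scheme is often called keyset pagination. A minimal in-memory sketch is shown below; in a real deployment `read_page` would presumably be a database query filtering on `commentid > last_id` with an ascending sort and a row limit, and the starting value 0 assumes every real identifier is larger than 0.

```python
def read_page(records, last_id, page_size):
    """Return up to page_size records with commentid > last_id, ascending."""
    newer = sorted((r for r in records if r["commentid"] > last_id),
                   key=lambda r: r["commentid"])
    return newer[:page_size]


def iterate_pages(records, page_size=2):
    """Yield pages until no record has a commentid above the saved maximum."""
    last_id = 0  # assumed to be smaller than any real commentid
    while True:
        page = read_page(records, last_id, page_size)
        if not page:
            break
        yield page
        # Save the largest identifier read; it is the exclusive lower bound next time.
        last_id = max(r["commentid"] for r in page)
```

Because the lower bound is exclusive and identifiers are unique, no record is read twice and none is skipped, even if new records with larger identifiers arrive between reads.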
The following steps are illustrated with example comments. The original comments are:
commentid=1514907434899311616747074: To be frank, the shop means well, but there is too little variety, the taste is average, the portions are small and the price is on the high side; a one-off deal. Honest advice is hard to hear, but I still hope the owner works to improve!
commentid=1507795037609384448903723: Inexpensive, plenty of parking spaces, northeastern flavor, not bad.
The original comments are segmented:
commentid=1514907434899311616747074 [[saying, 0,1], [tangible, 1,3], [, 3,4], [, 4,5], [shop, 5,6], [, 6,7], [original intention, 7,9], [being, 9,10], [good, 10,11], [, 11,12], [, 12,13], [however, 13,15], [type, 15,17], [very little, 17,19], [, 19,20], [taste, 20,22], [general, 22,24], [, 24,25], [deal, 25,27], [less, 27,29], [, 29,30], [price, 30,32], [partially expensive, 32,34], [, 34,35], [hammer, 35,37], [hammer, 36,38], [dealing, 38,40], ["once-for-all" deal, 35,40], [., 40,41], [sincere advice, 41,43], [unpleasant to the ear, 43,45], [good advice is not always pleasing to the ear, 41,45], [, 45,46], [but, 46,47], [still, 47,49], [it is desirable that, 49,51], [storekeeper, 51,53], [effort, 53,55], [improving, 55,57], [!, 57,58]]
commentid=1507795037609384448903723 [[price, 0,2], [cheap, 2,4], [it is cheap, 0,4], [, 4,5], [parking, 5,7], [parking lot, 6,8], [parking lot, 5,8], [parking stall, 8,10], [more, 10,11], [, 11,12], [northeast, 12,14], [taste, 14,16], [, 16,17], [good, 17,19], [., 19,20]]
The word vectors of the segmented words are looked up and the sentence vector is computed; the sentence vector is fed into the model trained in Step 3 to obtain the sentiment polarity of each comment.
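The text does not say how the sentence vector is derived from the word vectors. One common choice, used here purely as an assumption, is to average the vectors of the words that have an entry in the word-vector table and to fall back to a zero vector when none do.

```python
def sentence_vector(tokens, word_vectors, dim):
    """Average the vectors of known tokens (assumed pooling method).

    tokens: segmented words of one comment.
    word_vectors: dict mapping word -> list of floats of length dim.
    """
    found = [word_vectors[t] for t in tokens if t in word_vectors]
    if not found:
        return [0.0] * dim  # no known word: neutral zero vector
    return [sum(vec[i] for vec in found) / len(found) for i in range(dim)]
```

For instance, with two-dimensional toy vectors for "price" and "cheap", a comment containing both words plus an unknown word is represented by the midpoint of the two known vectors.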
Feeding the above comments into the model trained in Step 3 gives the following predictions:
{commentid='1514907434899311616747074', resourcename='elder sister baby characteristic snack', classfy='eats', hcp=-1.0} -1.0 (negative review)
{commentid='1507795037609384448903723', resourcename='good fortune elder sister food still coarse food grain (north all the way Wanda shop)', classfy='eats', hcp=1.0} 1.0 (positive review)
Here commentid is the unique identifier of the comment record, resourcename is the merchant the comment record refers to, classfy is the category of the comment record (one of dining, lodging, transportation, sightseeing, shopping and entertainment), and hcp is the sentiment polarity of the comment record computed by the model: -1.0 indicates a negative review and 1.0 indicates a positive review.
Embodiment 2:
To further improve the accuracy of the method for analyzing regional tourism industry development, this embodiment provides a method of fine-grained dimension analysis of the regional tourism industry by the six categories of dining, lodging, transportation, sightseeing, shopping and entertainment.
1. Build a feature lexicon for each tourism category
(1) For each of the six categories of dining, lodging, transportation, sightseeing, shopping and entertainment, extract the sentiment words of each fine-grained dimension under the category, the words modified by those sentiment words, and the negation words that modify sentiment words, and build a feature lexicon of the fine-grained dimensions under each tourism category.
Specifically, the analysis dimensions under the six categories can be summarized as in Table 2 below:
Table 2
Taking the taste dimension in the dining category as an example:
(1) Sentiment words
good, very good, nice, tasty, rubbish, poor, hard, expensive, bad, sweet, too much, fine, too little, bitter, salty, average, not good, good value ...;
(2) Modified words: the nouns and gerunds that sentiment words modify;
barbecue, fish head, pineapple, noodles, crispy rice, chicken feet, dishes, taste, flavor, octopus, sushi, wontons, breakfast, cola, two-flavor hot pot, vegetarian pepper noodles, snacks ...
(3) Negation words: adverbs that modify sentiment words, and other words that reverse sentiment polarity.
no, not, is not, may not, does not have, hard to, seldom, few, never, lacks, cannot, not enough, unable to, difficult, dare not, not so, without ...
(2) Organize the feature words
Aggregate and deduplicate the sentiment words of each fine-grained dimension under each of the six categories of dining, lodging, transportation, sightseeing, shopping and entertainment;
Aggregate the sentiment words of all dimensions and assign each a polarity label, by convention positive: 1, negative: -1, forming the sentiment dictionary, which is stored in sentiment_dict.
Aggregate and deduplicate the modified words from dining, lodging, transportation, sightseeing, shopping and entertainment comments and assign dimension labels, forming a document.
Extract the sentiment words specific to each dimension, label them with that dimension and merge them with the above document, forming the dimension identification library, which is stored in dims_dict.
Aggregate and deduplicate the negation words extracted from dining, lodging, transportation, sightseeing, shopping and entertainment comments, forming the negation lexicon, which is stored in negative_word.
2. Word segmentation and clause splitting
Segment each comment, producing the segmentation result and the part of speech of each word, then split the segmented comment into clauses according to part of speech, so that a single comment yields multiple clauses.
The tool used for segmentation and clause splitting is the Ansj segmenter; clauses are split on adjectives.
Reasons for clause splitting:
1. A comment may involve several dimensions and several sentiment words; without clause splitting, sentiment words cannot be matched accurately to dimension words;
2. After splitting, negation words are easier to locate within a clause;
3. Cases where several dimensions share one sentiment word, such as "the environment and service are good" written with or without a separating comma, can be identified accurately (an advantage over splitting on punctuation).
Two sentences illustrate this below:
The original comment records are:
commentid=1507795023668517632174915: The taste is very good, the portions are very large, the service is average.
commentid=1507795023675111936040195: Not as good as before, the portions are very small. Service is not timely;
Segmentation with the Ansj segmenter: (supplementary part-of-speech table)
[taste/n, very/d, good/a, ,/w, dish amount/nw, very/d, big/a, ,/w, service/vn, general/a, ./w]
[not/d, before/f, good/a, ,/w, dish amount/nw, very/d, few/a, ./w, reception/v, not/d, timely/ad]
Clause splitting with the Ansj segmenter:
[/taste/very/good, /dish amount/very/big, /service/general]
[/not/before/good, /dish amount/very/few, /reception/not/timely]
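Splitting on adjectives can be sketched as follows. This is a simplified stand-in for what is done with the Ansj segmenter (a Java library): it takes already tagged `(word, part_of_speech)` pairs, skips punctuation (tag `w`), and closes a clause after every token whose tag starts with `a` (adjective tags such as `a` and `ad`). The exact rule used by the inventors is not given, so treat the details as assumptions.

```python
PUNCTUATION_TAG = "w"


def split_clauses(tagged_tokens):
    """Group tagged tokens into clauses, ending each clause at an adjective."""
    clauses, current = [], []
    for word, pos in tagged_tokens:
        if pos == PUNCTUATION_TAG:
            continue  # punctuation does not decide clause boundaries
        current.append(word)
        if pos.startswith("a"):
            clauses.append(current)  # adjective closes the clause
            current = []
    if current:
        clauses.append(current)  # trailing words without an adjective
    return clauses


first_comment = [("taste", "n"), ("very", "d"), ("good", "a"), (",", "w"),
                 ("dish amount", "nw"), ("very", "d"), ("big", "a"), (",", "w"),
                 ("service", "vn"), ("general", "a"), (".", "w")]
```

Because punctuation is ignored, "environment, service good" and "environment service good" both yield a single clause in which the two dimension words share one sentiment word.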
3. Extract the dimension of each clause
Extract the evaluated dimension of each clause by reference to the dimension identification library. If a word in the clause appears in the library, the dimension of the clause can be identified directly from the library; if not, the clause's dimension is not among those this analysis is concerned with.
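With the dimension identification library held as a word-to-dimension mapping, this lookup is a few lines. The entries below are illustrative English stand-ins (the dimension names `taste`, `weight`, `service` and `price` follow the identifiers used in the result records); the shape of `dims_dict` as a plain dictionary is an assumption.

```python
# Illustrative entries only; the real library is built from the crawled comments.
dims_dict = {
    "taste": "taste",
    "delicious": "taste",
    "dish amount": "weight",
    "service": "service",
    "reception": "service",
    "price": "price",
}


def extract_dimensions(clause, dims=dims_dict):
    """Return the set of dimensions whose identifying words occur in the clause."""
    return {dims[word] for word in clause if word in dims}
```

An empty result means the clause touches none of the dimensions the analysis is concerned with.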
4. Calculate sentiment polarity
For each clause, check whether any of its segmented words appears in the sentiment dictionary. If so, the polarity of the sentiment word can be read from the dictionary; if not, the clause is considered to contain no sentiment word for the dimensions this analysis is concerned with.
Take the set difference of the clause and the sentiment dictionary, then intersect the result with the negation lexicon. The number of elements in the intersection gives the number of negation words, which is combined with the polarity of the sentiment word to give the final sentiment polarity.
By convention, each dimension's sentiment value for any comment defaults to 0.
As soon as a negative sentiment word occurs for a dimension, that dimension's sentiment is judged negative, i.e. a negative review for that dimension.
Aforementioned example sentence carries out the result after dimension affection computation are as follows:
Id='1507795023676931072650665', taste (taste)=1, weight (portion size)=1, Dishes (plating)=0, price (price)=0, service (service)=1, hygiene (hygiene)=0, location (position)=0, room (room)=0, safe (safety)=0, facilities (facilities)=0, admin (management)=0, traffic (transport)=0, tourist (visitor flow/passenger flow)=0, scenic (scenery)=0, diet (dishes)=0, commodity (goods)=0, classfy='food'
Id='1507795023675111936040195', taste (taste)=0, weight (portion size)=-1, Dishes (plating)=0, price (price)=0, service (service)=-1, hygiene (hygiene)=0, location (position)=0, room (room)=0, safe (safety)=0, facilities (facilities)=0, admin (management)=0, traffic (transport)=0, tourist (visitor flow/passenger flow)=0, scenic (scenery)=0, diet (dishes)=0, commodity (goods)=0, classfy='food'
A sentiment orientation value of 0 for a dimension means that the review record does not involve that dimension; a value of 1 means that the review is positive on that dimension; a value of -1 means that the review is negative on that dimension. The value of classfy corresponds to one of the categories food, accommodation, transport, tourism, shopping and entertainment.
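As a rough illustration of steps 3 and 4, the following Python sketch scores the second example review. The three small word lists are hypothetical stand-ins for the dimension identification library, the sentiment dictionary and the negation dictionary, and the rule that an odd number of negation words flips the polarity is an assumption; the text only states that the negation count is combined with the polarity of the sentiment word.

```python
# Hypothetical miniature lexicons (assumptions, not the patent's actual libraries).
DIMENSION_WORDS = {"taste": "taste", "portion": "weight",
                   "service": "service", "reception": "service"}
SENTIMENT_WORDS = {"good": 1, "large": 1, "timely": 1, "small": -1}
NEGATION_WORDS = {"not"}
DIMENSIONS = ("taste", "weight", "service")


def score_review(clauses):
    """Return a {dimension: -1 | 0 | 1} mapping for one review.

    `clauses` is a list of token lists, one list per clause.
    """
    scores = {dim: 0 for dim in DIMENSIONS}  # every dimension defaults to 0
    for tokens in clauses:
        dims = {DIMENSION_WORDS[t] for t in tokens if t in DIMENSION_WORDS}
        polarities = [SENTIMENT_WORDS[t] for t in tokens if t in SENTIMENT_WORDS]
        if not dims or not polarities:
            continue  # clause has no dimension or no sentiment word of interest
        # Difference with the sentiment dictionary, then intersection with
        # the negation dictionary; a list is used so repeats are counted.
        remaining = [t for t in tokens if t not in SENTIMENT_WORDS]
        negation_count = sum(1 for t in remaining if t in NEGATION_WORDS)
        polarity = polarities[0]
        if negation_count % 2 == 1:  # assumed rule: odd negation count flips polarity
            polarity = -polarity
        for dim in dims:
            if scores[dim] != -1:  # once negative, a dimension stays negative
                scores[dim] = polarity
    return scores


example = [["not", "before", "good"],
           ["portion", "very", "small"],
           ["reception", "not", "timely"]]
result = score_review(example)
```

Under these assumptions the first clause is skipped because it names no dimension, so taste stays 0 while weight and service become -1, in line with the values listed for that review.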
Specific embodiment three:
Building on the results of specific embodiment one and specific embodiment two, the development of the regional tourism industry can be analyzed and visualized; please refer to Fig. 2.
1. Food and beverage category
(1) Analysis of the composition of industry business formats
Composition of industry business formats
Analysis: number and share of merchants in each business-format category; number and share of merchants at each rating level;
Data source: basic food and catering data from Meituan, Dianping, Baidu Travel, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: count catering merchants by category and by rating level, and calculate each count as a proportion of the total number of merchants;
Note: 1. Categories: Chinese restaurant, casual light meals, fast food, snacks, Western restaurant, Japanese cuisine, Korean cuisine, other;
2. Rating levels: five stars, four and a half stars, four stars, three and a half stars, three stars, two and a half stars, two stars, one and a half stars, one star, half a star, no stars;
Analysis: number and share of agritainment (nongjiale) venues at each star level;
Data source: Chengdu Public Data Open Platform, summary information on star-rated agritainment venues;
Implementation: count agritainment venues by star level, and calculate each count as a proportion of all star-rated agritainment venues;
Analysis: number and share of star-rated hotels at each star level;
Data source: Chengdu Tourism E-Government Network, Industry Management, Hotels, 2018 register of star-rated hotels;
Implementation: count hotels by star level, and calculate each count as a proportion of all star-rated hotels;
Note: 1. Star levels: five stars, four stars, three stars, two stars, one star;
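The count-and-share statistics described above can be sketched in a few lines of Python. The record layout (dictionaries with `category` and `grade` keys) and the sample values are assumptions made for illustration only.

```python
from collections import Counter


def count_and_share(records, key):
    """Count records per value of `key` and return {value: (count, share)}."""
    counts = Counter(record[key] for record in records)
    total = sum(counts.values())
    return {value: (n, n / total) for value, n in counts.items()}


# Hypothetical merchant records after collection and preprocessing.
merchants = [
    {"category": "fast food", "grade": "four stars"},
    {"category": "fast food", "grade": "three stars"},
    {"category": "snacks", "grade": "four stars"},
    {"category": "Chinese restaurant", "grade": "five stars"},
]
by_category = count_and_share(merchants, "category")
by_grade = count_and_share(merchants, "grade")
```

The same helper applies unchanged to star levels of agritainment venues or hotels, since only the grouping key differs.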
(2) Analysis of industry economic operation
Number of merchants
Analysis: number of catering merchants
Data source: basic food and catering data from Meituan, Dianping, Baidu Travel, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: sum the number of catering merchants (after preprocessing records in which the same merchant appears under different names on different platforms);
Price range analysis, etc.;
Analysis: number of merchants whose per-capita spending falls in each price range;
Data source: basic food data from Meituan, Dianping, Baidu Travel, Qunar and Tongcheng;
Implementation: bin per-capita spending into the defined price ranges and count the merchants in each range;
Note: Price ranges: below 20 yuan, 20-40 yuan, 40-60 yuan, 60-80 yuan, 80-100 yuan, 100-120 yuan, 120 yuan and above;
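A minimal sketch of the price-range count follows, using the seven ranges from the note. How boundary values are assigned is not stated in the text; the sketch assumes each range includes its lower bound, so exactly 20 yuan falls in "20-40 yuan".

```python
from bisect import bisect_right

# Range boundaries and labels taken from the note above.
EDGES = [20, 40, 60, 80, 100, 120]
LABELS = ["below 20 yuan", "20-40 yuan", "40-60 yuan", "60-80 yuan",
          "80-100 yuan", "100-120 yuan", "120 yuan and above"]


def count_by_price_range(per_capita_prices):
    """Count merchants per price range; each range includes its lower bound (assumed)."""
    counts = {label: 0 for label in LABELS}
    for price in per_capita_prices:
        counts[LABELS[bisect_right(EDGES, price)]] += 1
    return counts


# Hypothetical per-capita spending values, one per merchant.
counts = count_by_price_range([15, 20, 35, 120, 300])
```

Swapping in a different `EDGES` and `LABELS` pair covers the other price-range analyses in this embodiment.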
Popular dish analysis;
Analysis: dishes with many reviews and many recommendations;
Data source: Dianping basic food data and review data;
Implementation: model popularity from the number of reviews and the number of recommendations, and rank dishes by popularity;
Popular merchant analysis;
Analysis: merchants with the most reviews;
Data source: food and catering review data from Meituan, Dianping, Baidu Travel, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: calculate each merchant's number of reviews across all platforms;
Recommendation analysis;
Analysis: merchants that perform well on dimensions such as good taste, generous portions and attractive plating;
Data source: food and catering review data from Meituan, Dianping, Baidu Travel, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: extract the sentiment words in the reviews and the nouns/pronouns they modify; code then classifies each review as positive or negative on dimensions such as taste, portion size and plating; the positive-review rate of each dimension is calculated for each merchant, and a model built on these rates gives the merchant's recommendation degree;
(3) Industry health analysis
Overall health
Analysis: results of analyzing online reviews by dimensions such as taste, portion size and plating, giving the overall health score;
Data source: food and catering review data from Meituan, Dianping, Baidu Travel, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: extract the sentiment words in the reviews and the nouns/pronouns they modify; code then classifies each review as positive or negative on dimensions such as hygiene, taste, portion size and plating; the positive-review rate of each dimension is calculated, and a model built on these rates gives the overall health score;
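The text does not specify the model that turns per-dimension positive-review rates into a health score. The sketch below assumes the simplest possible choice, an unweighted mean scaled to 0-100, purely for illustration; the function names and sample data are hypothetical. Input records use the -1/0/1 dimension values produced by the sentiment step.

```python
def positive_rates(review_scores, dimensions):
    """Positive-review rate per dimension, ignoring reviews that do not mention it."""
    rates = {}
    for dim in dimensions:
        positive = sum(1 for s in review_scores if s.get(dim, 0) == 1)
        negative = sum(1 for s in review_scores if s.get(dim, 0) == -1)
        mentioned = positive + negative
        rates[dim] = positive / mentioned if mentioned else None
    return rates


def health_score(rates):
    """Assumed model: unweighted mean of the available rates, scaled to 0-100."""
    known = [rate for rate in rates.values() if rate is not None]
    return 100 * sum(known) / len(known) if known else None


# Hypothetical per-review dimension values.
reviews = [
    {"taste": 1, "weight": 1},
    {"taste": -1, "weight": 1},
    {"taste": 1, "weight": 0},
    {"taste": 1},
]
rates = positive_rates(reviews, ["taste", "weight", "Dishes"])
score = health_score(rates)
```

A weighted mean or a fitted model could replace `health_score` without changing how the rates are computed; filtering `reviews` by category first gives the per-category variant.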
Health of each business format
Analysis: results of analyzing the reviews of each category by dimensions such as taste, portion size and plating, giving the health score of each category;
Data source: food and catering review data from Meituan, Dianping, Baidu Travel, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: following the overall-health method, calculate the health score of each category;
Note: 1. Categories: Chinese restaurant, casual light meals, fast food, snacks, Western restaurant, Japanese cuisine, Korean cuisine, other;
Evaluation dimension analysis
Analysis: share of positive reviews on dimensions such as taste, portion size and plating;
Data source: food and catering review data from Meituan, Dianping, Baidu Travel, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: extract the sentiment words in the reviews and the nouns/pronouns they modify; code then classifies each review as positive or negative on dimensions such as taste, portion size and plating, and the positive-review rate of each dimension is calculated;
Ranking of positively reviewed enterprises
Analysis: top 10 positively reviewed enterprises
Data source: food and catering review data from Meituan, Dianping, Baidu Travel, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: quantify the segmented words using natural language processing methods, apply a classification method to label each review as positive or negative, then calculate each enterprise's number of reviews, number of positive reviews and positive-review rate, and rank enterprises by number of positive reviews and positive-review rate;
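Once each review carries a positive or negative label, the ranking step reduces to grouping and sorting. The sketch below assumes the classifier output is already available as `(enterprise, label)` pairs with label 1 for positive and -1 for negative, and it assumes ties on the number of positive reviews are broken by positive-review rate; neither detail is fixed by the text.

```python
from collections import defaultdict


def rank_enterprises(labelled_reviews, top_n=10):
    """Return (name, reviews, positive reviews, positive rate) rows, best first."""
    stats = defaultdict(lambda: [0, 0])  # name -> [total reviews, positive reviews]
    for enterprise, label in labelled_reviews:
        stats[enterprise][0] += 1
        if label == 1:
            stats[enterprise][1] += 1
    rows = [(name, total, positive, positive / total)
            for name, (total, positive) in stats.items()]
    # Assumed ordering: most positive reviews first, then highest rate, then name.
    rows.sort(key=lambda row: (-row[2], -row[3], row[0]))
    return rows[:top_n]


# Hypothetical labelled reviews.
labelled = [("A", 1), ("A", 1), ("A", -1), ("B", 1), ("B", 1), ("C", -1)]
ranking = rank_enterprises(labelled)
```

Counting negative labels instead of positive ones in the same structure yields the corresponding ranking of negatively reviewed enterprises.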
Ranking of negatively reviewed enterprises
Analysis: ten negatively reviewed enterprises
Data source: food and catering review data from Meituan, Dianping, Baidu Travel, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: analogous to the ranking of positively reviewed enterprises
Hot review words
Analysis: hot words in positive and negative reviews
Data source: food and catering review data from Meituan, Dianping, Baidu Travel, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: extract the nouns, adjectives, etc. that occur frequently in positive and negative reviews, and count their occurrences.
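Hot-word extraction can be sketched as a frequency count over part-of-speech-filtered tokens. The input format, lists of `(word, tag)` pairs resembling the segmenter output shown earlier, and the choice to keep every tag beginning with `n` or `a` are assumptions for illustration.

```python
from collections import Counter


def hot_words(tagged_reviews, keep_prefixes=("n", "a"), top_n=10):
    """Count words whose part-of-speech tag starts with one of `keep_prefixes`."""
    counts = Counter(
        word
        for review in tagged_reviews
        for word, tag in review
        if tag.startswith(keep_prefixes)
    )
    return counts.most_common(top_n)


# Hypothetical segmented reviews as (word, tag) pairs.
tagged = [
    [("taste", "n"), ("very", "d"), ("good", "a")],
    [("portion", "nw"), ("very", "d"), ("large", "a")],
    [("taste", "n"), ("good", "a")],
]
top = hot_words(tagged, top_n=2)
```

Running the function separately on positive and on negative reviews gives one hot-word list per polarity.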
2. Accommodation category
(1) Analysis of the composition and growth of industry business formats
Composition and growth of industry business formats
Analysis: number and share of merchants at each star level; number and share of merchants at each rating level; year-by-year changes in the number and share of merchants at each star level and rating level;
Data source: basic hotel data from JD, eLong, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: count accommodation merchants year by year by star level and by rating level, and calculate each count as a proportion of the total number of merchants;
Analysis: number and share of rural hotels at each star level;
Data source: Chengdu Public Data Open Platform, summary information on star-rated rural hotels;
Implementation: count rural hotels by star level;
Note: 1. Star levels: five stars, four stars, three stars, two stars, one star;
2. Rating levels: five stars, four and a half stars, four stars, three and a half stars, three stars, two and a half stars, two stars, one and a half stars, one star, half a star, no stars;
(2) Analysis of industry economic operation
Merchant count analysis
Analysis: number of accommodation merchants
Data source: basic hotel data from JD, eLong, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: sum the number of accommodation merchants (after preprocessing records in which the same merchant appears under different names on different platforms);
Room-type variety analysis
Analysis: number of merchants offering each room type;
Data source: basic hotel data from JD, eLong, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: count the merchants offering each room type;
Note: Room types: king-bed room, twin room, suite, etc.;
Basic service facility analysis;
Analysis: number of merchants offering each basic service facility;
Data source: basic hotel data from JD, eLong, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: count the merchants offering each basic service facility;
Note: Basic service facilities: Wi-Fi, wake-up service, luggage storage, elevator, electronic checkout system, 24-hour hot water, etc.;
Price range analysis
Analysis: number of merchants in each range of starting accommodation price;
Data source: basic hotel data from JD, eLong, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: bin the starting price into the defined price ranges and count the merchants in each range;
Note: 1. Starting price: below 50 yuan, 50-100 yuan, 100-150 yuan, 150-200 yuan, 200-250 yuan, 250-300 yuan, 350-400 yuan, 450-500 yuan, 500 yuan and above;
Popular room type analysis
Analysis: room types with the most reviews;
Data source: hotel review data from JD, eLong, Lvmama, Ctrip and Tongcheng;
Implementation: calculate the number of reviews of each room type across all platforms;
Popular hotel analysis
Analysis: hotels with the most reviews;
Data source: hotel review data from JD, eLong, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: calculate each merchant's number of reviews across all platforms;
Recommendation analysis
Analysis: hotels, inns, etc. that perform well on dimensions such as location, facilities, service and hygiene;
Data source: hotel review data from JD, eLong, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: extract the sentiment words in the reviews and the nouns/pronouns they modify; code then classifies each review as positive or negative on dimensions such as location, facilities, service and hygiene; the positive-review rate of each dimension is calculated for each merchant, and a model built on these rates gives the merchant's recommendation degree;
(3) Industry health analysis
Overall health
Analysis: results of analyzing online reviews by dimensions such as location, facilities, service and hygiene, giving the overall health score;
Data source: hotel review data from JD, eLong, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: extract the sentiment words in the reviews and the nouns/pronouns they modify; code then classifies each review as positive or negative on dimensions such as location, facilities, service and hygiene; the positive-review rate of each dimension is calculated, and a model built on these rates gives the overall health score;
Health of each business format
Analysis: results of analyzing the reviews for each star level and rating level by dimensions such as location, facilities, service and hygiene, giving the health score of each business format;
Data source: hotel review data from JD, eLong, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: following the overall-health method, calculate the health score for each star level and rating level;
Note: 1. Star levels: five stars, four stars, three stars, two stars, one star;
2. Rating levels: five stars, four and a half stars, four stars, three and a half stars, three stars, two and a half stars, two stars, one and a half stars, one star, half a star, no stars;
Evaluation dimension analysis
Analysis: share of positive reviews on dimensions such as location, facilities, service and hygiene;
Data source: hotel review data from JD, eLong, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: extract the sentiment words in the reviews and the nouns/pronouns they modify; code then classifies each review as positive or negative on dimensions such as location, facilities, service and hygiene, and the positive-review rate of each dimension is calculated;
Ranking of positively reviewed enterprises
Analysis: top 10 positively reviewed enterprises
Data source: hotel review data from JD, eLong, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: quantify the segmented words using natural language processing methods, apply a classification method to label each review as positive or negative, then calculate each enterprise's number of reviews, number of positive reviews and positive-review rate, and rank enterprises by number of positive reviews and positive-review rate;
Ranking of negatively reviewed enterprises
Analysis: ten negatively reviewed enterprises
Data source: hotel review data from JD, eLong, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: analogous to the ranking of positively reviewed enterprises;
Hot review words
Analysis: hot words in positive and negative reviews
Data source: hotel review data from JD, eLong, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: extract the nouns, adjectives, etc. that occur frequently in positive and negative reviews, and count their occurrences.
3. Transport category
(1) Road variety analysis
Road variety analysis
Analysis: mileage of each type of road;
Data source: Chengdu Public Data Open Platform, highway mileage information;
Implementation: query the mileage of national highways, provincial highways, county roads, township roads, special-purpose roads and village roads;
(2) Regular bus analysis
Public buses
Analysis: number of bus routes
Data source: Chengdu Public Data Open Platform, bus route information;
Implementation: count the routes by route name;
Analysis: the bus stops served by the most bus routes;
Data source: Chengdu Public Data Open Platform, bus route information;
Implementation: for each stop, count the bus routes that serve it;
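Both bus statistics can be derived from a flat list of route and stop pairs. The sketch below assumes the open-platform data can be reduced to `(route name, stop name)` tuples; the sample names are hypothetical.

```python
from collections import defaultdict


def routes_per_stop(route_stop_pairs):
    """Return (stop, number of distinct routes) pairs, busiest stop first."""
    routes_at_stop = defaultdict(set)
    for route, stop in route_stop_pairs:
        routes_at_stop[stop].add(route)  # a set ignores duplicate rows
    ranking = [(stop, len(routes)) for stop, routes in routes_at_stop.items()]
    ranking.sort(key=lambda item: (-item[1], item[0]))
    return ranking


# Hypothetical route/stop rows, including one duplicate.
pairs = [
    ("Route 1", "Stop A"), ("Route 1", "Stop B"), ("Route 2", "Stop B"),
    ("Route 3", "Stop B"), ("Route 3", "Stop C"), ("Route 1", "Stop B"),
]
ranking = routes_per_stop(pairs)
number_of_routes = len({route for route, _ in pairs})  # count by route name
```

Using sets keeps repeated rows for the same route and stop from inflating either count.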
(3) Industry health analysis
Overall health
Analysis: results of analyzing online reviews by dimensions such as price, service, hygiene and facilities, giving the overall health score;
Data source: Dianping transport review data;
Implementation: extract the sentiment words in the reviews and the nouns/pronouns they modify; code then classifies each review as positive or negative on dimensions such as price, service, hygiene and facilities; the positive-review rate of each dimension is calculated, and a model built on these rates gives the overall health score;
Evaluation dimension analysis
Analysis: share of positive reviews on dimensions such as scenery, management, transport, price, service and facilities;
Data source: Dianping transport review data;
Implementation: extract the sentiment words in the reviews and the nouns/pronouns they modify; code then classifies each review as positive or negative on dimensions such as price, service, hygiene and facilities, and the positive-review rate of each dimension is calculated;
Hot review words
Analysis: hot words in positive and negative reviews
Data source: Dianping transport review data;
Implementation: extract the nouns, adjectives, etc. that occur frequently in positive and negative reviews, and count their occurrences.
4. Tourism category
(1) Analysis of the composition and growth of industry business formats
Growth analysis of Chengdu travel agencies;
Analysis: share of travel agencies in each district; number of travel agencies registered each year
Data source: Chengdu Tourism E-Government Network, Industry Management, Travel Agencies, register of Chengdu travel agencies;
Implementation: count travel agencies by district and calculate the shares; count travel agencies by registration date;
Growth analysis of Chengdu branches of travel agencies from other provinces and cities;
Analysis: share of travel agencies in each district; number of travel agencies registered each year;
Data source: Chengdu Tourism E-Government Network, Industry Management, Travel Agencies, register of Chengdu branches of travel agencies from other provinces and cities;
Implementation: count travel agencies by district and calculate the shares; count travel agencies by registration date;
Service outlet count analysis;
Analysis: number of service outlets each travel agency has in Chengdu; number of travel agency service outlets in each district;
Data source: Chengdu Tourism E-Government Network, Industry Management, Travel Agencies, register of Chengdu travel agency service outlets;
Implementation: count service outlets by travel agency, and count travel agency service outlets by district;
Composition of industry business formats;
Analysis: number and share of scenic spots at each star level
Data source: admission ticket data from Baidu Travel, Lvmama, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: count scenic spots by star level, and calculate each count as a proportion of the total;
Note: 1. Star levels: AAAAA, AAAA, AAA, AA, A and unrated;
(2) Analysis of industry economic operation
Scenic spot count analysis
Analysis: number of tourist attractions
Data source: basic data on admission tickets, attractions and local activities from JD Travel, Baidu Travel, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: sum the number of scenic spots;
Travel route analysis
Railway superstructures
Analysis: top 10 routes by number of travelers;
Data source: basic data on nearby tours and local activities from Lvmama, Mafengwo, Qunar, Ctrip and Tuniu;
Implementation: count travelers by route;
Route supply analysis;
Analysis: number of routes;
Data source: basic data on nearby tours and local activities from Lvmama, Mafengwo, Qunar, Ctrip and Tuniu;
Implementation: calculate the number of routes to the destination;
Route price analysis (price range, trend analysis, growth analysis);
Analysis: number of travel routes in each price range;
Data source: basic data on nearby tours and local activities from Lvmama and Tuniu;
Implementation: bin the starting price into the defined price ranges and count the routes in each range;
Note: 1. Starting price: below 50 yuan, 50-100 yuan, 100-150 yuan, 150-200 yuan, 200-250 yuan, 250-300 yuan, 350-400 yuan, 450-500 yuan, 500 yuan and above;
Low-price route monitoring (low-price route early warning, low-price route suppliers, analysis of route departure places);
Analysis: top 10 low-price routes, the suppliers of low-price routes, and the departure places of low-price routes;
Data source: basic data on nearby tours and local activities from Lvmama and Tuniu;
Implementation: find the routes with the lowest starting prices;
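Finding the lowest-priced routes together with their suppliers and departure places is a top-N selection. The field names (`name`, `supplier`, `departure`, `starting_price`) and the sample rows below are hypothetical; only the selection by lowest starting price comes from the text.

```python
import heapq


def lowest_priced_routes(routes, n=10):
    """Return the n routes with the lowest starting price, cheapest first."""
    return heapq.nsmallest(n, routes, key=lambda route: route["starting_price"])


# Hypothetical route records.
routes = [
    {"name": "R1", "supplier": "S1", "departure": "Chengdu", "starting_price": 199},
    {"name": "R2", "supplier": "S2", "departure": "Chongqing", "starting_price": 49},
    {"name": "R3", "supplier": "S1", "departure": "Chengdu", "starting_price": 320},
    {"name": "R4", "supplier": "S3", "departure": "Chengdu", "starting_price": 88},
]
cheapest = lowest_priced_routes(routes, n=2)
suppliers = [route["supplier"] for route in cheapest]
departures = [route["departure"] for route in cheapest]
```

An early-warning rule, for example flagging routes below a chosen price threshold, could be layered on top of the same selection; the text does not define such a threshold.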
Popular scenic spot analysis
Analysis: scenic spots with the most reviews;
Data source: review data on admission tickets, attractions and local activities from JD Travel, Baidu Travel, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: calculate the number of reviews of each scenic spot;
Recommendation analysis
Analysis: scenic spots, etc. that perform well on dimensions such as scenery, management, transport, price, service and facilities;
Data source: review data on admission tickets, attractions and local activities from JD Travel, Baidu Travel, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: extract the sentiment words in the reviews and the nouns/pronouns they modify; code then classifies each review as positive or negative on dimensions such as scenery, management, transport and price; the positive-review rate of each dimension is calculated for each merchant, and a model built on these rates gives the merchant's recommendation degree;
(3) Industry health analysis
Overall health
Analysis: results of analyzing online reviews by dimensions such as scenery, management, transport, price, service and facilities, giving the overall health score;
Data source: review data on admission tickets, attractions and local activities from JD Travel, Baidu Travel, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: extract the sentiment words in the reviews and the nouns/pronouns they modify; code then classifies each review as positive or negative on dimensions such as scenery, management, transport, price, service and facilities; the positive-review rate of each dimension is calculated, and a model built on these rates gives the overall health score;
Health of each business format
Analysis: results of analyzing the reviews for each star level by dimensions such as scenery, management, transport, price, service and facilities, giving the health score of each business format;
Data source: review data on admission tickets, attractions and local activities from JD Travel, Baidu Travel, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: following the overall-health method, calculate the health score for each star level;
Note: 1. Star levels: AAAAA, AAAA, AAA, AA, A;
Evaluation dimension analysis;
Analysis: share of positive reviews on dimensions such as scenery, management, transport, price, service and facilities;
Data source: review data on admission tickets, attractions and local activities from JD Travel, Baidu Travel, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: extract the sentiment words in the reviews and the nouns/pronouns they modify; code then classifies each review as positive or negative on dimensions such as scenery, management, transport, price, service and facilities, and the positive-review rate of each dimension is calculated;
Ranking of positively reviewed enterprises
Analysis: top 10 positively reviewed enterprises
Data source: review data on admission tickets, attractions and local activities from JD Travel, Baidu Travel, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: quantify the segmented words using natural language processing methods, apply a classification method to label each review as positive or negative, then calculate each enterprise's number of reviews, number of positive reviews and positive-review rate, and rank enterprises by number of positive reviews and positive-review rate;
Ranking of negatively reviewed enterprises
Analysis: ten negatively reviewed enterprises
Data source: review data on admission tickets, attractions and local activities from JD Travel, Baidu Travel, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: analogous to the ranking of positively reviewed enterprises
Hot review words
Analysis: hot words in positive and negative reviews
Data source: review data on admission tickets, attractions and local activities from JD Travel, Baidu Travel, Lvmama, Mafengwo, Qunar, Ctrip, Tongcheng and Tuniu;
Implementation: extract the nouns, adjectives, etc. that occur frequently in positive and negative reviews, and count their occurrences;
5. Shopping category
(1) Analysis of the composition and growth of industry business formats;
Composition and growth of industry business formats;
Analysis: number and share of merchants of each type;
Data source: basic shopping data from Dianping, Lvmama, Mafengwo, Tongcheng and Tuniu;
Implementation: count merchants by type, and calculate each count as a proportion of the total;
Note: 1. Types: five stars, four stars, three stars, two stars, one star;
(2) Analysis of industry economic operation;
Merchant count analysis
Analysis: number of shopping merchants;
Data source: basic shopping data from Dianping, Lvmama, Mafengwo, Tongcheng and Tuniu;
Implementation: sum the number of shopping merchants;
Popular merchant analysis;
Analysis: merchants with the most reviews;
Data source: shopping review data from Dianping, Lvmama, Mafengwo, Tongcheng and Tuniu;
Implementation: calculate each merchant's number of reviews across all platforms;
Recommendation analysis
Analysis: shopping venues with fair prices, outstanding service and a pleasant environment;
Data source: basic shopping data from Dianping, Lvmama, Mafengwo, Tongcheng and Tuniu;
Implementation: extract the sentiment words in the reviews and the nouns/pronouns they modify; code then classifies each review as positive or negative on dimensions such as price, service and goods; the positive-review rate of each dimension is calculated for each merchant, and a model built on these rates gives the merchant's recommendation degree;
(3) Industry health analysis
Overall health;
Analysis: results of analyzing online reviews by dimensions such as price, service and goods, giving the overall health score;
Data source: basic shopping data from Dianping, Lvmama, Mafengwo, Tongcheng and Tuniu;
Implementation: extract the sentiment words in the reviews and the nouns/pronouns they modify; code then classifies each review as positive or negative on dimensions such as price, service and goods; the positive-review rate of each dimension is calculated, and a model built on these rates gives the overall health score;
Health of each business format (by type)
Analysis: results of analyzing the reviews for each type by dimensions such as price, service and goods, giving the health score of each business format;
Data source: basic shopping data from Dianping, Lvmama, Mafengwo, Tongcheng and Tuniu;
Implementation: following the overall-health method, calculate the health score for each type;
Evaluation dimension analysis;
Analysis: share of positive reviews on dimensions such as price, service and goods;
Data source: basic shopping data from Dianping, Lvmama, Mafengwo, Tongcheng and Tuniu;
Implementation: extract the sentiment words in the reviews and the nouns/pronouns they modify; code then classifies each review as positive or negative on dimensions such as price, service and goods, and the positive-review rate of each dimension is calculated;
Ranking of positively reviewed enterprises
Analysis: top 10 positively reviewed enterprises
Data source: basic shopping data from Dianping, Lvmama, Mafengwo, Tongcheng and Tuniu;
Implementation: quantify the segmented words using natural language processing methods, apply a classification method to label each review as positive or negative, then calculate each enterprise's number of reviews, number of positive reviews and positive-review rate, and rank enterprises by number of positive reviews and positive-review rate;
Ranking of negatively reviewed enterprises
Analysis: ten negatively reviewed enterprises;
Data source: basic shopping data from Dianping, Lvmama, Mafengwo, Tongcheng and Tuniu;
Implementation: analogous to the ranking of positively reviewed enterprises;
Hot review words
Analysis: hot words in positive and negative reviews;
Data source: basic shopping data from Dianping, Lvmama, Mafengwo, Tongcheng and Tuniu;
Implementation: extract the nouns, adjectives, etc. that occur frequently in positive and negative reviews, and count their occurrences.
6. Entertainment category
(1) Analysis of industry business-type composition and growth;
Industry business-type composition analysis;
Analysis: number and share of merchants of each type;
Data source: basic data on entertainment, entertainment venues and recreation from Meituan, Dianping, Lvmama, Mafengwo, Tongcheng and Tuniu;
Implementation: count the merchants by type and calculate each type's share of the total;
Note: types: five-star, four-star, three-star, two-star, one-star;
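The composition statistic is a count per type divided by the total. A minimal sketch, assuming each merchant record is a dictionary with a hypothetical `type` field holding its star rating:

```python
from collections import Counter


def type_composition(merchants):
    """Return {type: (count, share_of_total)} for a list of merchant records."""
    counts = Counter(m["type"] for m in merchants)
    total = sum(counts.values())
    return {t: (n, n / total) for t, n in counts.items()}


merchants = [
    {"name": "KTV One", "type": "five-star"},
    {"name": "Arcade Two", "type": "four-star"},
    {"name": "Spa Three", "type": "five-star"},
    {"name": "Bar Four", "type": "three-star"},
]
composition = type_composition(merchants)
```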
(2) Industry economic operation analysis;
Merchant quantity analysis
Analysis: number of entertainment merchants;
Data source: basic data on entertainment, entertainment venues and recreation from Meituan, Dianping, Lvmama, Mafengwo, Tongcheng and Tuniu;
Implementation: sum the number of entertainment merchants;
Price analysis (price range, trend analysis, growth analysis);
Analysis: number of merchants whose per-capita consumption falls in each price range;
Data source: basic data on entertainment, entertainment venues and recreation from Meituan, Dianping, Lvmama, Tongcheng and Tuniu;
Implementation: group the starting prices into the defined price ranges and sum the number of merchants in each range;
Note: price ranges: below 50 yuan, 50-100 yuan, 100-150 yuan, 150-200 yuan, 200-250 yuan, 250-300 yuan, 300-350 yuan, 350-400 yuan, and 400 yuan or more;
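Binning by price range can be sketched with a sorted list of boundaries. The text does not say which range a boundary value such as 100 yuan belongs to; this example assumes each range includes its lower bound and excludes its upper bound.

```python
from bisect import bisect_right
from collections import Counter

EDGES = [50, 100, 150, 200, 250, 300, 350, 400]
LABELS = ["<50", "50-100", "100-150", "150-200", "200-250",
          "250-300", "300-350", "350-400", ">=400"]


def price_bucket(price):
    """Return the label of the range containing price (lower bound inclusive)."""
    return LABELS[bisect_right(EDGES, price)]


def merchants_per_bucket(prices):
    """Count merchants per price range, keeping empty ranges with a zero count."""
    counts = Counter(price_bucket(p) for p in prices)
    return {label: counts.get(label, 0) for label in LABELS}


starting_prices = [30, 50, 99, 120, 400, 800]
distribution = merchants_per_bucket(starting_prices)
```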
Popular merchant analysis
Analysis: merchants with the most comments;
Data source: comment data on entertainment, entertainment venues and recreation from Meituan, Dianping, Lvmama, Mafengwo, Tongcheng and Tuniu;
Implementation: calculate each merchant's number of comments across all platforms;
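Summing comment counts across platforms can be sketched as below. The example assumes that a merchant carries the same name on every platform; real data would likely need a name-matching step first. The platform keys and merchant names are placeholders.

```python
from collections import Counter


def comments_per_merchant(platform_comments):
    """platform_comments: {platform: [merchant name for each comment]}.

    Returns (merchant, total comments) pairs, most commented first.
    """
    totals = Counter()
    for merchant_names in platform_comments.values():
        totals.update(merchant_names)
    return totals.most_common()


platform_comments = {
    "platform_a": ["Karaoke Box", "Karaoke Box", "Escape Room"],
    "platform_b": ["Karaoke Box", "Laser Tag"],
}
popularity = comments_per_merchant(platform_comments)
```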
Recommendation degree analysis
Analysis: entertainment venues with fair prices, excellent service and a good environment;
Data source: comment data on entertainment, entertainment venues and recreation from Meituan, Dianping, Lvmama, Mafengwo, Tongcheng and Tuniu;
Implementation: extract the emotion words in each comment and the nouns/pronouns they modify, and implement in code a positive/negative review classification on dimensions such as price, service and environment; calculate each merchant's positive review rate on each dimension, and build a model on these per-dimension rates to obtain the merchant's recommendation degree;
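The text does not specify the model that turns per-dimension positive review rates into a recommendation degree. As one possible stand-in, the sketch below uses a weighted average scaled to 0-100; the weights and the function name are hypothetical and are not taken from the source.

```python
# Hypothetical weights; the source does not state how dimensions are combined.
WEIGHTS = {"price": 0.3, "service": 0.4, "environment": 0.3}


def recommendation_score(rates, weights=WEIGHTS):
    """Weighted average of per-dimension positive review rates, scaled to 0-100.

    Dimensions missing from rates are skipped and the remaining weights are
    renormalized; returns None when no weighted dimension is present.
    """
    used = {dim: w for dim, w in weights.items() if dim in rates}
    total_weight = sum(used.values())
    if total_weight == 0:
        return None
    return 100 * sum(rates[dim] * w for dim, w in used.items()) / total_weight


score = recommendation_score({"price": 0.8, "service": 0.9, "environment": 0.5})
```

The same kind of aggregation could serve as a placeholder for the health degree scores, which the text also derives from per-dimension positive review rates.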
(3) Industry health degree analysis
Overall health degree
Analysis: results of analyzing online comments by dimensions such as price, service and environment, giving the overall health degree score;
Data source: comment data on entertainment, entertainment venues and recreation from Meituan, Dianping, Lvmama, Mafengwo, Tongcheng and Tuniu;
Implementation: extract the emotion words in each comment and the nouns/pronouns they modify, and implement in code a positive/negative review classification on dimensions such as price, service and environment; calculate the positive review rate of each dimension, and build a model on these rates to obtain the overall health degree score;
Health degree of each business type (by type)
Analysis: results of analyzing the comments of each type by dimensions such as price, service and environment, giving the health degree score of each business type;
Data source: comment data on entertainment, entertainment venues and recreation from Meituan, Dianping, Lvmama, Mafengwo, Tongcheng and Tuniu;
Implementation: following the overall health degree method, calculate the health degree for each type;
Evaluation dimension analysis;
Analysis: share of positive reviews on dimensions such as price, service and environment;
Data source: comment data on entertainment, entertainment venues and recreation from Meituan, Dianping, Lvmama, Mafengwo, Tongcheng and Tuniu;
Implementation: extract the emotion words in each comment and the nouns/pronouns they modify, and implement in code a positive/negative review classification on dimensions such as price, service and environment; calculate the positive review rate of each dimension;
Positive-review enterprise ranking
Analysis: top 10 enterprises by positive reviews;
Data source: comment data on entertainment, entertainment venues and recreation from Meituan, Dianping, Lvmama, Mafengwo, Tongcheng and Tuniu;
Implementation: quantify the word segmentation results using natural language processing methods and combine them with a classification method to classify each comment as positive or negative; then calculate each enterprise's number of comments, positive review rate and number of positive reviews, and rank the enterprises by positive review rate;
Cha Ping enterprise seniority among brothers and sisters;
Analysis: ten Ge Chaping enterprises;
Data source: Meituan, public comment, donkey mother, hornet's nest, with journey and way ox amusement, entertain point, comment data of playing
Realize: similar favorable comment enterprise ranks realization process;
Comment hot words
Analysis: hot words in positive and negative reviews;
Data source: comment data on entertainment, entertainment venues and recreation from Meituan, Dianping, Lvmama, Mafengwo, Tongcheng and Tuniu;
Implementation: extract the nouns, adjectives and similar words that occur frequently in positive and negative reviews, and count their occurrences.
Although preferred embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they know the basic inventive concept. The appended claims are therefore intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the invention.
Obviously, those skilled in the art can make various changes and variations to the invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the invention and their equivalent technologies, the invention is also intended to include them.

Claims (10)

1. A method for analyzing regional tourism industry development, characterized in that the method comprises:
Step 1: determining the region to be analyzed;
Step 2: collecting data related to the region to be analyzed and preprocessing it;
Step 3: building and training a support vector machine model;
Step 4: based on the preprocessed tourism data of the region to be analyzed and the trained support vector machine model, analyzing the health degree of the tourism industry of the region to be analyzed;
Step 5: analyzing the tourism industry of the region to be analyzed along multiple dimensions;
Step 6: based on the analysis results of step 4 and step 5, obtaining the tourism industry development analysis result for the region to be analyzed.
2. The method for analyzing regional tourism industry development according to claim 1, characterized in that step 4 specifically comprises: performing word segmentation and vector construction on the preprocessed tourism data of the region to be analyzed; and inputting the constructed vectors into the trained support vector machine model to calculate the sentiment orientation of each comment.
3. The method for analyzing regional tourism industry development according to claim 1, characterized in that step 2 is specifically: crawling the tourism industry data of the region to be analyzed from OTA websites, wherein preprocessing the data comprises: missing data handling, abnormal data handling and data normalization.
4. The method for analyzing regional tourism industry development according to claim 1, characterized in that building and training the support vector machine model comprises: constructing a training set and a test set; training the support vector machine model based on the word vectors of the training set; and adding negative reviews to the training set to optimize the trained support vector machine model.
5. The method for analyzing regional tourism industry development according to claim 4, characterized in that constructing the training set and the test set comprises:
(1) labeling sentiment orientation:
for the 6 categories of catering, lodging, transport, tourism, shopping and entertainment, randomly selecting preprocessed sample comment data from each category and labeling the sentiment orientation of each comment, with negative reviews labeled -1 and positive reviews labeled 1, and constructing the training set and the test set from the labeled sample comment data;
(2) corpus processing:
performing word segmentation on the labeled sample comment data using a word segmentation tool;
(3) constructing the word vectors of the sample comment data.
6. The method for analyzing regional tourism industry development according to claim 1, characterized in that analyzing the tourism industry of the region to be analyzed along multiple dimensions specifically comprises:
building keyword libraries for multiple dimensions under each tourism category; organizing the keywords; segmenting each comment to obtain the segmentation result and the corresponding parts of speech; splitting the segmentation result into clauses according to part of speech, forming multiple clauses for a single comment; and, based on the multi-dimension keyword libraries, calculating and determining the category and sub-dimension that each clause relates to and the sentiment orientation for that dimension.
7. The method for analyzing regional tourism industry development according to claim 6, characterized in that building the keyword libraries for multiple dimensions under each tourism category specifically comprises: for the 6 categories of catering, lodging, transport, tourism, shopping and entertainment, extracting, for the sub-dimensions under each category, the emotion words, the words modified by the emotion words and the negation words that modify the emotion words, and building the keyword library of the sub-dimensions under each tourism category;
organizing the keywords specifically comprises: aggregating and deduplicating the emotion words of the same dimension across the 6 categories of catering, lodging, transport, tourism, shopping and entertainment; aggregating the emotion words of all dimensions and assigning each emotion word an emotional polarity label, by convention positive: 1 and negative: -1, to form the emotion dictionary; aggregating and deduplicating the modified words from catering, lodging, transport, tourism, shopping and entertainment comments and assigning dimension labels to form a document; extracting the emotion words specific to each dimension and labeling them with the corresponding dimension to form the dimension identification library; and aggregating and deduplicating the extracted negation words to form the negation dictionary.
8. The method for analyzing regional tourism industry development according to claim 6, characterized in that extracting the dimension of each clause comprises: extracting the evaluation dimension of the comment from each clause with reference to the dimension identification library; if the clause contains a word found in the dimension identification library, the dimension contained in the clause can be identified from that library; if not, the dimension of the clause is not among the dimensions of concern in this dimension analysis;
calculating the sentiment orientation of a comment comprises: for each clause, checking whether its segmented words appear in the emotion dictionary; if they do, the orientation of the emotion word can be identified from the emotion dictionary; if they do not, the clause is considered to contain no emotion word of concern to this dimension analysis; taking the set difference of the clause and the emotion dictionary to obtain a result set, intersecting the result set with the negation dictionary, determining the number of negation word occurrences from the number of elements in the intersection, and obtaining the final sentiment orientation of the comment from the number of negation word occurrences and the orientation of the emotion word.
9. The method for analyzing regional tourism industry development according to claim 4, characterized in that the sentiment orientation training set is:
$T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$, $y_i \in \{-1, +1\}$, $i = 1, 2, \ldots, N$; where $x_i$ denotes a sample, $y_i$ denotes the sentiment orientation of that sample, $-1$ represents negative, $+1$ represents positive, and $N$ is the number of training samples.
10. The method for analyzing regional tourism industry development according to claim 4, characterized in that the trained support vector machine model is:
$f(x) = \operatorname{sign}(w^* \cdot x + b^*)$; where $w^*$ and $b^*$ are calculated as follows: $w^*$ and $b^*$ are the optimal solution satisfying $y_i (w^{\top} \cdot x_i + b) \ge 1$, with separating hyperplane $w^* \cdot x + b^* = 0$; $w^*$ is the weight vector, $b^*$ is the bias, and $T$ is the sentiment orientation training set (the superscript in $w^{\top}$ denotes transposition).
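The decision function and margin constraint in claims 9 and 10 can be illustrated with a small sketch. It does not train a model: the weight vector and bias below are hand-picked values that satisfy the constraint on a toy training set, not an optimal solution, and the 2-D points stand in for the word vectors of real comments. Treating a score of exactly zero as positive is also an assumption.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def sign(value):
    """Return +1 for non-negative values and -1 otherwise (assumed convention)."""
    return 1 if value >= 0 else -1


def decision(w, b, x):
    """f(x) = sign(w . x + b): predicted sentiment orientation of sample x."""
    return sign(dot(w, x) + b)


def satisfies_margin(w, b, training_set):
    """Check y_i * (w . x_i + b) >= 1 for every (x_i, y_i) in the training set."""
    return all(y * (dot(w, x) + b) >= 1 for x, y in training_set)


# Toy training set T = {(x_i, y_i)} with y_i in {-1, +1}.
T = [((2.0, 2.0), 1), ((3.0, 3.0), 1), ((0.0, 0.0), -1), ((1.0, 0.0), -1)]

# Hand-picked feasible parameters, not the result of solving the optimization.
w_example, b_example = (1.0, 1.0), -3.0

feasible = satisfies_margin(w_example, b_example, T)
prediction = decision(w_example, b_example, (2.5, 2.5))
```

An actual implementation would obtain the parameters from a support vector machine solver and build each $x_i$ from the segmented comment text.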
CN201910123321.3A 2019-02-18 2019-02-18 A kind of analysis method of regional tourism industry development Pending CN109858973A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910123321.3A CN109858973A (en) 2019-02-18 2019-02-18 A kind of analysis method of regional tourism industry development

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910123321.3A CN109858973A (en) 2019-02-18 2019-02-18 A kind of analysis method of regional tourism industry development

Publications (1)

Publication Number Publication Date
CN109858973A true CN109858973A (en) 2019-06-07

Family

ID=66898226

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910123321.3A Pending CN109858973A (en) 2019-02-18 2019-02-18 A kind of analysis method of regional tourism industry development

Country Status (1)

Country Link
CN (1) CN109858973A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190607