Applying Text Mining Techniques to Sentiment Analysis of Museum Visitors

Date

2024

Abstract

Travel information now reaches tourists through many channels. In addition to the official websites of attractions, tourists increasingly draw on the travel experiences shared on review sites such as TripAdvisor and Google Maps. This information is crucial for the operation and management of attractions. Text mining techniques are effective for analyzing unstructured data and therefore offer a feasible method for studying such reviews. This study applied text mining to museum reviews, covering word segmentation, TF-IDF vectorization, feature word selection, keyword co-occurrence, topic modeling, and sentiment analysis. Keyword co-occurrence analysis combined with Leiden community detection identified five common discussion topics (Visit Duration, Famous Collections, Entry Situation, Overall Rating, and Famous Artist), along with topics specific to individual museums. Topic modeling was then used to identify the content and importance of each topic; some of these topics matched the keyword co-occurrence results, which further supports their importance. The analysis also revealed what reviewers writing in different languages focus on. For sentiment analysis, Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), and Bidirectional Encoder Representations from Transformers (BERT) models were compared on prediction accuracy. Taken across the language categories, the LR model performed best. The word coefficients of the LR model were then filtered to build, for each language, dictionaries of positive and negative sentiment words for museum reviews. These results present the topic distributions and examine the relationship between feature words and sentiment analysis results.
The data comprised approximately 415,000 tourist reviews of eight world-renowned museums, written in English, Simplified Chinese, and Traditional Chinese and collected from TripAdvisor. The findings can provide valuable insights for improving museum management and operation strategies.
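
The keyword co-occurrence step named in the abstract can be sketched roughly as follows. This is a minimal illustration, not the study's implementation: the toy reviews, the keyword list, and the function name `cooccurrence_counts` are all hypothetical, and whitespace tokenization stands in for the word segmentation that real reviews (especially Chinese ones) would need. The sketch only counts how many reviews contain each pair of keywords; such counts could serve as edge weights in a keyword graph handed to a community detection algorithm such as Leiden, which is not shown here.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(reviews, keywords):
    """Count how many reviews contain each unordered pair of keywords."""
    keyword_set = set(keywords)
    counts = Counter()
    for text in reviews:
        # Naive whitespace tokenization (an assumption for this sketch);
        # real data would go through a proper word segmenter first.
        tokens = set(text.lower().split())
        # Sort so each pair is always stored in the same order.
        present = sorted(keyword_set & tokens)
        counts.update(combinations(present, 2))
    return counts


# Hypothetical toy data, for illustration only.
reviews = [
    "long queue at the entrance but the paintings were stunning",
    "book a ticket online to skip the queue at the entrance",
    "the paintings and sculptures need at least three hours",
    "three hours were not enough for the paintings",
]
keywords = ["queue", "entrance", "ticket", "paintings", "hours"]

pairs = cooccurrence_counts(reviews, keywords)
for (first, second), n in pairs.most_common():
    print(first, second, n)
```

In this toy data, keywords about entry ("queue", "entrance", "ticket") co-occur with one another more often than with keywords about visit duration ("hours"), which is the kind of grouping a community detection step would be expected to pick up on a much larger graph.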

Keywords

Text Mining, Museum, Sentiment Analysis, TF-IDF, Topic Modeling, Keyword Co-occurrence
