How I Built a Review Analysis Pipeline for JD Wetherspoon Pubs using Python and NLP

Overview of the Project

Customer reviews offer unfiltered insight into how people experience a business. For JD Wetherspoon, a popular pub chain, these reviews are a resource for understanding customer sentiment and the topics that matter most to customers. This post walks through sentiment analysis and topic modeling of customer reviews for all 25 JD Wetherspoon pubs in Greater Manchester.

Importance and Application of Sentiment Analysis and Topic Modeling

Sentiment analysis and topic modeling are two Natural Language Processing (NLP) techniques for extracting meaningful information from large volumes of text. Sentiment analysis estimates the opinion expressed in a text, indicating whether it is positive, negative, or neutral. Topic modeling uncovers the underlying themes, showing which subjects are being discussed.

For a business, applying these techniques to customer reviews can reveal patterns and trends in customer satisfaction and preferences. It helps in identifying areas of strength and aspects needing improvement, thereby guiding strategic decisions and enhancing customer experience.

Description of the Dataset

The dataset in focus comprises customer reviews for 25 JD Wetherspoon pubs located in Greater Manchester. These reviews were meticulously scraped and provided in a zipped format, encompassing a diverse range of customer opinions and ratings. Each review includes the reviewer’s name, review date, star rating, the text of the review, and additional information about the reviewer’s profile and the specific pub.

Steps in Data Collection

The data collection process involved scraping reviews from a public platform where customers shared their experiences. This process captured various aspects of customer feedback, from detailed textual reviews to numerical ratings. The collected data was then compiled into separate files for each pub, resulting in a comprehensive dataset that offers a panoramic view of customer sentiments across different locations.

Data Preparation and Cleaning Process

Data preparation is a crucial step in ensuring the accuracy and quality of the analysis. For this project, the following data preparation steps were undertaken:

Combining Datasets: The individual datasets for each pub were merged into a single dataset, enabling a collective analysis of reviews across all pubs.
Handling Missing Data: The focus was on the textual reviews; hence, only non-null review texts were considered. Missing values in other columns were not specifically addressed, as they were not critical to the sentiment analysis or topic modeling.
Text Preprocessing:
- For sentiment analysis, the reviews were used as-is without additional text cleaning. TextBlob scores each review with a polarity between -1 (most negative) and 1 (most positive).
- For topic modeling, CountVectorizer was employed to convert the text data into a numerical format suitable for analysis. This step removed common English stopwords and dropped terms that appear in more than 95% of reviews or in fewer than two reviews (max_df=0.95, min_df=2).
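
The combining and missing-data steps above can be sketched roughly as follows. This is a minimal illustration, not the original loading code: it assumes one CSV file per pub, and the `load_reviews` helper, the `Pub` column, the `Star Rating` column name, and the demo file names are all hypothetical. Only the 'Review Text' column name comes from the code in this post.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_reviews(folder, text_col="Review Text"):
    """Read every CSV in `folder`, tag rows with their source file, and stack them."""
    frames = []
    for path in sorted(Path(folder).glob("*.csv")):
        df = pd.read_csv(path)
        df["Pub"] = path.stem  # hypothetical column: pub name taken from the file name
        frames.append(df)
    combined = pd.concat(frames, ignore_index=True)
    # Keep only rows that actually have review text
    return combined.dropna(subset=[text_col]).reset_index(drop=True)


# Tiny demonstration with two made-up files
demo_dir = Path(tempfile.mkdtemp())
pd.DataFrame(
    {"Review Text": ["Great breakfast", None], "Star Rating": [5, 3]}
).to_csv(demo_dir / "pub_a.csv", index=False)
pd.DataFrame(
    {"Review Text": ["Slow service"], "Star Rating": [2]}
).to_csv(demo_dir / "pub_b.csv", index=False)

reviews_df = load_reviews(demo_dir)
print(reviews_df)
```

Tagging each row with its source file before concatenating keeps the pub identity available for later per-location comparisons.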


from textblob import TextBlob

# Function to calculate sentiment polarity (a float between -1.0 and 1.0)
def calculate_sentiment(text):
    try:
        return TextBlob(text).sentiment.polarity
    except TypeError:
        # TextBlob rejects non-string input (e.g. NaN from a missing review)
        return None

# Applying sentiment analysis on the 'Review Text' column
reviews_df['Sentiment'] = reviews_df['Review Text'].apply(calculate_sentiment)

# Displaying the first few rows with sentiment scores
reviews_df[['Review Text', 'Sentiment']].head()

The prepared dataset served as a robust foundation for the subsequent analyses, ensuring that the insights derived were both reliable and meaningful.
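
Polarity scores can be bucketed into the positive, negative, and neutral labels described earlier. The sketch below shows one way to do that. The `label_sentiment` helper and its `threshold` of 0.05 are illustrative assumptions, not values used in the original analysis.

```python
import math


def label_sentiment(polarity, threshold=0.05):
    """Map a polarity score in [-1, 1] to a coarse label.

    `threshold` is an illustrative cut-off, not a value from the original analysis.
    """
    # Missing scores (None or NaN) get no label
    if polarity is None or math.isnan(polarity):
        return None
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


for score in (0.6, 0.0, -0.4, None):
    print(score, "->", label_sentiment(score))
```

With a DataFrame like the one above, this could be applied as `reviews_df['Sentiment'].apply(label_sentiment)`.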

Introduction to Topic Modeling and Its Significance

Topic modeling is an unsupervised machine learning technique used to uncover hidden thematic structures within a large corpus of text. It identifies patterns of word clusters and topics within documents, providing insights into the underlying themes. In the context of customer reviews, topic modeling can reveal common subjects or issues that customers frequently discuss, offering businesses a deeper understanding of their clientele’s interests and concerns.

This technique is particularly valuable for organizations looking to parse through extensive textual data, helping them categorize content into thematic groups and identify prevailing sentiments or opinions about specific topics.

Methodology for Topic Modeling

The topic modeling for this project was conducted using the Latent Dirichlet Allocation (LDA) algorithm, a popular method in NLP for topic discovery. The LDA model assumes that each document (in this case, a review) is a mixture of topics, and each topic is a mixture of words. The following steps outline the methodology:

Document-Term Matrix Creation: Utilizing CountVectorizer from the scikit-learn library, the review texts were converted into a document-term matrix (DTM). This matrix represents the frequency of words in each document, excluding common stopwords.
LDA Model Application: An LDA model was then applied to the DTM. The model was set to find a fixed number of topics (in this case, 5), each represented by a distribution over words in which words that tend to co-occur in reviews receive high weight in the same topic.
Topic Interpretation: Each topic was interpreted based on its most frequent and characteristic words. This step involved qualitative analysis to understand the themes represented by each topic.
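
To make the document-term matrix concrete, here is a small pure-Python sketch of what the counting step produces. It is not scikit-learn's implementation: tokenization is a plain whitespace split, the stopword list is a tiny made-up one, and the `build_dtm` helper and sample sentences are hypothetical.

```python
from collections import Counter

STOPWORDS = {"the", "was", "and", "a"}  # tiny illustrative list, not scikit-learn's


def build_dtm(docs, min_df=2):
    """Return (vocabulary, matrix) where matrix[i][j] counts term j in document i."""
    tokenized = [
        [word for word in doc.lower().split() if word not in STOPWORDS]
        for doc in docs
    ]
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Drop terms that appear in fewer than `min_df` documents
    vocab = sorted(term for term, df in doc_freq.items() if df >= min_df)
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        matrix.append([counts[term] for term in vocab])
    return vocab, matrix


docs = [
    "the food was good and the staff friendly",
    "good food good price",
    "staff was slow",
]
vocab, matrix = build_dtm(docs)
print(vocab)
for row in matrix:
    print(row)
```

Each row is one review and each column one retained term. Rare terms such as "price" drop out because they appear in only one document.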

Discussion of Identified Topics

The LDA model was fit with five topics. Each topic is characterized by the set of words that carry the most weight in it, hinting at common themes discussed by customers. The readings below are tentative interpretations rather than definitive labels. For example:

Topic 1 might represent positive aspects of the pubs, focusing on the quality of food and service.
Topic 2 could relate to specific dishes or menu items, reflecting customers’ culinary preferences or disappointments.
Topic 3 might center around the overall customer experience, including the ambiance and social aspects of the pubs.
Topic 4 could discuss logistical aspects, such as location and accessibility.
Topic 5 might delve into customer service experiences, highlighting staff interactions and service efficiency.

These topics provide a multifaceted view of customer opinions, revealing not just what customers are saying, but what subjects are most important to them.



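from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Review texts to model (assumed here to be the non-null 'Review Text' values)
text_data = reviews_df['Review Text'].dropna()

# Creating a document-term matrix
vectorizer = CountVectorizer(max_df=0.95, min_df=2, stop_words='english')
dtm = vectorizer.fit_transform(text_data)

# LDA Model
lda = LatentDirichletAllocation(n_components=5, random_state=0)  # 5 topics
lda.fit(dtm)

# Function to display the highest-weighted words of each topic
def display_topics(model, feature_names, no_top_words):
    for topic_idx, topic in enumerate(model.components_):
        print(f"Topic {topic_idx}:")
        # argsort is ascending, so the reversed slice picks the largest weights first
        print(" ".join([feature_names[i] for i in topic.argsort()[:-no_top_words - 1:-1]]))

# Displaying the topics
display_topics(lda, vectorizer.get_feature_names_out(), 10)  # Top 10 words per topic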
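
Because LDA treats each review as a mixture of topics, a fitted model can also report the topic proportions of each review. The sketch below is an illustrative extension rather than a step from the original analysis: it fits a two-topic model on a handful of made-up reviews and picks the dominant topic per review. The sample reviews and the choice of two topics are assumptions made to keep the example small.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Made-up reviews, only to make the sketch runnable
sample_reviews = [
    "great food and friendly staff",
    "cheap beer and good food",
    "slow service and rude staff",
    "long wait for food and slow service",
    "nice building and good beer",
    "friendly staff and great beer",
]

vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(sample_reviews)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
# fit_transform returns one row per review: its estimated topic proportions
doc_topics = lda.fit_transform(dtm)

# The dominant topic is the column with the largest proportion in each row
dominant_topic = doc_topics.argmax(axis=1)

for review, topic, weights in zip(sample_reviews, dominant_topic, doc_topics):
    print(f"Topic {topic} ({weights.max():.2f}): {review}")
```

On a corpus this small the topics themselves are not meaningful. The point is only the shape of the output: one row of proportions per review, summing to 1.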

Insights and Conclusion

Summary of Key Findings

The analysis of the JD Wetherspoon pubs reviews through sentiment analysis and topic modeling has yielded several valuable insights:

Overall Positive Sentiment: The sentiment analysis revealed a generally positive sentiment across the reviews. The majority of the reviews leaned towards a positive tone, indicating overall satisfaction among the customers.
Diverse Range of Opinions and Experiences: The wide range of sentiment scores from negative to positive highlighted the diversity in customer experiences and opinions.
Identified Topics Reflect Customer Priorities: The topics uncovered through LDA topic modeling provided insight into the aspects of the pubs that customers talk about most. These ranged from food and service quality to the ambiance and operational aspects like location and staff interaction.
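
Findings like these can be backed with a few summary statistics over the sentiment column. The following is a hedged sketch on made-up scores, not the figures from this analysis. The `Pub` column name is hypothetical, and treating any polarity above zero as positive is an assumption.

```python
import pandas as pd

# Made-up scores; in the real pipeline these would come from reviews_df
scored = pd.DataFrame({
    "Pub": ["pub_a", "pub_a", "pub_b", "pub_b", "pub_b"],  # hypothetical column name
    "Sentiment": [0.5, 0.25, -0.5, 0.0, 0.25],
})

# Overall spread of scores: count, mean, min, max, quartiles
overall = scored["Sentiment"].describe()

# Fraction of reviews with polarity above zero
share_positive = (scored["Sentiment"] > 0).mean()

# Per-pub breakdown, useful for spotting locations that lag behind
per_pub = scored.groupby("Pub")["Sentiment"].agg(["count", "mean", "min", "max"])

print(overall)
print(f"Share of positive reviews: {share_positive:.0%}")
print(per_pub)
```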

Implications for JD Wetherspoon

The findings from this analysis offer several actionable insights for JD Wetherspoon:

Strengthening Strong Points: The positive sentiments and the topics related to good food and service suggest areas where the pubs are performing well, which can be further strengthened.
Addressing Areas of Concern: The diversity in sentiment scores and the specific issues raised in certain topics highlight areas that may require attention and improvement, such as customer service in some locations or aspects of the physical environment.
Tailoring Customer Experience: Understanding the most discussed topics can guide JD Wetherspoon in tailoring their offerings and services to match customer preferences and expectations.

Concluding Thoughts

This analysis demonstrates the power of sentiment analysis and topic modeling in extracting meaningful insights from customer reviews. For businesses like JD Wetherspoon, such techniques provide a window into the customers’ minds, helping them to understand and enhance the customer experience. As we move further into the era of data-driven decision-making, the ability to intelligently analyze customer feedback will continue to be an invaluable asset.

By Timothy Adegbola
