Climate Change Sentiment Classifier

The Problem

For a company selling climate-focused products, knowing what people think about climate change isn’t just interesting — it directly shapes how they should market, where they should invest, and which audiences are worth pursuing.

The challenge is scale. Public sentiment on climate change plays out across millions of social media posts every day. Reading it manually isn’t an option. And a simple positive/negative sentiment score misses the nuance that actually matters here: whether someone believes climate change is man-made, actively rejects it, is indifferent, or is simply sharing a news story.

These four attitudes require completely different messaging strategies. Treating them the same means wasted spend and missed connections.

What I Built

A multi-class text classification model that reads a tweet and predicts which of four sentiment categories it belongs to:

Class | Meaning
Pro | The author believes climate change is real and man-made
Anti | The author rejects the idea of man-made climate change
Neutral | No clear stance — neither supporting nor opposing
News | Factual reporting or information sharing without personal opinion

The model was trained on real tweet data, deployed as a live web application, and built to run at scale — so marketing and research teams can feed in large volumes of social media content and get structured sentiment breakdowns without writing a single line of code.

My Contribution

Text Preprocessing Pipeline

Raw tweet data is messy — URLs, hashtags, numbers, inconsistent casing, slang. Before any modelling could happen, the text needed to be cleaned and normalised. I built a preprocessing pipeline that handled:

- Removing URLs and other noisy tokens
- Stripping hashtag symbols and numbers
- Lowercasing to fix inconsistent casing
- Normalising slang and punctuation

These steps weren’t just housekeeping. Clean, well-normalised text directly improves model performance, especially for a task where a single word like “hoax” or “crisis” carries strong signal.
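As a minimal sketch of what such a cleaner might look like — the exact rules and the stopword subset here are illustrative, not the production pipeline:

```python
import re
import string

# Illustrative stopword subset; the real pipeline would use a fuller list (e.g. NLTK's)
STOPWORDS = {"the", "a", "an", "is", "it", "and", "or", "to", "of"}

def clean_tweet(text: str) -> str:
    """Normalise a raw tweet: drop URLs, mentions, hashtag symbols, digits, punctuation."""
    text = text.lower()                                # normalise casing
    text = re.sub(r"https?://\S+|www\.\S+", "", text)  # remove URLs
    text = re.sub(r"@\w+", "", text)                   # remove @mentions
    text = re.sub(r"#", "", text)                      # keep the hashtag word, drop the symbol
    text = re.sub(r"\d+", "", text)                    # remove numbers
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return " ".join(tokens)

print(clean_tweet("Climate change is a HOAX! https://example.com #hoax @someone 2023"))
# → climate change hoax hoax
```

Note how "hoax" survives both as plain text and as a hashtag word — exactly the kind of strong-signal token the model needs to see.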

Exploratory Analysis & Feature Engineering

Before training, I used data visualisation to understand the class distribution, identify common terms per class, and spot potential imbalances that could bias the model. Text was then converted to numerical features using vectorisation techniques (TF-IDF), with deliberate feature selection to keep the input space meaningful rather than bloated.

Model Training & Evaluation

I trained and compared three classification algorithms: Logistic Regression, Linear SVC, and Multinomial Naive Bayes.

Models were evaluated using Mean F1-Score — the right choice here because the classes are not evenly distributed, and accuracy alone would have been misleading.
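The comparison loop might look like the sketch below. The toy texts are a stand-in for the labelled tweet set, and macro-averaged F1 is assumed here as the concrete form of "Mean F1" (it weights each class equally, which is what makes it robust to the imbalance):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-in data (one example per class, repeated); real training used the tweet dataset
texts = [
    "climate change is real and man made",
    "global warming is a hoax",
    "not sure what to think about the climate",
    "new study reports rising sea levels",
] * 5
labels = ["pro", "anti", "neutral", "news"] * 5

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "LinearSVC": LinearSVC(),
    "MultinomialNB": MultinomialNB(),
}

scores = {}
for name, clf in models.items():
    pipe = make_pipeline(TfidfVectorizer(), clf)
    # Macro F1 averages per-class F1 scores, so minority classes count equally
    scores[name] = cross_val_score(pipe, texts, labels, cv=5, scoring="f1_macro").mean()

best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

Cross-validation rather than a single train/test split gives a more stable estimate when some classes are small.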

Deployment

The winning model was packaged into a Streamlit application and deployed on an AWS EC2 instance, making it accessible to non-technical users. Anyone on the marketing or research team can paste in a tweet (or a batch of tweets) and get an instant classification.
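The batch path behind such an app could be a small helper like this sketch — `classify_batch` and the `LABELS` mapping are hypothetical names, and the inline model is a tiny stand-in for the trained pipeline the deployed app would load from disk:

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical mapping from model labels to the display names used in the app
LABELS = {"pro": "Pro", "anti": "Anti", "neutral": "Neutral", "news": "News"}

def classify_batch(model, tweets):
    """Score a batch of tweets; return per-tweet labels plus a sentiment breakdown."""
    preds = model.predict(tweets)
    df = pd.DataFrame({"tweet": tweets, "sentiment": [LABELS.get(p, p) for p in preds]})
    summary = df["sentiment"].value_counts(normalize=True).round(3)
    return df, summary

# Tiny illustrative model; the real app loads the winning trained pipeline instead
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(
    ["climate change is man made", "climate change is a hoax",
     "no strong opinion here", "report says temperatures rose"],
    ["pro", "anti", "neutral", "news"],
)

df, summary = classify_batch(model, ["scientists say climate change is man made"])
print(df)
```

In the Streamlit UI, the `df` table and the `summary` proportions map directly onto a results table and a bar chart — the "structured sentiment breakdown" the teams consume.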

Tech Stack

Area | Tools
Language | Python
Data Processing | Pandas, NumPy
NLP | NLTK, Scikit-learn (TF-IDF)
Modelling | Logistic Regression, Linear SVC, Multinomial Naive Bayes
Evaluation | Mean F1-Score
Visualisation | Matplotlib, Seaborn
Deployment | Streamlit, AWS EC2
Version Control | GitHub

Results

The deployed classifier gives the client a scalable, repeatable way to monitor public sentiment on climate change — broken down by the four attitudes that actually inform marketing decisions. Instead of guessing which audience segment to target or how to frame a campaign, teams now have data to work from.

The four-class distinction matters in practice: a brand talking to “Anti” users the same way it talks to “Pro” users isn’t just ineffective — it can be actively damaging to brand perception.

What I’d Do Differently

Let’s Talk

If you’re working on an NLP, classification, or data science problem and want someone who can handle the full pipeline — from messy raw data to a deployed application — get in touch.