For a company selling climate-focused products, knowing what people think about climate change isn’t just interesting — it directly shapes how they should market, where they should invest, and which audiences are worth pursuing.
The challenge is scale. Public sentiment on climate change plays out across millions of social media posts every day. Reading it manually isn’t an option. And a simple positive/negative sentiment score misses the nuance that actually matters here: whether someone believes climate change is man-made, actively rejects it, is indifferent, or is simply sharing a news story.
These four attitudes require completely different messaging strategies. Treating them the same means wasted spend and missed connections.
I built a multi-class text classification model that reads a tweet and predicts which of four sentiment categories it belongs to:
| Class | Meaning |
|---|---|
| Pro | The author believes climate change is real and man-made |
| Anti | The author rejects the idea of man-made climate change |
| Neutral | No clear stance — neither supporting nor opposing |
| News | Factual reporting or information sharing without personal opinion |
The model was trained on real tweet data, deployed as a live web application, and built to run at scale — so marketing and research teams can feed in large volumes of social media content and get structured sentiment breakdowns without writing a single line of code.
Text Preprocessing Pipeline
Raw tweet data is messy: URLs, hashtags, numbers, inconsistent casing, slang. Before any modelling could happen, the text needed to be cleaned and normalised. I built a preprocessing pipeline that handled:

- removing URLs and other noise tokens
- normalising hashtags and casing
- stripping numbers and punctuation
- tokenising and normalising slang and word forms
These steps weren’t just housekeeping. Clean, well-normalised text directly improves model performance, especially for a task where a single word like “hoax” or “crisis” carries strong signal.
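The actual pipeline used NLTK, but the core cleaning steps can be sketched with nothing more than the standard library's `re` module (the regex patterns here are illustrative, not the production rules):

```python
import re

def clean_tweet(text: str) -> str:
    """Minimal cleaning sketch covering the steps listed above."""
    text = text.lower()                                 # normalise casing
    text = re.sub(r"https?://\S+|www\.\S+", "", text)   # drop URLs
    text = re.sub(r"#(\w+)", r"\1", text)               # keep hashtag word, drop '#'
    text = re.sub(r"@\w+", "", text)                    # drop user mentions
    text = re.sub(r"\d+", "", text)                     # strip numbers
    text = re.sub(r"[^a-z\s]", " ", text)               # remove remaining punctuation
    return re.sub(r"\s+", " ", text).strip()            # collapse whitespace

print(clean_tweet("Climate change is a #HOAX!! https://t.co/abc123 @user"))
# → "climate change is a hoax"
```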
Exploratory Analysis & Feature Engineering
Before training, I used data visualisation to understand the class distribution, identify common terms per class, and spot potential imbalances that could bias the model. Text was then converted to numerical features using vectorisation techniques (TF-IDF), with deliberate feature selection to keep the input space meaningful rather than bloated.
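In scikit-learn, the TF-IDF step looks roughly like this; the specific settings (`max_features`, n-gram range, stop-word list) are illustrative assumptions, not the project's exact configuration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

tweets = [
    "climate change is a hoax invented for attention",
    "we must act now the climate crisis is real",
    "new study links warming oceans to extreme weather",
]

# Illustrative settings; the real feature-selection choices may differ.
vectorizer = TfidfVectorizer(
    max_features=5000,      # cap vocabulary size to keep the input space lean
    ngram_range=(1, 2),     # unigrams and bigrams
    stop_words="english",   # drop common filler words
)
X = vectorizer.fit_transform(tweets)
print(X.shape)  # one row per tweet, one column per learned feature
```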
Model Training & Evaluation
I trained and compared three classification algorithms:

- Logistic Regression
- Linear SVC
- Multinomial Naive Bayes
Models were evaluated using Mean F1-Score — the right choice here because the classes are not evenly distributed, and accuracy alone would have been misleading.
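A comparison like this can be sketched with cross-validated macro-averaged F1, which weights each class equally regardless of how common it is. The corpus below is a tiny placeholder; the real training data was labelled tweets:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import cross_val_score

# Placeholder corpus, repeated so each class has enough samples per fold.
texts = [
    "climate change is a hoax", "global warming is fake news",
    "the climate crisis is real", "we must cut emissions now",
    "not sure what to think about this", "interesting debate either way",
    "report: sea levels rose again this year", "study finds record heat",
] * 5
labels = ["anti", "anti", "pro", "pro", "neutral", "neutral", "news", "news"] * 5

X = TfidfVectorizer().fit_transform(texts)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Linear SVC": LinearSVC(),
    "Multinomial NB": MultinomialNB(),
}
results = {}
for name, model in models.items():
    # f1_macro averages per-class F1, so minority classes count fully.
    scores = cross_val_score(model, X, labels, cv=5, scoring="f1_macro")
    results[name] = scores.mean()
    print(f"{name}: mean F1 = {results[name]:.3f}")
```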
Deployment
The winning model was packaged into a Streamlit application and deployed on an AWS EC2 instance, making it accessible to non-technical users. Anyone on the marketing or research team can paste in a tweet (or a batch of tweets) and get an instant classification.
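A minimal sketch of what that front end could look like follows. The stand-in model is trained inline on a toy corpus; the deployed app would instead load the real trained pipeline from disk (e.g. with joblib), and the Streamlit import is guarded so the prediction helper also runs where Streamlit is not installed:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Stand-in model; the real app would load the saved winning pipeline.
pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
pipeline.fit(
    ["climate change is a hoax", "the climate crisis is real and man made",
     "no strong opinion here", "report: emissions rose last year"],
    ["Anti", "Pro", "Neutral", "News"],
)

def classify(tweet: str) -> str:
    """Return one of the four sentiment classes for a single tweet."""
    return pipeline.predict([tweet])[0]

try:
    import streamlit as st  # only needed when serving the web UI

    st.title("Climate Sentiment Classifier")
    tweet = st.text_area("Paste a tweet")
    if st.button("Classify") and tweet:
        st.write(f"Predicted class: {classify(tweet)}")
except ImportError:
    pass  # Streamlit not installed; classify() still works standalone
```

Saved as `app.py`, a sketch like this would be served with `streamlit run app.py` on the EC2 instance.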
| Area | Tools |
|---|---|
| Language | Python |
| Data Processing | Pandas, NumPy |
| NLP | NLTK, Scikit-learn (TF-IDF) |
| Modelling | Logistic Regression, Linear SVC, Multinomial Naive Bayes |
| Evaluation | Mean F1-Score |
| Visualisation | Matplotlib, Seaborn |
| Deployment | Streamlit, AWS EC2 |
| Version Control | Git, GitHub |
The deployed classifier gives the client a scalable, repeatable way to monitor public sentiment on climate change — broken down by the four attitudes that actually inform marketing decisions. Instead of guessing which audience segment to target or how to frame a campaign, teams now have data to work from.
The four-class distinction matters in practice: a brand talking to “Anti” users the same way it talks to “Pro” users isn’t just ineffective — it can be actively damaging to brand perception.
If you’re working on an NLP, classification, or data science problem and want someone who can handle the full pipeline — from messy raw data to a deployed application — get in touch.