Machine Learning | Platform: Analytics Vidhya

Predict Number of Upvotes

1. Context & Objective

A hackathon challenge to predict the number of upvotes a question will receive on a Q&A platform based on user metrics and question text.

2. Methodology

1. Explored user reputation, views, and answer counts as numerical features.
2. Applied NLP techniques (TF-IDF) to the question text.
3. Handled the heavily right-skewed target distribution with a log(1 + y) transformation.
4. Trained an ensemble model led by a CatBoost regressor.
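To make step 2 concrete, here is a minimal pure-Python sketch of TF-IDF weighting. The project lists Scikit-Learn, so `TfidfVectorizer` is a likely tool, but that is an assumption. The sketch mirrors that class's default behaviour (lowercasing, the `\b\w\w+\b` token pattern, raw term counts, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, and L2 row normalisation) so the arithmetic is visible. The sample questions are invented for illustration and are not from the competition data.

```python
import math
import re
from collections import Counter

# Same default token pattern scikit-learn uses: words of two or more characters
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")

def tokenize(text):
    return TOKEN_RE.findall(text.lower())

def tfidf(docs):
    """Return (vocab, idf, rows) where each row is an L2-normalised TF-IDF vector."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    # Document frequency: number of documents containing each term at least once
    df = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed IDF: rarer terms get larger weights, no term gets zero weight
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        vec = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # guard against empty docs
        rows.append([v / norm for v in vec])
    return vocab, idf, rows

# Hypothetical question texts, for illustration only
docs = [
    "How to merge two DataFrames in pandas",
    "pandas groupby returns the wrong result",
    "Why is my Python loop so slow?",
]
vocab, idf, rows = tfidf(docs)
print(len(vocab), "terms")
```

A term shared across questions (here `pandas`) ends up with a lower IDF than one unique to a single question (`merge`), which is what lets TF-IDF emphasise distinctive wording.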
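import numpy as np
import pandas as pd
from catboost import CatBoostRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

df = pd.read_csv('train.csv')
X = df[['Reputation', 'Answers', 'Views']]
y = np.log1p(df['Upvotes'])  # log(1 + y) compresses the right-skewed target

# Hold out 20% of rows so the error is measured on unseen questions
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

model = CatBoostRegressor(iterations=500, learning_rate=0.1, depth=6)
model.fit(X_tr, y_tr, eval_set=(X_val, y_val), verbose=100)

# Invert the transform with exp(p) - 1 before scoring on the raw upvote scale
pred = np.expm1(model.predict(X_val))
rmse = np.sqrt(mean_squared_error(np.expm1(y_val), pred))
print(f'Validation RMSE: {rmse:.2f}')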
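The log(1 + y) transform from step 3 is easy to get wrong at prediction time, so here is a small pure-Python sketch of the round trip. The upvote counts are illustrative values, not figures from the dataset, and `rmsle` is a helper introduced here for illustration.

```python
import math

def to_log_target(upvotes):
    # log(1 + y) keeps zero-upvote questions at 0.0 and compresses the long right tail
    return [math.log1p(u) for u in upvotes]

def from_log_target(preds):
    # exp(p) - 1 undoes log(1 + y), returning values to the upvote scale
    return [math.expm1(p) for p in preds]

def rmsle(y_true, y_pred):
    # RMSE computed on log(1 + y) values, i.e. root mean squared logarithmic error
    lt, lp = to_log_target(y_true), to_log_target(y_pred)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(lt, lp)) / len(lt))

# Illustrative counts only, not taken from the competition data
upvotes = [0, 3, 12, 250, 61000]
logged = to_log_target(upvotes)
print([round(v, 2) for v in logged])
print(from_log_target(logged))
```

Because a squared-error model fitted on logged targets is minimising error in log space, it is effectively optimising RMSLE on the raw counts; its outputs must go through `expm1` before being compared with raw upvotes.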

3. Final Learnings

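Feature engineering proved more valuable than extensive hyperparameter tuning. The ratio of Views to Answers was a strong predictive signal. CatBoost with default settings handled the unscaled numeric features well, as expected of tree-based models, whose splits do not depend on feature scale.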
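The Views-to-Answers ratio could be derived along these lines. This is a sketch, not the author's code: the column name `ViewsPerAnswer` is hypothetical, and adding 1 to `Answers` is an assumed smoothing choice so that unanswered questions do not cause division by zero.

```python
import pandas as pd

def add_views_per_answer(df):
    """Return a copy of df with a hypothetical 'ViewsPerAnswer' feature.

    Assumption: Answers + 1 is used as the denominator so that questions
    with zero answers produce a finite value instead of dividing by zero.
    """
    out = df.copy()
    out['ViewsPerAnswer'] = out['Views'] / (out['Answers'] + 1)
    return out

# Toy rows for illustration only
sample = pd.DataFrame({
    'Reputation': [120, 5400],
    'Answers': [0, 3],
    'Views': [800, 12000],
})
features = add_views_per_answer(sample)
print(features)
```

Returning a copy keeps the raw frame untouched, which makes it easier to compare models trained with and without the engineered feature.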

Dataset details

Language: Python
Size: 330k rows (training)
Libraries used: Pandas, Scikit-Learn, CatBoost, NLTK