
Predicting House Prices


📌 Type

Kaggle Competition

Regression

⚜️ Domain

Real Estate

House Prices

💻 Technologies

Python (Kaggle Notebook)

pandas

numpy

sklearn

matplotlib

seaborn

🕹️ Skills

Machine Learning

Data Preprocessing

Feature Engineering

Data Visualization

Data Analysis

🏘️ Worked on the Kaggle competition "House Prices - Advanced Regression Techniques", predicting the sale price of 1459 test houses with a model trained on 1460 records described by 79 features, using Python 🐍.

🔎 Performed Exploratory Data Analysis (EDA), examining missing values, distributions, counts, correlations and more, relying heavily on pandas, matplotlib and seaborn.
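
As a rough illustration of these EDA checks, here is a minimal sketch using pandas only. It is not the project's actual notebook code: the small DataFrame is a toy stand-in whose column names follow the Kaggle dataset, and the variable names (`missing_share`, `garage_counts`, `target_corr`) are made up for this example.

```python
import pandas as pd

# Toy stand-in for the competition data (illustrative values only).
df = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550, 14260],
    "GrLivArea": [1710, 1262, 1786, 1717, 2198],
    "GarageType": ["Attchd", "Attchd", None, "Detchd", "Attchd"],
    "SalePrice": [208500, 181500, 223500, 140000, 250000],
})

# Share of missing values per column, largest first.
missing_share = df.isna().mean().sort_values(ascending=False)

# Counts for a categorical column, keeping missing entries visible.
garage_counts = df["GarageType"].value_counts(dropna=False)

# Correlation of each numerical feature with the target.
target_corr = (
    df.select_dtypes(include="number")
    .corr()["SalePrice"]
    .drop("SalePrice")
    .sort_values(ascending=False)
)

print(missing_share)
print(garage_counts)
print(target_corr)
```

Distributions would typically be inspected with plots on top of this, for instance with seaborn's `histplot`.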

📊 Created a "Feature Analyzer", a helper for EDA that shows relevant information and plots to give quick, useful insights about any single feature, categorical or numerical, built on matplotlib and seaborn.
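
To give an idea of what such a helper can look like, here is a simplified, text-only sketch. It is an assumption about the general shape, not the real Feature Analyzer: the function name `analyze_feature`, its parameters and the returned keys are all hypothetical, and the plotting part is left out to keep the example short.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def analyze_feature(df, feature, target="SalePrice"):
    """Return a small summary of one feature (hypothetical helper)."""
    col = df[feature]
    summary = {
        "feature": feature,
        "missing": int(col.isna().sum()),
        "unique": int(col.nunique()),
    }
    if is_numeric_dtype(col):
        # Numerical branch: location, shape and relation to the target.
        summary["kind"] = "numerical"
        summary["mean"] = float(col.mean())
        summary["median"] = float(col.median())
        summary["skew"] = float(col.skew())
        summary["corr_with_target"] = float(col.corr(df[target]))
    else:
        # Categorical branch: level counts and median target per level.
        summary["kind"] = "categorical"
        summary["counts"] = col.value_counts().to_dict()
        summary["target_median_by_level"] = (
            df.groupby(feature)[target].median().to_dict()
        )
    return summary


# Toy data; column names follow the Kaggle dataset, values are illustrative.
df = pd.DataFrame({
    "GrLivArea": [1710, 1262, 1786, 1717, 2198],
    "Neighborhood": ["CollgCr", "Veenker", "CollgCr", "Crawfor", "NoRidge"],
    "SalePrice": [208500, 181500, 223500, 140000, 250000],
})

print(analyze_feature(df, "GrLivArea"))
print(analyze_feature(df, "Neighborhood"))
```

A plotting version could add, for example, a seaborn `histplot` in the numerical branch and a `countplot` or `boxplot` in the categorical one.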

 

🧹 Used pandas, numpy and sklearn for cleaning and preprocessing: changing data types, ordinal encoding, dummy variables, extensive feature engineering 🛠️ and more.
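
The kinds of steps listed above can be sketched as follows. This is only an illustration under assumptions: the column names come from the Kaggle dataset, but the quality mapping, the `TotalSF` feature and the choice of columns are examples picked for this sketch, not necessarily the ones used in the project.

```python
import pandas as pd

# Toy data; column names follow the Kaggle dataset, values are illustrative.
df = pd.DataFrame({
    "MSSubClass": [60, 20, 60, 70],
    "ExterQual": ["Gd", "TA", "Gd", "Ex"],
    "GarageType": ["Attchd", "Detchd", None, "Attchd"],
    "TotalBsmtSF": [856, 1262, 920, 756],
    "1stFlrSF": [856, 1262, 920, 961],
    "2ndFlrSF": [854, 0, 866, 756],
})

# Changing data types: MSSubClass is a class code, not a quantity.
df["MSSubClass"] = df["MSSubClass"].astype(str)

# Ordinal encoding: quality grades have a natural order (assumed mapping).
quality_order = {"Po": 1, "Fa": 2, "TA": 3, "Gd": 4, "Ex": 5}
df["ExterQual"] = df["ExterQual"].map(quality_order)

# Feature engineering: one combined area feature (hypothetical example).
df["TotalSF"] = df["TotalBsmtSF"] + df["1stFlrSF"] + df["2ndFlrSF"]

# Dummies: one-hot encode nominal columns after labelling missing values.
df["GarageType"] = df["GarageType"].fillna("None")
df = pd.get_dummies(df, columns=["MSSubClass", "GarageType"])

print(df.head())
```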

🤖 Tested different models, including several from sklearn such as RandomForestRegressor and GradientBoostingRegressor, optimized them with GridSearchCV, and settled on CatBoostRegressor as the best model.
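
For reference, a grid search over a scikit-learn regressor looks roughly like this. The sketch runs on synthetic data from `make_regression`, and the parameter grid, the scoring choice and the fold count are assumptions made for the example, not the settings used in the project. CatBoostRegressor is left out here because it lives in the separate catboost package.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic regression data standing in for the preprocessed house features.
X, y = make_regression(n_samples=120, n_features=8, noise=10.0, random_state=0)

# Small illustrative grid: 2 x 2 = 4 candidate settings.
param_grid = {"n_estimators": [20, 40], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="neg_root_mean_squared_error",  # sklearn maximizes, hence the sign
    cv=3,
)
search.fit(X, y)

print(search.best_params_)
print(-search.best_score_)  # cross-validated RMSE of the best candidate
```

The same pattern applies to GradientBoostingRegressor by swapping the estimator and the grid.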

🧾 Evaluated performance with a custom scorer, RMSLE (root mean squared logarithmic error, where lower is better), and reached 0.12236, a score on par with the top 10% of competitors 🏆.
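
A custom RMSLE scorer can be written along these lines. This is a minimal sketch assuming the common `log1p` form of the metric; the names `rmsle` and `rmsle_scorer` are chosen for this example and the original implementation may differ in its details.

```python
import numpy as np
from sklearn.metrics import make_scorer


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error, using log(1 + x)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))


# greater_is_better=False tells sklearn that lower values are better,
# so tools like GridSearchCV will work with the negated score.
rmsle_scorer = make_scorer(rmsle, greater_is_better=False)

# Illustrative prices, not competition results.
y_true = [200000, 150000, 320000]
y_pred = [210000, 140000, 300000]
print(rmsle(y_true, y_pred))
```

Because the error is computed on logarithms, a given percentage miss costs about the same on a cheap house as on an expensive one.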

Screenshots

 
Feature Analyzer Numerical Output
Feature Analyzer Categorical Output
Part of Code of Feature Analyzer
Part of Code of Preprocessing

Story of the Project

 

As this was one of my first projects, I wanted to put everything I had learned about Python, machine learning, visualization and analysis to use in a single piece of work, so when I found this interesting dataset on Kaggle I got to work. 🔨

I like keeping things simple but really effective, and that is just what I did. Besides completing the task, I also wanted to complement it with something unique and useful, so I created my own function, a "Feature Analyzer" 📊, as described above, with a lot of visualization tools, hoping that it could even help others with their own analysis.

I really went beyond what I had originally learned, which means I have even more knowledge now. 🧠 I had a blast working on this project, testing different models, using my creativity for feature engineering and solving the errors that popped up along the way. ⭐
