Predicting House Prices
📌 Type
Kaggle Competition
Regression
⚜️ Domain
Real Estate
House Prices
💻 Technologies
Python (Kaggle Notebook)
pandas
numpy
sklearn
matplotlib
seaborn
🕹️ Skills
Machine Learning
Data Preprocessing
Feature Engineering
Data Visualization
Data Analysis
🏘️ Worked on the Kaggle competition "House Prices - Advanced Regression Techniques", where I predicted the sale price of 1459 houses using a training set of 1460 records with 79 features, all in Python 🐍.
🔎 Performed Exploratory Data Analysis (EDA), digging into missing values, distributions, counts, correlations and more, relying heavily on pandas, matplotlib and seaborn.
📊 Created a "Feature Analyzer", really helpful for EDA: a function that shows relevant information and plots for any given feature, categorical or numerical, to quickly get useful insights, built with matplotlib and seaborn.
🧹 Used pandas, numpy and sklearn for cleaning and preprocessing: changing data types, ordinal encoding, dummy variables, lots of feature engineering 🛠️ and more.
🤖 Tested different models, including several from sklearn such as RandomForestRegressor and GradientBoostingRegressor, tuned them with GridSearchCV, and settled on CatBoostRegressor as the best model.
🧾 Evaluated performance with a custom scorer, RMSLE (root mean squared logarithmic error, where lower is better), and got 0.12236, good enough for the top 10% of competitors 🏆.
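As a rough illustration of how the preprocessing, tuning and scoring steps above can fit together, here is a minimal sketch. It is not the notebook's actual code: the toy data, the column names (quality, zone, area), the parameter grid and the rmsle helper are all assumptions made up for the example, and GradientBoostingRegressor stands in for whichever model is being tuned.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error; lower is better."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip at zero so log1p stays defined if a model predicts a negative price.
    y_pred = np.clip(np.asarray(y_pred, dtype=float), 0, None)
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))


# greater_is_better=False makes sklearn negate the score, so that
# "higher is better" still holds inside GridSearchCV.
rmsle_scorer = make_scorer(rmsle, greater_is_better=False)

# Toy stand-in data (NOT the Kaggle dataset); column names are hypothetical.
rng = np.random.default_rng(0)
n = 120
toy = pd.DataFrame({
    "quality": rng.choice(["Fa", "TA", "Gd", "Ex"], size=n),  # ordered categories
    "zone": rng.choice(["A", "B", "C"], size=n),              # unordered categories
    "area": rng.uniform(50, 300, size=n),                     # numeric feature
})
quality_rank = toy["quality"].map({"Fa": 0, "TA": 1, "Gd": 2, "Ex": 3})
price = 1000 * toy["area"] + 20000 * quality_rank + rng.normal(0, 5000, size=n)

# Ordinal encoding for ordered categories, dummies for unordered ones;
# the numeric column passes through unchanged.
preprocess = ColumnTransformer(
    [
        ("ordinal", OrdinalEncoder(categories=[["Fa", "TA", "Gd", "Ex"]]), ["quality"]),
        ("dummies", OneHotEncoder(handle_unknown="ignore"), ["zone"]),
    ],
    remainder="passthrough",
)

pipeline = Pipeline([
    ("preprocess", preprocess),
    ("model", GradientBoostingRegressor(random_state=0)),
])

# Small illustrative grid; the values are assumptions, not tuned settings.
search = GridSearchCV(
    pipeline,
    param_grid={"model__n_estimators": [50, 100], "model__max_depth": [2, 3]},
    scoring=rmsle_scorer,
    cv=3,
)
search.fit(toy, price)

best_rmsle = -search.best_score_  # flip the sign back to a plain RMSLE
print(search.best_params_, round(best_rmsle, 4))
```

Keeping the encoders inside the Pipeline means each cross-validation fold fits them only on its own training split, which avoids leaking information from the validation rows into preprocessing.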
Screenshots
Story of the Project
For one of my first projects, I wanted to apply everything I had learned about Python, machine learning, visualization and analysis in a single place, so I picked this interesting dataset on Kaggle and got to work. 🔨
I like keeping things simple but really effective, and that is just what I did. Besides completing the task, I also wanted to complement it with something unique and useful, so I created my own function, a "Feature Analyzer" 📊, as described above, with a lot of visualization tools, hoping that it could even help others with their own analysis.
I really went beyond what I originally learned, which means I have even more knowledge now. 🧠 I had a blast working on this project, testing different models, getting creative with feature engineering and solving any errors that popped up. ⭐