Pre-GSoC Experience @ mlpack

David Port Louis
Jun 26, 2021


Learn more about my project from my previous blog

Prologue

On the very first day I began drafting my GSoC proposal, I also started experimenting with ideas for a pre-GSoC contribution, because I wanted to familiarize myself with the code base, documentation, APIs and best practices for contributing to mlpack.

So I picked a small task: predicting salary from years of experience using the LinearRegression API provided by mlpack.

I started this task in early April. During this period I had a tough time setting up my local environment for interactive C++ using xeus-cling. I'll soon write a blog post on how to set up a local environment for experimenting with mlpack.

Objective

Predict the salary of an employee given how many years of experience they have. We will train a linear regression model to learn the relationship between each employee's years of experience and their salary.

I began with the Python notebook and opened a PR as soon as I completed it. I received many helpful suggestions from my mentors and reviewers. Most of my mistakes were violations of the style guidelines, which I got the hang of during this period.

Later I moved on to the C++ side. It started out as a standalone C++ program and later grew into an interactive C++ notebook, one capable of telling readers a story.

I started with exploratory data analysis. Here is a scatter plot from the C++ notebook, drawn using matplotlibcpp, a header-only C++ wrapper for Python's matplotlib plotting library.

Linear Regression

Regression analysis is one of the most widely used methods of prediction. Linear regression is used when the relationship in the dataset is approximately linear. As the name suggests, simple linear regression has one independent variable (predictor) and one dependent variable (response).

The simple linear regression equation is y = a + bx, where x is the explanatory variable, y is the dependent variable, b is the slope (coefficient) and a is the intercept.
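To make the equation concrete, here is a minimal sketch of fitting a and b by ordinary least squares in plain Python. This is not mlpack's API and not the code from my notebooks. The function name `fit_simple_linear_regression` and the toy numbers are made up for illustration only and do not come from the salary dataset.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Return (a, b) that minimize the squared error of y = a + b*x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    x_bar, y_bar = mean(x), mean(y)

    # Sum of squared deviations of x, and of the x/y cross-deviations.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    return a, b


# Hypothetical toy data (not the real dataset): points lying on y = 30 + 5x.
years = [1.0, 2.0, 3.0, 4.0, 5.0]
salary = [35.0, 40.0, 45.0, 50.0, 55.0]

a, b = fit_simple_linear_regression(years, salary)
print(f"intercept a = {a}, slope b = {b}")
```

Because the toy points lie exactly on a line, the fit recovers that line's intercept and slope. Real salary data is noisy, so the fitted line will only approximate the points.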

To perform linear regression, I used the LinearRegression API from mlpack.

Here's the plot of the best-fit line predicted by the trained model.

Finally, I used evaluation metrics such as MAE, MSE and RMSE to quantify how well the trained model performed on unseen data.
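For readers unfamiliar with these metrics, here is a small sketch of how they can be computed in plain Python. The helper names `mae`, `mse` and `rmse` are my own for this illustration, not functions from mlpack, and the sample values are invented rather than taken from the notebooks.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error: average of (actual - predicted)^2."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


# Hypothetical values, only to show the calls.
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]

print("MAE :", mae(actual, predicted))
print("MSE :", mse(actual, predicted))
print("RMSE:", rmse(actual, predicted))
```

MAE and RMSE are in the same units as the target (salary here), which makes them easier to interpret than MSE. RMSE penalizes large errors more heavily than MAE.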

Epilogue

I thought of writing a verbose explanation in this blog of the approach I followed in both notebooks. Later I decided to let the notebooks narrate the story and approach by themselves, instead of me sprinkling some code here and there. Make sure to take a look at the salary prediction notebooks using mlpack by visiting our repo or on Binder.

That's all for today. I will write another post this weekend about this week's progress. Stay tuned!

