Learning & Building in Data Science
Sharing my journey as I pave my own path through the world of data science
Hi there!
You have found one of the best places to level up your data science skills and learn how to make better data-driven decisions.
My name is Joleen. I’m a data scientist and writer, and I use this blog to share my passion for data science and statistics.
Blog
Every week, I post new articles on data science, statistics, and business intelligence. Here are my latest articles.

Classification Ensemble in Python with a Stroke Prediction Dataset
This project is based on Season 3, Episode 2 of the Kaggle Playground Series. The title of this episode is: “Tabular Classification with a Stroke Prediction Dataset”. Our task is to predict the probability that a patient will have a stroke. The target, stroke, is a binary variable and so classification methods are needed to predict the […]

Regression in Julia with the California Housing Dataset
This project is based on Season 3, Episode 1 of the Kaggle Playground Series. The title of this episode is: “Tabular Regression with the California Housing Dataset”. Our task is to predict the median housing value for a block group. In this project, my goal is to use the Julia programming language for […]

Overview of Machine Learning Ensemble Methods
Voting Ensembles
Voting ensembles combine diverse machine learning models using techniques like majority voting or averaging predictions. The individual models used in the ensemble can be regression or classification algorithms. Once the individual models have been trained, the ensemble can be constructed in a couple of different ways. In regression, ensembles are created by averaging […]
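To make the two combining techniques concrete, here is a minimal sketch in plain Python. It assumes the individual models have already been trained and that we only hold their predictions; the function names `majority_vote` and `average_predictions` and all of the numbers are made up for illustration and are not taken from the article.

```python
from collections import Counter
from statistics import mean


def majority_vote(predictions):
    """Combine class labels: for each sample, keep the label most models chose.

    `predictions` is a list with one inner list of labels per model.
    """
    return [Counter(sample).most_common(1)[0][0] for sample in zip(*predictions)]


def average_predictions(predictions):
    """Combine numeric predictions: for each sample, average across models."""
    return [mean(sample) for sample in zip(*predictions)]


# Hypothetical outputs from three classifiers on four samples.
model_labels = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
]

# Hypothetical outputs from three regressors on two samples.
model_values = [
    [2.0, 3.0],
    [4.0, 5.0],
    [6.0, 10.0],
]

hard_votes = majority_vote(model_labels)      # [1, 0, 1, 1]
averaged = average_predictions(model_values)  # [4.0, 6.0]
```

With an odd number of models and a binary target, a majority vote can never tie. With an even number, this sketch falls back to whichever tied label appears first, which is an arbitrary choice.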

Decision Trees in Python: Predicting Diabetes
In this post, we’ll be learning about decision trees, how they work and what the benefits of using them are. We’ll also use this algorithm on a real-world dataset to predict diabetes. So, what are decision trees? Decision trees are a machine learning method for classification or regression. They work by segmenting the dataset through […]
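As a rough illustration of what segmenting a dataset can look like, here is a small sketch that searches for the single best split on one feature. It assumes Gini impurity as the splitting criterion, which is a common choice but not necessarily the one used in the article. The `glucose` and `diabetic` values are invented for the example, and `gini` and `best_split` are hypothetical helper names.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n  # share of positive labels
    return 2 * p * (1 - p)


def best_split(values, labels):
    """Find the threshold on one feature that gives the purest two groups.

    Returns (threshold, weighted_impurity); samples with value <= threshold
    go to the left group, the rest go to the right group.
    """
    best_threshold, best_score = None, float("inf")
    n = len(values)
    # The largest value is skipped because it would leave the right group empty.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = t, score
    return best_threshold, best_score


# Made-up toy data: one feature and a binary target.
glucose = [85, 90, 110, 140, 150, 165]
diabetic = [0, 0, 0, 1, 1, 1]

threshold, impurity = best_split(glucose, diabetic)
```

On this toy data, splitting at a glucose value of 110 separates the two classes completely. A full decision tree would repeat this kind of search within each resulting group and across every feature.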