Learning & Building in Data Science
Sharing my journey as I pave my own path through the world of data science
Hi there!
You have found one of the best places to level up your data science skills and learn how to make better data-driven decisions.
My name is Joleen. I’m a data scientist and writer, and I use this blog to share my passion for data science and statistics.
Blog
Every week, I post new articles on data science, statistics and business intelligence. Here are my latest articles.
The Impact of Data Analytics on Business Decision-Making
The sheer volume of data available is staggering, presenting both opportunities and challenges. Our ability to harness and interpret this data can have a direct impact on the direction and success of businesses. There is increasing pressure on businesses to not only adapt to this new world of AI and data but to thrive. Right […]
Evaluating Classification Models: Understanding the Confusion Matrix and ROC Curves
One of the most important aspects of machine learning classification models is evaluating how well they predict the target. For this, it’s essential to have a solid understanding of the confusion matrix and ROC curves. The confusion matrix breaks down a model’s predictions by showing true positives, true negatives, false positives, and false negatives. On […]
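To make the four confusion matrix cells concrete, here is a minimal sketch in plain Python. The function name `confusion_counts` and the toy labels are made up for illustration, and it assumes binary labels coded as 1 (positive) and 0 (negative). It also derives the true positive rate and false positive rate, the two quantities an ROC curve plots against each other as the decision threshold changes.

```python
def confusion_counts(y_true, y_pred):
    """Count the four confusion matrix cells for binary labels (1 = positive, 0 = negative)."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1  # true positive: predicted positive, actually positive
        elif actual == 0 and predicted == 0:
            tn += 1  # true negative: predicted negative, actually negative
        elif actual == 0 and predicted == 1:
            fp += 1  # false positive: predicted positive, actually negative
        else:
            fn += 1  # false negative: predicted negative, actually positive
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn}


# Hypothetical toy data, not taken from any real model or dataset.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)

# One point on an ROC curve: the rates at a single decision threshold.
tpr = counts["tp"] / (counts["tp"] + counts["fn"])  # true positive rate (recall)
fpr = counts["fp"] / (counts["fp"] + counts["tn"])  # false positive rate

print(counts, tpr, fpr)
```

In practice a library such as scikit-learn would usually handle this; the hand-rolled version is only meant to show what each cell counts.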
Data Preprocessing Techniques Every Data Scientist Should Know
In data science, there’s an often-underestimated hero working behind the scenes: data preprocessing. Imagine analyzing a dataset filled with gaps, inconsistencies, and outliers. The results would be like deciphering a blurred photograph—it’s frustrating and usually just a total waste of time. That’s where data preprocessing comes in. It’s the process of cleaning, transforming, and structuring […]
Classification Ensemble in Python with a Stroke Prediction Dataset
This project is based on Season 3, Episode 2 of the Kaggle Playground Series. The title of this episode is: “Tabular Classification with a Stroke Prediction Dataset”. Our task is to predict the probability that a patient will have a stroke. The target, stroke, is a binary variable and so classification methods are needed to predict the […]