A beginner’s approach to the Titanic dataset…

August 21, 2020 Himanshu Singh

The Titanic, as we all know, was a disaster like no other, and it happened in 1912. The true events inspired an Oscar-winning movie, and the dataset of passenger details has become a favorite among data science enthusiasts.

For data science enthusiasts, I have prepared a Jupyter notebook that walks step by step through an easy way to approach the problem.

The notebook doesn’t use any complex code; it takes a simple approach to building a working model that reaches almost 80% accuracy.

The only concept a beginner might struggle with is one-hot encoding of categorical columns such as sex or pclass.

It is important to understand that a model (algorithm) requires its input values to be converted into an array of numbers, so the algorithm cannot make sense of a string value such as male/female. Converting male to 1 and female to 0 would also do the job here. The approach we used is to drop the sex column, add two columns, male and female, and set each one to 1 (True) or 0 (False) for every passenger. Since every passenger in the dataset is identified as either male or female, the combination of these two columns is always either 1-0 or 0-1. To understand more about one-hot encoding, read here.
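To make the encoding step concrete, here is a minimal sketch in plain Python (no pandas required) of the approach described above: drop a categorical column and add one 0/1 column per category. The column names `Sex` and `Pclass` are assumed from Kaggle's `train.csv`, the passenger rows are made-up toy values, and `one_hot` is a hypothetical helper written for illustration, not code taken from the notebook.

```python
# Toy rows shaped like the Titanic data (column names assumed from Kaggle's
# train.csv); the values are made up for illustration only.
passengers = [
    {"PassengerId": 1, "Sex": "male", "Pclass": 3},
    {"PassengerId": 2, "Sex": "female", "Pclass": 1},
    {"PassengerId": 3, "Sex": "female", "Pclass": 3},
    {"PassengerId": 4, "Sex": "male", "Pclass": 2},
]


def one_hot(rows, column, prefix=""):
    """Hypothetical helper: replace `column` with one 0/1 column per distinct value."""
    # Sort the distinct values so the new columns always come out in the same order.
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every field except the original categorical column (i.e. drop it).
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            # 1 (True) if this passenger has this value, 0 (False) otherwise.
            new_row[f"{prefix}{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


encoded = one_hot(passengers, "Sex")                   # adds "female" and "male"
encoded = one_hot(encoded, "Pclass", prefix="Pclass_")  # adds Pclass_1, Pclass_2, Pclass_3
for row in encoded:
    print(row)
```

Each passenger ends up with exactly one 1 among `female`/`male` and exactly one 1 among the three `Pclass_` columns, which is the 1-0 / 0-1 pattern described above. In pandas, `pd.get_dummies(df, columns=["Sex", "Pclass"])` does the same job in one call (it names the new columns `Sex_female`, `Pclass_1` and so on, and recent versions return True/False instead of 0/1). Because sex has only two values, a single male column (male → 1, female → 0) carries the same information as the pair.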

Please feel free to reach out to me if you have any further questions.
