Introduction

Cars have always been a big part of my life. I grew up deeply immersed in car culture, whether that meant taking part in local car clubs or hanging out in a garage until dawn with friends fixing their cars. Given this, it is no surprise that many of my friends, when deciding what they wanted to be when they grew up, chose a career in the car industry. Some of them now work at trade-in companies, dealing with second-hand vehicles. Others treat a car as an investment: they buy one, sell it later, add some money, and buy a newer, better, and larger automobile, only to repeat the cycle a few years on. When I started this project I had these people in mind. I wanted to help them do their jobs and make their lives easier by applying the skills I have been learning to real-life scenarios.

Dataset

When I decided on the main idea of the project, I did not think through the details, telling myself “I will figure it out when I am at it”. That was a big mistake and cost me a lot of time. There was some reasoning behind it, though, best embodied in the expression “There is no such thing as a perfect dataset”: writing down every question first and then hunting for a dataset that answers them all is a pain in the neck. In my experience it is almost impossible to find such a dataset. Nevertheless, it is much more painful to spend three days cleaning the data only to find out it is completely useless and that no insights can be pulled from it. I came to the conclusion that it is far more productive to spend time planning the project and considering its possible uses, and then to work with a dataset that answers the most important questions while being ready to sacrifice some of the others, than to try to find insights in the data “on the fly”.

My main idea was to reveal insights and patterns that can help:

a) trade-in companies focus on the makes and models that are in high demand;

b) buyers understand how a vehicle’s value changes with its age, so they can make an informed decision on which car to buy;

c) sellers know their car’s value on the market.

The dataset I ended up using was generated by scraping eBay in Germany. I found it on kaggle.com, where it is called Used Cars. It was originally posted on data.world.

The dataset includes the following columns: