Last week I was sick, so this post got delayed, but here it is. I used a dataset from my city hall about traffic accidents between June and December 2015, where each record has a date attached. Here is a glance at what a data row looks like:
P.S.: The values are in Portuguese
My idea was to use what I learned last week on a real dataset and see what valuable things I could do with it. To keep it simple, I just made a Series based on the “date” column, which maps each date to the count of rows (accidents) recorded on that date.
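For anyone following along, here is a minimal sketch of how such a Series could be built with pandas. The sample rows are made up for illustration (the real file is not reproduced here), and `daily_counts` is just a name I picked for the result.

```python
import pandas as pd

# Hypothetical sample rows standing in for the real accident records;
# only the "date" column matters for this step.
df = pd.DataFrame({
    "date": [
        "2015-06-01", "2015-06-01",
        "2015-06-02",
        "2015-06-03", "2015-06-03", "2015-06-03",
    ]
})

# Parse the text dates into real datetime values.
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")

# Count how many rows (accidents) fall on each date.
daily_counts = df.groupby("date").size()

print(daily_counts)
```

The result is a Series indexed by date, which is a convenient shape for plotting and for any later time-based aggregation.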
Then I visualized the data to check whether I could find any trend. Let’s see some facts about it:
As you can see, there is a period (when there isn’t any big festival or something like that in the city) in which the number of accidents stays low compared to other months. What I found interesting, though, is how the numbers simply keep growing from October on. I think I’ll break it down by week to see if the patterns show up more clearly.
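Breaking the counts down by week could look roughly like the sketch below. It assumes a date-indexed Series of daily counts; the numbers here are invented just to show the mechanics, and `weekly_counts` is a hypothetical name.

```python
import pandas as pd

# Invented daily counts, indexed by date (not the real data).
dates = pd.to_datetime(
    ["2015-10-05", "2015-10-06", "2015-10-12", "2015-10-13", "2015-10-14"]
)
daily_counts = pd.Series([3, 2, 4, 1, 5], index=dates)

# "W" groups the days into weekly bins that end on Sunday,
# and sum() adds up the daily counts inside each bin.
weekly_counts = daily_counts.resample("W").sum()

print(weekly_counts)
```

Each entry in the result is labeled with the Sunday that closes its week, so a steady climb across those labels would point to the same growth seen in the monthly view.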
As you can see, simply aggregating data and visualizing it lets you find a lot of interesting quirks in your time series and helps you build your model (or decide whether it is worth building one). I still have to clean some data: the people who entered it made some typos, so there are things like “2051” in the year field and other stuff like that.
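One possible way to handle those typos is to drop every date that falls outside the period the dataset is supposed to cover. This is only a sketch: the sample values are made up, and filtering out the bad rows (instead of trying to correct them) is my assumption about what a first cleaning pass could do.

```python
import pandas as pd

# Made-up raw values, including a mistyped year and an unparseable entry.
raw = pd.Series(["2015-07-10", "2051-08-03", "2015-11-21", "not a date"])

# Entries that do not match the format become NaT instead of raising an error.
parsed = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")

# Keep only dates inside the expected June-December 2015 window.
start = pd.Timestamp("2015-06-01")
end = pd.Timestamp("2015-12-31")
clean = parsed[parsed.between(start, end)]

print(clean)
```

Here “2051-08-03” parses as a valid date but is rejected by the range check, while “not a date” is turned into NaT and dropped by the same filter.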
If you made it this far, I can only say thank you. I hope you keep reading and also do your own projects to grow.