Big Data is not just a buzzword anymore. Nowadays, companies are looking to take advantage of the large datasets at their disposal. Industries from telecommunications to food and retail are aggregating data and making sense of it using powerful machine learning tools and algorithms. It's now safe to say that Big Data and Data Science jobs are quickly rising in popularity and demand.
One thing that makes a Big Data professional stand out on a resume is experience with Hadoop, a distributed computing framework that runs on a load-balanced cluster of servers working together. A common way to demonstrate its value is to compare it with a traditional relational database, which is limited by the storage and computing power of a single machine. Let's say you store 1 GB of data in MySQL installed on a desktop PC running in production. As the business grows, you need to upgrade the hard disk to 10 GB, then 100 GB, then 1 TB, and so on; each step means buying bigger storage and migrating the data. Hadoop takes a different approach: instead of relying on a single machine, it connects commodity servers in parallel and pools their CPU, memory, and disk space to do the computation. This gives better performance, and scaling the business becomes a matter of adding cheaper commodity hardware rather than repeatedly replacing one expensive machine.
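To make the distributed-computation idea concrete, here is a minimal sketch of the classic MapReduce word count job in Java, Hadoop's native language. This is essentially the standard textbook example, assuming Hadoop's org.apache.hadoop client libraries are on the classpath; the class name and input/output paths are illustrative. The map step runs in parallel on the nodes that hold the data blocks, and the reduce step aggregates the partial counts:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: each node processes the data blocks stored locally,
  // emitting (word, 1) for every token it sees.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reduce phase: Hadoop groups all counts for the same word
  // and this reducer sums them into a final total.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // pre-aggregate on each node
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. /input
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // e.g. /output
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Packaged into a jar, a job like this would typically be launched with something like `hadoop jar wordcount.jar WordCount /input /output`. The point is that nothing in the code refers to a specific machine: Hadoop splits the input, schedules tasks across however many nodes the cluster has, and recovers from node failures, which is exactly why scaling out with commodity hardware works.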
In line with this, I will be sharing all my notes from my research and study of Big Data, which started in 2016. I will keep you posted. Hopefully, the posts will be useful and interesting to those who need a quick overview of the value this technology promises in the real world.