LinkedIn announced the development of its Pinot analytics software in September 2014. The news generated a lot of excitement in the big data world, because the professional networking site has a strong reputation for handling huge amounts of information and using it to enrich its own applications.
Earlier this week, the company surprised those who were following the development of the software by publishing the source code on GitHub, making it free for anyone interested in it to download and examine.
Scalable Analytics for All
The Pinot software is highly scalable and flexible, and it was designed to feed results into LinkedIn’s web app as quickly as possible. It offers an SQL-like query interface and is built for high throughput and low latency, so that it can cope with the demand that LinkedIn’s traffic levels place on it.
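To give a feel for what an SQL-like analytics query looks like, here is a minimal sketch. It runs against an in-memory SQLite database rather than Pinot, and the `profile_views` table, its columns, and the `views_per_day` helper are hypothetical names chosen for illustration. Pinot's own query dialect and schema may differ.

```python
import sqlite3

# Stand-in store: an in-memory SQLite database, NOT Pinot.
conn = sqlite3.connect(":memory:")

# Hypothetical event table: one row per profile view.
conn.execute(
    "CREATE TABLE profile_views (viewed_id INTEGER, viewer_id INTEGER, day TEXT)"
)
rows = [
    (1, 10, "2015-06-08"),
    (1, 11, "2015-06-08"),
    (1, 10, "2015-06-09"),
    (2, 10, "2015-06-09"),
]
conn.executemany("INSERT INTO profile_views VALUES (?, ?, ?)", rows)


def views_per_day(conn, viewed_id):
    """Count how many times one profile was viewed on each day."""
    cur = conn.execute(
        "SELECT day, COUNT(*) FROM profile_views "
        "WHERE viewed_id = ? GROUP BY day ORDER BY day",
        (viewed_id,),
    )
    return cur.fetchall()


result = views_per_day(conn, 1)
print(result)
```

A filter plus a grouped count like this is the general shape of a "who viewed your profile" style query. The difficulty at LinkedIn's scale is answering it quickly over a very large and constantly growing event stream.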
A more basic version of the software has been behind the “Who Viewed Your Profile” function of LinkedIn for more than a year. The developers had previously been using MySQL and Oracle for the site, relying on a batch job that processed data from the open-source Hadoop file system and fed it into their database. That older system was too slow to be useful: profile views sometimes took several days to be processed and shown to the end user.
The new system is capable of handling up to a billion events a day. It is similar in design and functionality to Druid, another open-source tool. The team at LinkedIn was concerned that Druid would not be scalable enough (even though some production Druid installations handle more than a trillion events per month), and this was part of the reason it decided to develop its own software.
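The two throughput figures above use different time units, so a quick conversion helps put them side by side. The sketch below assumes a 30-day month and evenly spread traffic. Both are simplifications, and the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_MONTH = 30  # assumption: a 30-day month


def events_per_second(events_per_day):
    """Average rate, assuming events are spread evenly over the day."""
    return events_per_day / SECONDS_PER_DAY


# Figures quoted in the article.
pinot_per_day = 1_000_000_000            # "up to a billion events a day"
druid_per_month = 1_000_000_000_000      # "more than a trillion events per month"

# Convert the monthly figure to a daily one so the units match.
druid_per_day = druid_per_month / DAYS_PER_MONTH

pinot_rate = events_per_second(pinot_per_day)
druid_rate = events_per_second(druid_per_day)

print(f"Pinot figure: about {pinot_rate:,.0f} events per second")
print(f"Druid figure: about {druid_rate:,.0f} events per second")
print(f"Ratio: about {druid_rate / pinot_rate:.1f}x")
```

Under these assumptions, a trillion events per month works out to roughly 33 billion events a day, or about 33 times the billion-a-day figure quoted for Pinot. Raw event counts do not capture query latency or concurrency, so the comparison is only a rough one.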
LinkedIn uses analytics-based workloads for a huge number of tasks, including profile views, ads and job postings, and the company will be moving more and more of its features over to Pinot in the coming months. Its goal is to make all of the site’s features “intelligent” and to improve them based on the massive amounts of data that the site collects as users communicate and network with each other. Pinot is just one of the data tools that LinkedIn has released as open source, and it is well worth a look for those interested in big data.