Qubit’s Journey to Petabyte-Scale Machine-Learning and Analytics on Google Cloud Platform … and…

My guest on this week’s Drill to Detail Podcast is Alex Olivier from Qubit, a startup based in London founded by four ex-Googlers that uses big data technology and machine learning to deliver personalized experiences and product recommendations to customers of some of the biggest names in e-commerce, travel and online gaming.

Qubit’s innovation in the space was to move beyond simple A:B testing and cookie-based personalization to create an event-level, petabyte-scale customer activity data lake running in Google Cloud that enables retailers and other organizations deliver personalize offers and site features based on a much more granular understanding of customer behavior and preferences, the same vision in fact that I kept putting forward in my webinars and presentations last year around data reservoirs and customer 360-degree analysis.

And of course, this centralized, event-level store of customer activity and purchase preferences creates a fantastic platform on which to build predictive models, real-time next-best-offer decision engines…

… and enable real-time big data analytics — with Qubit’s product in this area, Live Tap, being what I’ve been working on since last year’s Openworld advising their Product Management team and working under Paul Rodwick, who some of you might know from his time as head of Oracle’s BI Product Development team. I’ll write about Live Tap and the work I’m doing there, and my experiences creating an analytics product on top of BigQuery at a later date, as well as our use of Looker to create a semantic model over Qubit’s event-level data lake.

The podcast episode with Alex Olivier talks about Qubit’s journey from initially using Amazon AWS to land and process data using S3 buckets and MapReduce, then moving it all onsite to a cluster of thousands of HBase region servers storing data ingested and processed using Storm with latency down to four hours, to their current setup using Google BigQuery, Google PubSub and Google Cloud Dataflow processing 100,000 events per second and making it all available to customers with latency around 5 seconds — seriously impressive and a great case study around the use of cloud-hosted, elastically-provisioned big data analytics platforms.

There’s also a video of Alex presenting on Qubit’s architecture at last year’s Google NEXT event in London on Youtube, and this episode along with all the others is also available for download on the iTunes Podcast Directory where you can subscribe for free and automatically download new episodes as they become available Tuesday of every week.