Google Cloud Cortex Framework brings Packaged Analytics to the Modern Data Stack

Google Cloud Cortex Framework is a new packaged analytics initiative from Google Cloud that provides pre-built data extractors, data transformations and interactive dashboards for SAP, Salesforce and a number of marketing and advertising data sources.

Built around Google Cloud Platform technologies such as Google BigQuery, Looker, Google Cloud Composer and Vertex AI, the Google Cloud Cortex Data Foundation element of the framework is a set of reference architectures, building blocks and templates for organizations looking to modernize with Google Cloud Platform’s Data Cloud.


Even more interestingly, Google Cloud Cortex Framework is free to use, extend and fork, and has been made available as an open-source Cortex Data Foundation repository hosted on GitHub that contains data transformation logic, data models and templates designed to be deployed as-is or extended by partners and the wider community.

The wider set of Cortex Data Foundation assets includes:

  • predefined BigQuery models for private (for example, SAP and Salesforce), public (for example, Google Trends) and commercial (for example, Analytics Hub) data

  • data processing templates to transform, enrich and, to a limited extent, combine different data sources into cross-data-source datasets

  • example ML code with Vertex AI

  • sample dashboards and explores for use with Looker, such as the one below from the SAP dashboard pack

[Screenshot: example dashboard from the SAP dashboard pack in Looker]

Benefits to customers making use of these packaged analytics solutions include:

  • decreased time-to-value compared to building from scratch

  • immediate access to best-practice industry KPIs and dashboards

  • an end-to-end solution that’s pre-integrated and runs at scale

So what’s in this for Google? While Google Cloud Cortex Framework is free to use and adapt, the payoff from Google’s perspective is the SAP, Salesforce and other reporting and hosting workloads it will hopefully make easier to bring over to Google Cloud Platform.

Google Cloud, through the Looker Marketplace and Looker blocks, has provided a limited form of packaged analytics for Looker customers, such as the blocks for Jira, NetSuite and HubSpot shown in the screenshots below.

[Screenshots: Looker Marketplace blocks for Jira, NetSuite and HubSpot]

We’ve also published Looker blocks in the past for use cases such as Multi-touch, Multi-Cycle Marketing Attribution for Google Analytics 4, but they aren’t easily customisable and still leave you with silos of reporting rather than an integrated, single view of all your operations.

But ever since the news broke back in 2018 that Thomas Kurian was leaving Oracle to join Google and head up Google Cloud, something like this (and the Looker acquisition that preceded it) was always on the cards. From my blog post written at the time:

“5. Expect to see GCP moving increasingly into packaged SaaS application and analytics solutions for ERP, CRM and Financials to complement their commoditised IaaS and PaaS cloud business and leveraging their massive GSuite and increasingly ChromeOS install base … and a business model that could provide these applications and packaged analytics for free… That’s the real existential threat to Oracle; spending all their time trying to win an un-winnable cloud infrastructure war and then GCP coming along and making ERP, CRM and business applications and their analytics essentially free.”

Pre-built analytics solutions for Oracle and other vendors’ ERP, CRM and Financials applications, running on top of Oracle’s suite of database and analytics tools, were always super popular with Oracle’s customers as (in theory at least) the hard work required to build an integrated data warehouse and set of dashboards had already been done for you, bringing down the time to deploy an enterprise reporting solution from months to weeks.


The value delivered to customers by this buy vs. build approach to enterprise data warehousing was such that each individual application (Sales Analytics, Financial Analytics, Customer Analytics and so on) was typically licensed at 2–3x the price of the analytics tools it ran on.

People buy solutions, not technology, as the saying goes, and packaged analytics solutions built on pre-built and integrated data warehouses were (and still are) a key part of Oracle’s appeal to the enterprise market, the market that Google had hired Thomas Kurian to help them break into.

And if you’re a long-term reader of this blog, you’ll probably be aware of Rittman Analytics’ take on packaged analytics and data warehousing: our dbt package and open-sourced warehousing toolkit, RA Warehouse for dbt.

It uses a similar design approach to Oracle BI Applications’ packaged data warehouse but focuses instead on SaaS application sources such as HubSpot, Xero and Jira, uses modern data stack technology such as Google BigQuery, dbt and Looker, and creates a single integrated view of your business across key subject areas.


So how does Google Cloud Cortex Data Foundation compare to these packaged analytics approaches, and how can our RA Data Warehouse and other partner data sources and BI tools integrate with and extend Google’s framework?

Google Cloud Cortex Foundation, as you’d expect from an initiative designed to drive usage and consumption of Google Cloud Platform services, uses Google Cloud Platform tools and services to enable data integration, processing, analytics and reporting.

  • Google BigQuery is used as the main data warehouse to store raw replicated data, processed change data capture tables and reporting datasets generated from analytical models.

  • Google Cloud Storage buckets are used to store files generated during deployment like DAG scripts, SQL queries and temporary processing files.

  • Google Cloud Build is used to trigger deployment processes that execute steps to deploy datasets, views, models and other artifacts to BigQuery.

  • Google Cloud Composer, rather than dbt or, more surprisingly, Dataform, is the data integration and orchestration tool used to transform data and schedule change data capture processing through custom DAGs (see the sketch after this list).

  • Google Cloud Dataflow is used as the data extraction and data pipeline tool, replicating data from sources such as Google Ads into BigQuery.

  • Google Secret Manager stores credentials needed to connect deployment scripts at runtime to data sources such as Salesforce.

  • Identity and Access Management provisions and controls access to resources through roles, permissions and service accounts.
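
To make the Cloud Composer piece a little more concrete, here’s a minimal sketch of the kind of DAG it schedules: a BigQuery job that periodically rebuilds a reporting table. The project, dataset, table and DAG names are hypothetical placeholders for illustration, not the DAGs that Cortex Data Foundation actually generates.

```python
# Minimal sketch of a Cloud Composer (Airflow) DAG that periodically rebuilds
# a reporting table in BigQuery. All project, dataset, table and DAG names
# below are hypothetical placeholders, not actual Cortex-generated artifacts.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import (
    BigQueryInsertJobOperator,
)

REFRESH_SQL = """
CREATE OR REPLACE TABLE `my-project.REPORTING.sales_orders_daily` AS
SELECT
  order_date,
  COUNT(*)       AS order_count,
  SUM(net_value) AS total_net_value
FROM `my-project.CDC_PROCESSED.sales_orders`
GROUP BY order_date
"""

with DAG(
    dag_id="refresh_sales_orders_daily",  # hypothetical DAG name
    schedule_interval="@hourly",
    start_date=datetime(2023, 1, 1),
    catchup=False,
) as dag:
    refresh_reporting_table = BigQueryInsertJobOperator(
        task_id="refresh_sales_orders_daily_table",
        configuration={"query": {"query": REFRESH_SQL, "useLegacySql": False}},
        location="US",  # match the location of your BigQuery datasets
    )
```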

At the time of writing, the supported data sources for Cortex Data Foundation are SAP, Salesforce and a number of marketing sources. Each data source has its own method of data extraction; for example:

  • SAP ECC or SAP S/4HANA transactional data can be streamed into BigQuery using partner tools such as SnapLogic, Boomi and Informatica Cloud; if real-time replication of SAP data is not required, scheduled exports from SAP can instead be loaded into BigQuery tables on a periodic basis, with change data capture (CDC) processing then applied to surface only the latest version of each record in separate CDC datasets (see the sketch after this list).

  • For Salesforce data, the deployment process sets up a connected app in Salesforce for authentication, and raw data is then extracted from the Salesforce APIs in real time, with CDC processing updating the latest records.

  • Google Ads data is integrated using the Google Cloud Dataflow runner to ingest raw campaign performance data from Google Ads APIs.

  • Google Campaign Manager 360 (CM360) and other platforms such as TikTok also have replication modules that extract raw data from their APIs or file exports into BigQuery for further processing.
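
As a rough illustration of the CDC step mentioned for the scheduled SAP exports above, the pattern boils down to a BigQuery MERGE that keeps only the most recent version of each record. The sketch below uses the google-cloud-bigquery Python client with made-up project, dataset, table and column names; the actual Cortex CDC scripts are generated from templates and differ in the details.

```python
# Sketch of the change data capture (CDC) pattern described above: take the
# newest version of each record from a raw landing table and merge it into a
# CDC table. All project, dataset, table and column names are illustrative.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # hypothetical project id

CDC_MERGE_SQL = """
MERGE `my-project.CDC_PROCESSED.sales_orders` AS target
USING (
  SELECT * EXCEPT (rn)
  FROM (
    SELECT
      *,
      ROW_NUMBER() OVER (
        PARTITION BY order_id
        ORDER BY recordstamp DESC) AS rn
    FROM `my-project.RAW_LANDING.sales_orders`
  )
  WHERE rn = 1  -- keep only the newest version of each order
) AS source
ON target.order_id = source.order_id
WHEN MATCHED THEN
  UPDATE SET
    order_status = source.order_status,
    net_value    = source.net_value,
    recordstamp  = source.recordstamp
WHEN NOT MATCHED THEN
  INSERT (order_id, order_status, net_value, recordstamp)
  VALUES (source.order_id, source.order_status, source.net_value, source.recordstamp)
"""

client.query(CDC_MERGE_SQL).result()  # run the merge and wait for it to finish
```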

Some degree of data centralization and integration across these sources is provided by the framework in what are termed the “K9” (as in DAGs…) datasets, but not to the point where single, deduplicated customer records, for example, are created.
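
To illustrate the point, here’s the kind of cross-source query you can write once the reporting datasets are in place, again with hypothetical project, dataset, table and column names: the two sources can be combined in one query, but matching customers across them (here, a naive name-based join) is still left to you.

```python
# Hedged example of combining two source-specific reporting datasets in a
# single BigQuery query. Dataset, table and column names are made up for
# illustration; note the naive name-based join rather than a deduplicated,
# single customer record shared across sources.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # hypothetical project id

CROSS_SOURCE_SQL = """
SELECT
  sap.sold_to_party_name AS sap_customer,
  sfdc.account_name      AS salesforce_account,
  SUM(sap.net_value)     AS sap_order_value,
  SUM(sfdc.amount)       AS salesforce_pipeline_value
FROM `my-project.REPORTING_SAP.sales_orders`   AS sap
JOIN `my-project.REPORTING_SFDC.opportunities` AS sfdc
  ON LOWER(sap.sold_to_party_name) = LOWER(sfdc.account_name)
GROUP BY 1, 2
"""

for row in client.query(CROSS_SOURCE_SQL).result():
    print(dict(row))
```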


But Cortex Data Foundation is part of an open framework, and Fivetran, for example, can be used to replicate on-premises database sources such as SAP HANA into BigQuery using Fivetran’s High Volume Agent, as shown in the diagram below.

[Diagram: Fivetran High Volume Agent replicating SAP HANA into BigQuery]

Or you can use Fivetran to do what we’re currently working on as an extension to Google Cloud Cortex Framework: integrating our RA Warehouse data centralization framework and Fivetran’s connectors into Cortex Data Foundation to bring in the long tail of other data sources such as HubSpot, Xero, NetSuite and Oracle Fusion HCM and CRM … and create a single view of customers, companies, products and other key business entities.

INTERESTED? FIND OUT MORE

Rittman Analytics is a boutique analytics consultancy specializing in Google Cloud Platform and the modern data stack, and we can centralise your data sources, optimize your marketing activity and enable your end-users and data team with best practices and a modern analytics workflow.

If you’re looking for some help getting started with Google Cloud Cortex Framework, or to build out your analytics capabilities and data team using a modern, flexible and modular data stack, contact us now to organize a 100%-free, no-obligation call; we’d love to hear from you!

Mark Rittman

CEO of Rittman Analytics, host of the Drill to Detail Podcast, ex-product manager and twice company founder.

https://rittmananalytics.com