Why companies build data applications
and how data products support data application development
In my last post, I looked at what a data product is and how treating data as a product helps organizations self-serve.
In this post, I’m going to look at a separate but closely related trend. Traditionally, companies have derived most of the value from their data through reports and dashboards. That is changing rapidly: today, companies are increasingly building their own data applications to drive value from data. But what exactly is a data application? What is driving this shift to data applications? How should companies get started building data applications? In this article, I answer these questions, and also explore why data products are powerful tools to support organizations building data applications.
Image courtesy of Dall-E
What is a data application?
Ten years ago, organizations primarily used data to power dashboards and reports (i.e. business intelligence).
But with the rise of AI, digital platforms and improved connectivity, more and more companies are building data-intensive applications: applications that use data to power significant parts of their business.
Many of these applications are operational systems: they form part of the technology estate that companies use to deliver services to their customers. There are many examples of data applications. To list just a few:
Real-time classifiers
These are real-time AI applications that make a decision, for example:
Fraud detection: algorithms that decide whether an individual is trying to commit fraud, in which case they block the transaction from completing
Bot detection: algorithms that decide whether a visitor is a bot or a human, and potentially take an appropriate action. (This may include blocking the bot from further access to a website, or simply labeling and filtering the data associated with the bot so it does not contaminate KPIs like ad views.)
Recommendation and personalization
It is relatively common for algorithms to power parts of an in-website or in-app experience, for example:
“People who bought this also bought this”
“Other products you might be interested in” (these might be served on a side bar along the checkout to encourage customers to buy additional items, for example)
Search results can be personalized based on preferences that an individual reveals but doesn’t explicitly state in a search (e.g. AutoTrader example).
In each of these cases, the AI selects the “best” content or products from a long list and returns them to the user.
Decision-support systems
The above examples are data applications that make a decision (e.g. decide that this user is committing fraud or that these specific products are likely to be of interest to this visitor). Applications can also be built that use data + AI to help individuals (or groups of people) make a decision.
A nice example is an apparel retailer (and Snowplow customer) that has developed a data application for its buyers, that:
Ingests the list of the SKUs they might want to stock for the next season. Note that in many cases, these SKUs are just rough sketches or concepts of items that could exist but are yet to be developed
Provides a set of recommendations as to which products should be purchased for the forthcoming season, including:
How much of that product to buy
The rationale for its decision. (E.g. “This product sold particularly well last season and is in a product class that has seen steady sales growth over the last three years. Further, last year’s profit was lower than it could have been because not enough of the product was purchased to meet demand”)
Enables users to select a particular item and then shows the forecast demand data alongside historical availability, size curves and similar products
The application is then used to help the buying team decide what to purchase for the next season. The buying team makes the actual buying decisions, but these decisions are no longer based on gut feeling alone. Instead, they’re based on a thorough understanding of the past performance of similar items and the impact of seasonality.
A totally different example from the media industry has some surprising similarities.
A newspaper group (also a Snowplow customer) built an application for its editors to help them optimize the placement of stories on the homepage through the news cycle.
The application showed the homepage as it was, but annotated it with data that showed the performance of individual content items in each place, benchmarked against the previous performance of content in that same location. The benchmark matters because placement is such a significant driver of content performance.
The data made it easy for editors to recognize content that was over- or underperforming (for example, overperforming if the story was getting more popular, underperforming if it was losing relevance), which made it much easier for the editors to confidently rearrange the homepage frequently over the news cycle.
Like the buyers at the apparel retailer, the editors at the newspaper group had an application that was built to meet their needs and surfaced the data in the most helpful way to empower them to make a very specific but critical set of decisions on an ongoing basis.
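To make the benchmarking idea concrete, here is a minimal sketch of how a story's performance could be compared against the history of the slot it sits in. This is not the newspaper group's implementation: the metric (click-through rate), the slot names, the sample numbers and the one-standard-deviation threshold are all assumptions I have made for illustration.

```python
from statistics import mean, stdev

# Hypothetical historical click-through rates for content previously shown in each slot.
slot_history = {
    "top": [0.10, 0.12, 0.11, 0.09, 0.13],
    "sidebar": [0.02, 0.03, 0.025, 0.02, 0.035],
}


def annotate(current_ctr, history, threshold=1.0):
    """Label a story relative to the past performance of its slot.

    The z-score measures how many standard deviations the story sits above
    or below the slot's historical average. `history` needs at least two
    values, because a sample standard deviation is undefined otherwise.
    """
    avg = mean(history)
    sd = stdev(history)
    z = (current_ctr - avg) / sd if sd > 0 else 0.0
    if z > threshold:
        label = "overperforming"
    elif z < -threshold:
        label = "underperforming"
    else:
        label = "in line"
    return {"z_score": round(z, 2), "label": label}


# Hypothetical live readings: (story, slot, current click-through rate).
live = [
    ("story_a", "top", 0.11),
    ("story_b", "sidebar", 0.05),
    ("story_c", "top", 0.08),
]

for story, slot, ctr in live:
    print(story, slot, annotate(ctr, slot_history[slot]))
```

Note that story_b has a lower raw click-through rate than story_a, yet it is the one flagged as overperforming, because it is judged against what the sidebar slot normally delivers. That is the point of benchmarking per location rather than comparing stories directly.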
Dashboards and reports
Old-fashioned dashboards and reports can also be considered data applications. Many dashboards are specifically designed to help users make certain decisions (e.g. optimizing the deployment of money across different campaigns by highlighting campaigns that are over- or underperforming in terms of return on investment) or to monitor the health of one or more business processes or technical systems.
Unlike the other data applications discussed above, it does not typically take a product engineering unit to ship a dashboard or report: a BI analyst can often do this in a business intelligence tool. However, the distinction between dashboards and other data applications is blurring, as organizations look to deliver reports and dashboards into broader applications that are developed for customers or team members (look at how BI vendors all support embedded delivery of dashboards).
Generative AI-powered applications
The emergence of generative AI has dramatically increased the scope of what data applications can do.
Before generative AI, most data applications focused on making decisions or supporting humans in making them. Generative AI has the potential to grow the scope of decisions that a data application can support, or make itself. For example:
Rather than simply optimizing which ad creative from a set of assets to use in an advertising campaign, a generative AI-powered data application might generate new creatives that perform better than any of the human-designed examples, improving return on ad spend by growing the number of possibilities that the optimization is performed across.
For the apparel buyers mentioned above, generative AI could be used to suggest new products that are likely to sell well and that the team might otherwise have missed.
How do companies organize to develop data applications successfully?
Companies that want to derive value from data are increasingly looking beyond business intelligence and forming teams capable of building, shipping and supporting data applications.
These teams look like classic product engineering teams, but typically have additional skills in data engineering, data science, MLOps and LLMOps, alongside the more typical skills like front-end and back-end engineering.
This was a deliberate strategy employed by a fashion brand and Snowplow customer. Its Data & Analytics team spun up a number of product engineering units to develop and roll out key data applications across the business including:
An application that supports buyers with their quarterly planning (described above)
An application that helps the team figure out where and when to set up pop-up stores. This is a very complicated thing to measure: it is not trivial, for example, to understand how many of the sales generated by a pop-up store cannibalize the permanent outlets / channels
A content ranking engine to help the social team put out posts that are more likely to go viral
A media optimization app that combines the media mix model with multi-touch attribution to help the marketing team allocate budget across different channels.
The key thing for the organization when implementing this strategy was that its data science team wasn’t just composed of data scientists, but had access to all the required people (product managers, designers, front-end and back-end engineers) to design and build applications with UIs and algorithms that meet the needs of the different teams in the business.
These teams were given 6-12 months to launch an app and measure its impact (i.e. the return on investment). If the application was shown to add value, they then had the opportunity to iterate on it to drive higher returns.
Data products enable data applications
Organizations that invest heavily in data applications can realize significant efficiency gains from investing in a well-governed underlying data set and associated technology so that product engineering teams have great source materials (data + supporting tooling) for developing data applications.
A big challenge in developing a data application is to ensure that it’s built on a solid underlying data set that:
Provides all the information required. For example, if a data application uses machine learning, the data set should include plenty of highly predictive features
Is of high quality (i.e. the data is accurate and complete). Otherwise, dashboards will give a false picture of reality, decision-support systems will lead people to make the wrong decisions, and automated classifiers and recommendation engines will misclassify or recommend inappropriate items
Provides the data continuously with the required latency, schema, encoding, etc.
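The requirements above can be turned into automated checks that run before a data application consumes a record. Below is a minimal sketch, assuming a hypothetical transaction data set: the field names, the expected types and the five-minute latency budget are all illustrative assumptions rather than part of any real schema.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical contract for a transactions data set: required fields and their types.
REQUIRED_FIELDS = {
    "event_id": str,
    "user_id": str,
    "amount": float,
    "event_time": datetime,
}
# Hypothetical latency budget: records older than this are considered stale.
MAX_LATENCY = timedelta(minutes=5)


def check_record(record, now):
    """Return a list of problems found in one record (empty if it passes)."""
    problems = []
    # Completeness and schema: every required field is present with the right type.
    for field, expected_type in REQUIRED_FIELDS.items():
        value = record.get(field)
        if value is None:
            problems.append(f"missing {field}")
        elif not isinstance(value, expected_type):
            problems.append(
                f"{field} has type {type(value).__name__}, "
                f"expected {expected_type.__name__}"
            )
    # Latency: the record must arrive within the agreed budget.
    event_time = record.get("event_time")
    if isinstance(event_time, datetime) and now - event_time > MAX_LATENCY:
        problems.append("stale: exceeds latency budget")
    return problems


# A fixed "now" keeps the example deterministic.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)

good = {
    "event_id": "e1",
    "user_id": "u1",
    "amount": 12.5,
    "event_time": now - timedelta(minutes=1),
}
bad = {
    "event_id": "e2",
    "amount": "12.50",  # wrong type: a string instead of a float
    "event_time": now - timedelta(minutes=20),  # too old
}

print(check_record(good, now))
print(check_record(bad, now))
```

In practice these checks would more likely live in the pipeline that produces the data than in each consuming application, so that every team building on the data set benefits from them.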
A well-governed, well-documented set of data products, as per my prior post, provides an excellent foundation for building data applications. The difference between data products and data apps is not always well understood, so let’s walk through a typical workflow that highlights the difference:
To create a data application, product engineering units do not have to start from square one. They can build on one or more data products that meet their needs
They may find an existing data product that meets their needs in the data catalog (which has to be kept up-to-date and accurate)
If not, they can create a new data product - either from scratch (e.g. by sourcing new data) or from other existing data products. By publishing this data as its own data product, future teams that need a similar data set can build from the same basis, rather than having to start from scratch themselves
As part of the data product, it should be clear what SLAs the data adheres to, so that the application developers can have confidence those SLAs will be maintained going forward
The product engineering team can engage with a data product manager, who liaises with the teams that produce the data. The data product manager explains to the data producers who’s using the data for what, so they can take this into account both when delivering the data on an ongoing basis and managing the roadmap for it.
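The workflow above can be sketched in a few lines of code. This is a toy, in-memory illustration only: the `DataProduct` and `Catalog` classes, the product names and the latency SLA field are hypothetical and do not correspond to any particular catalog tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    name: str
    owner: str
    fields: frozenset
    max_latency_minutes: int  # the SLA the producers commit to
    built_from: tuple = ()    # names of upstream data products, if any


class Catalog:
    """A toy data catalog: publish data products and search them by need."""

    def __init__(self):
        self._products = {}

    def publish(self, product):
        self._products[product.name] = product

    def find(self, needed_fields, max_latency_minutes):
        """Return products that cover the needed fields within the latency SLA."""
        return [
            p for p in self._products.values()
            if needed_fields <= p.fields
            and p.max_latency_minutes <= max_latency_minutes
        ]


catalog = Catalog()
catalog.publish(DataProduct(
    name="web_events",
    owner="web-platform",
    fields=frozenset({"user_id", "page_url", "timestamp"}),
    max_latency_minutes=5,
))

# Step 1: look for an existing data product that meets the application's needs.
needs = {"user_id", "session_id", "basket_value"}
matches_before = catalog.find(needs, max_latency_minutes=15)

# Step 2: nothing fits, so derive a new product from an existing one and
# publish it so that future teams can build on it too.
if not matches_before:
    catalog.publish(DataProduct(
        name="checkout_sessions",
        owner="recommendations-team",
        fields=frozenset({"user_id", "session_id", "basket_value", "timestamp"}),
        max_latency_minutes=10,
        built_from=("web_events",),
    ))

# Step 3: the next team with a similar need finds it in the catalog.
matches_after = catalog.find(needs, max_latency_minutes=15)
print([p.name for p in matches_after])
```

The `owner` field stands in for the team that the data product manager would liaise with. The sketch leaves out everything that makes a real catalog hard, such as keeping entries accurate and enforcing the SLAs.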
So, how should organizations get started building data apps and data products?
Here are some guiding principles for getting started with building data apps and data products:
Step #1
Organizations should build data applications first, before investing in data products.
This might sound counterintuitive: after all, it is much easier to build a new data application if you already have one or more data products to build on.
However, it is always best to start with the business value you want to create (i.e. the data application you believe is going to move the needle for your business) and then work backwards to the data product(s) that are necessary to support this data application. If they don’t exist yet, you now have a business case to support building them.
Step #2
Make sure that the potential value to be generated is big. Building any kind of application, let alone a data application, is a significant endeavor. So make sure you pick a use case where there’s real value to be realized.
Step #3
Assemble the right team to seize this opportunity. You need the right combination of product management, data engineering and front- and back-end engineering professionals.
Step #4
Give the team the space to execute, like the brand I mentioned earlier that gave its teams 6-12 months to make a difference.
Step #5
Publish any data products developed as part of the data application. It is likely that they will be valuable to other teams. The team building your first data application should set a good example in terms of creating the underlying data products and socializing them across the business.