In today’s data-driven world, organizations across industries are recognizing the need for robust data management practices to make informed decisions and gain a competitive edge. Data observability has emerged as a critical capability in ensuring the reliability, accuracy, and trustworthiness of data, empowering businesses to make confident choices based on high-quality information.
Data observability can be defined as the practice of monitoring and ensuring the quality, availability, and reliability of data across the entire data lifecycle. It involves the implementation of tools and processes that allow organizations to gain visibility into their data pipelines, identify and rectify issues promptly, and maintain data integrity. By embracing data observability, businesses can proactively detect anomalies, track data lineage, and mitigate the risks associated with poor data quality, inconsistencies, and inaccuracies. They can also use it to escalate and remediate data outages rapidly within agreed-upon SLAs. Leading tools continuously collect, consolidate, and analyze signals across multiple layers of the stack to inform design improvements, boost performance, and strengthen governance in line with business goals.
According to Gartner, the four main areas of data observability are observing data assets, data pipelines, data infrastructure, and data users.
What are some of the Key Capabilities?
Data Quality Monitoring: Data observability tools enable organizations to monitor the quality of incoming and outgoing data streams in real time. By setting up custom data quality rules, businesses can identify data anomalies, missing values, and inconsistencies, ensuring data accuracy and reliability. This capability helps organizations avoid costly errors and make data-driven decisions with confidence. What's the difference between traditional data quality (DQ) and data observability? DQ aims to ensure more accurate, more reliable data, while data observability ensures the quality and reliability of the entire data delivery system.
According to Gartner, “To foster personalized consumption of insights and data stories, data and analytics leaders must pivot to a composable platform featuring autonomous analytics, metadata, governance and data quality.”
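To make the notion of custom data quality rules more concrete, here is a minimal, generic sketch in Python (not tied to any particular vendor's API; the table and thresholds are hypothetical) of the kinds of checks an observability tool runs continuously: null-rate thresholds, allowed value ranges, and data freshness.

```python
import pandas as pd
from datetime import datetime, timezone, timedelta

# Hypothetical batch of incoming records; in practice this would be a
# table or stream that the observability platform monitors on a schedule.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "amount": [25.0, None, 310.0, -5.0],
    "loaded_at": [datetime.now(timezone.utc)] * 4,
})

def check_null_rate(df, column, max_null_rate=0.05):
    rate = df[column].isna().mean()
    return rate <= max_null_rate, f"{column} null rate {rate:.1%}"

def check_value_range(df, column, low, high):
    bad = ((df[column] < low) | (df[column] > high)).sum()
    return bad == 0, f"{column} has {bad} out-of-range value(s)"

def check_freshness(df, column, max_age=timedelta(hours=1)):
    age = datetime.now(timezone.utc) - df[column].max()
    return age <= max_age, f"latest {column} is {age} old"

for passed, detail in [
    check_null_rate(orders, "amount"),
    check_value_range(orders, "amount", low=0, high=10_000),
    check_freshness(orders, "loaded_at"),
]:
    print("PASS" if passed else "FAIL", "-", detail)
```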
Data Lineage and Impact Analysis: Data observability platforms provide comprehensive visibility into the end-to-end data flow, including data sources, transformations, and destinations. This capability allows businesses to understand the lineage of their data, track changes, and perform impact analysis to identify potential bottlenecks or issues. Data lineage empowers organizations to enhance data governance, comply with regulatory requirements, and build trust with stakeholders.
Forrester states, “Data lineage is crucial for understanding data’s origins and destinations, assessing data’s fitness for use, and governing it effectively.”
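As a simplified illustration of how impact analysis works on top of lineage metadata, downstream dependencies can be found by walking a lineage graph. The table names and edges below are purely hypothetical, used only to show the traversal idea.

```python
from collections import deque

# Hypothetical lineage graph: each asset maps to the assets derived from it.
lineage = {
    "raw.orders": ["staging.orders_clean"],
    "staging.orders_clean": ["analytics.daily_revenue", "analytics.order_facts"],
    "analytics.order_facts": ["dashboards.exec_kpis"],
    "analytics.daily_revenue": ["dashboards.exec_kpis"],
}

def downstream_impact(source, graph):
    """Return every asset that could be affected by a change to `source`."""
    impacted, queue = set(), deque([source])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in impacted:
                impacted.add(child)
                queue.append(child)
    return impacted

# A schema change in raw.orders would ripple through to these assets:
print(downstream_impact("raw.orders", lineage))
```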
Anomaly Detection and Alerting: By leveraging advanced analytics and machine learning algorithms, data observability tools can automatically detect anomalies and deviations from expected data patterns. Organizations can set up real-time alerts and notifications to proactively address data quality issues, ensuring timely action and reducing the risk of making decisions based on inaccurate or incomplete information.
IDC notes, “Data observability solutions using artificial intelligence and machine learning techniques are gaining traction, enabling organizations to automatically detect and address data quality issues.”
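In its simplest form, this kind of detection can be sketched as a statistical check on a tracked metric such as daily row counts. Real platforms use far more sophisticated machine learning models, so treat the following as an illustrative toy example with made-up numbers.

```python
from statistics import mean, stdev

# Hypothetical history of a monitored metric, e.g. rows loaded per day.
daily_row_counts = [10_120, 9_980, 10_240, 10_050, 9_870, 10_160, 10_090]
todays_count = 4_300  # sudden drop, e.g. an upstream job partially failed

def is_anomalous(history, value, z_threshold=3.0):
    """Flag values that deviate from the historical mean by > z_threshold sigma."""
    mu, sigma = mean(history), stdev(history)
    z = abs(value - mu) / sigma if sigma else 0.0
    return z > z_threshold, z

anomalous, z = is_anomalous(daily_row_counts, todays_count)
if anomalous:
    # In a real deployment this would page on-call or post to Slack/PagerDuty.
    print(f"ALERT: row count {todays_count} deviates {z:.1f} sigma from normal")
```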
With any new “hot” category, many companies will stake their claim to be a provider of data observability solutions. Gartner Sr Principal Analyst Ankush Jain, in his research note What is Data Observability? (June 2022), calls out a list (in alphabetical order) that includes Acceldata, Bigeye, Collibra, Databand, Datafold, DataKitchen, Kensu, Monte Carlo, Sifflet, Soda and Validio. However, because not all observability platforms are focused on data observability, let’s take a closer look at what I believe to be the main vendors in this emerging market.
Who are the Pure Play Vendors leading the Market?
- Acceldata, which coined the term data observability in 2018, offers a comprehensive data observability platform with strong capabilities in data quality monitoring, data lineage, and anomaly detection. Their solution focuses on providing end-to-end visibility into complex data pipelines, enabling organizations to ensure reliable data, which accelerates data-driven decision-making. While already packed with features, Acceldata recently added even more capabilities to their platform, including Spend Intelligence, with an example of how they can save Snowflake customers money. This is a differentiated perspective, particularly in these economic times when CIOs, and especially CFOs, are looking to cut costs while still improving the data they use to run their business. Acceldata’s leadership team hails from Hortonworks (which merged with Cloudera), and therefore also uniquely offers solutions for the Hadoop (HDP) market.
- Bigeye focuses on simplifying data observability with an intuitive and user-friendly interface. Their platform offers automated data quality monitoring, data profiling, and anomaly detection, allowing organizations to identify and resolve data issues quickly. Like the other players, Bigeye also provides data lineage visualization to enhance data understanding and facilitate troubleshooting. Bigeye’s founders worked on the data pipelines for Uber’s in-house A/B testing tool, reporting standardized metrics for the company’s thousands of experiments, where data quality was becoming a considerable challenge.
- Monte Carlo is known for its data observability solution that focuses on data quality monitoring and anomaly detection, with an emphasis on data reliability. Their platform leverages machine learning algorithms to proactively identify data anomalies, enabling organizations to take timely corrective action. Monte Carlo also emphasizes collaboration and offers integrations with popular data tools and platforms, most recently announcing a partnership with Fivetran. Monte Carlo’s founder worked with Fortune 500 companies as VP of Customer Operations at Gainsight, helping teams use Gainsight data as a competitive advantage.
- Telmai specializes in data observability and data governance solutions, emphasizing the importance of data lineage and impact analysis. Their platform enables organizations to understand data flows, maintain data integrity, and comply with regulatory requirements. Telmai’s solution supports any source through an open architecture (batch, streaming, API), and its no-code onboarding connects to data sources and specific alerting channels. Telmai’s name includes the keyword “AI,” denoting the foundational use of AI/ML in their platform: Telmai automatically learns from its data observations and alerts you when there are unexpected drifts. The co-founders came from Reltio, the leading cloud master data management platform.
How Big is the Market, Maturity and Venture Capital Investment?
The data observability space has gained significant traction in recent years, reflecting the growing recognition of its importance in data-driven organizations. While there isn’t a specific forecast for the data observability market as yet, according to Gartner the overall observability market, which encompasses a wide range of categories such as Application Performance Monitoring (APM), will become an $8.9 billion market by 2026, with an 8.6% compound annual growth rate (CAGR) between 2020 and 2026, as cited in the Gartner Magic Quadrant for Application Performance Monitoring and Observability (June 2022). Meanwhile the more established data quality tools market had reached $2 billion in size in 2022, according to the Gartner Magic Quadrant for Data Quality Solutions (Nov 2022). Earlier I pointed out that data quality is not the same as data observability, but it’s a trending market, so vendors with DQ offerings like Informatica and Collibra are cleverly joining the party. Data observability is complementary, and I expect to see more partnerships, or perhaps acquisitions, announced in these related markets.
No wonder venture capitalists, recognizing the potential of data observability, have made substantial investments in this space. For instance, Acceldata recently secured $50 million in a Series C funding round (Feb 2023) from March Capital, with additional investment from Sanabil Investments, Industry Ventures, and existing investor Insight Partners, bringing their total raised to $95.6M to date. Meanwhile Monte Carlo raised an eye-popping $135M Series D (May 2022), bringing their funding in 20 months to $236M at a $1.6B valuation. Their investors include Accel, GGV Capital, Redpoint Ventures, ICONIQ Growth, Salesforce Ventures, and GIC Singapore. Telmai, while relatively new, is a Y Combinator company with seed funding of $2.8M (Aug 2021) that also included .406 Ventures and Zetta Venture Partners; notably, Telmai has already struck a partnership with Google. Bigeye’s equally impressive list of investors includes Coatue, Sequoia Capital and Costanoa Ventures in their Series B of $45M (Sept 2021), bringing their total raised to $66M. Last year Crunchbase highlighted the significant amount of investment being made by VCs in all things observability, commenting that “in the span of one week three companies alone raised more than $400 million. This shows the significance of a sector that helps companies review, evaluate, index and, in general, control what has become the lifeblood of so many large enterprises—their data.”
By my quick calculations, Acceldata ($95.6M), Bigeye ($66M), Collibra ($596.5M, but mainly data governance focused), Databand ($14.5M), Datafold ($22.2M), DataKitchen (N/A), Kensu ($4.9M), Monte Carlo ($236M), Sifflet ($15.8M), Soda (€14M), Telmai ($2.8M) and Validio (N/A) total close to half a billion dollars invested in the space to date, even with Collibra excluded.
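For transparency, here is the back-of-the-envelope tally behind that figure, using only the disclosed amounts listed above (Soda’s €14M treated as roughly $15M; DataKitchen and Validio omitted as undisclosed; Collibra excluded as primarily data governance focused).

```python
# Back-of-the-envelope tally of the disclosed funding figures cited above, in $M.
# Soda's EUR 14M is approximated as $15M; DataKitchen and Validio are omitted
# because their totals are undisclosed; Collibra is excluded as noted in the text.
funding_musd = {
    "Acceldata": 95.6,
    "Bigeye": 66.0,
    "Databand": 14.5,
    "Datafold": 22.2,
    "Kensu": 4.9,
    "Monte Carlo": 236.0,
    "Sifflet": 15.8,
    "Soda": 15.0,   # EUR 14M, approximate USD conversion
    "Telmai": 2.8,
}

total = sum(funding_musd.values())
print(f"Total disclosed funding: ${total:.1f}M")  # roughly $473M
```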
Data observability is still fairly new, as judged by its 2022 placement in the Gartner Hype Cycle for Data Management (diagram below courtesy of https://www.denodo.com/en/document/analyst-report/2022-gartner-hype-cycle-data-management), just beginning the climb up the Innovation Trigger phase.
Which Customer Companies are Early Adopters?
As with any new market, early adopters help the category cross the chasm and provide the justification for VC investments and other entrants into the space.
Acceldata lists Dun & Bradstreet, Oracle and Pubmatic among their customers, while Bigeye highlights Instacart, Udacity and Docker. Monte Carlo, at the time of writing, has a sizable list of 150+ customers which they categorize by industry use case, including PagerDuty, Roche, SoFi and more. Telmai features Clearbit and DataStax among their published detailed case studies.
How is ROI Measured?
As with any new technology, a solid business case needs to be made for yet another set of tools and platforms. Fortunately, many companies who have taken the early plunge into data observability have seen ROI that more than justifies their investment.
- Acceldata’s customer PhonePe, a subsidiary of Walmart, used Acceldata Pulse, the solution for Hadoop, to scale its data infrastructure rapidly from 70 to more than 1,500 Hadoop nodes, growth of more than 2,000%. PhonePe also delivered 99.97% availability across its Hadoop infrastructure and reduced data warehouse costs by 65% by eliminating the need for commercial licenses.
- Bigeye customer Udacity monitors 1.6M+ data points in their data pipelines, resulting in a 60% reduction in issue detection time.
- Monte Carlo’s customer Prefect cites a 50% reduction in engineering time spent on triage and resolution of data quality issues and a 16X faster implementation time compared to building an in-house solution.
- Telmai customer Clearbit monitors the quality and freshness of 50M company records, 389M contact records, and 4.5B IP addresses coming from over 250 sources. Clearbit can now significantly reduce the time to detect and resolve data quality problems, and can show the customers who consume its data how accurate and fresh that data is, which helps Clearbit’s sales and marketing efforts.
What Next?
We are in the early innings, but data observability is fast becoming a vital capability for organizations seeking to unlock the true value of their data assets. By implementing robust data observability solutions like those offered by Acceldata, Bigeye, Monte Carlo, Telmai and others, businesses can ensure the reliability, accuracy, and trustworthiness of their data. With features such as data quality monitoring, data lineage, and anomaly detection, organizations can proactively identify and address data issues, enabling data-driven decision-making, compliance with regulatory requirements, and improved overall business performance. As the market continues to grow, driven by increasing data complexity and the need for reliable insights, data observability will remain a critical investment for organizations looking to harness the full potential of their data.
Has your company looked into or acquired data observability tools yet? Please share your thoughts on this exciting new space.