The rise of the modern data stack has revolutionized how businesses run around the world. Many companies now rely heavily on data analytics and machine learning to gain critical insights and to build better products. This has led to an explosion of tools and platforms that help these organizations gather, store, and process an increasing amount of data. However, as data teams grow larger and the data ecosystem becomes more complex, new challenges have emerged that prevent the successful adoption of analytics, data science, and machine learning. These challenges include issues related to discoverability, interpretability, observability, and governance.
We know about these challenges because we lived through them at LinkedIn. Despite having amassed valuable data and implemented sophisticated data management systems, our data scientists and ML engineers often struggled to leverage data effectively. At the same time, there was an increasing need for stronger data governance to protect user privacy. While trying to solve these problems, we realized that they and many others shared a common solution: metadata.
Over the past few years at LinkedIn, Mars, Seyi, and I, along with an incredibly talented metadata team, created DataHub, a third-generation, fully featured platform to serve as the metadata backbone for the company. Today, DataHub powers numerous mission-critical use cases at LinkedIn, including search & discovery, data lineage, data privacy, data governance, and AI DevOps, among others. Each day, it is used by and integrated with 40+ teams/projects; indexes 4 million datasets, metrics and dashboards; and processes 10 million entity and relationship events.
By implementing DataHub, LinkedIn’s data became actionable, and LinkedIn data users became more efficient and impactful. We’ve proven that a trusted knowledge graph of metadata is the key to unlocking the value of data, unleashing practitioner productivity and solving many additional business-critical data challenges.
Earlier this year, LinkedIn open sourced DataHub, building on the company’s long-standing legacy of contributing meaningfully to the software community. The project was well-received and quickly gained adoption at companies like Expedia, Saxo Bank, SpotHero and dozens more, who rely on DataHub as the foundation of their data infrastructure. Through knowledge sharing sessions, town halls, Slack engagement, and GitHub issues, this vibrant community has helped us shape the open source roadmap and validate the widespread need for a well-designed data catalog and metadata management system.
While this experience at LinkedIn has been incredibly rewarding, we want every company to realize the promise of data unlocked through effective metadata management. We can achieve this most effectively by building upon DataHub and expanding the mission we embarked on years ago: to build powerful metadata management tools.
Today, we are excited to announce that Mars, Seyi, and I have teamed up to start Metaphor Data, a company dedicated to building out the DataHub ecosystem. Our mission at Metaphor Data is to help all organizations better understand and manage their data through the power of the metadata knowledge graph.
In service of this mission, we’ve raised a $5.3 million seed round led by Amplify Partners and Andreessen Horowitz, two venture capital firms with a long, storied history of supporting category-defining data infrastructure and analytics companies like Databricks, Fishtown Analytics and Fivetran. Additionally, we are fortunate to have a group of luminaries from the worlds of data science and data engineering joining this round as angel investors, including thought leaders like Bob Muglia, DJ Patil, Hilary Mason, Josh Wills, Neha Narkhede, and Scott Breitenother.
Finally, we’d be remiss if we didn’t mention how grateful we are for the support of the LinkedIn leadership team. The culture of Next Play has encouraged employees to pursue their dreams and build tools that can reshape how data is used and consumed, and has celebrated them for doing so. We plan to do just that. We’d also like to thank the amazing LinkedIn metadata team for making this project possible.
If you are interested in learning more or want to share your (meta)data experience, please visit us at metaphor.io.