2023 saw remarkable advances in the data world. I attended AWS re:Invent this year, and it left me feeling that if you weren’t talking about “Generative” something, LLMs, or AI, were you really doing anything at all? Obviously, there was a lot of other great work going on, and every hype cycle brings a similar explosion of activity, but 2023 was most certainly the year of GenAI.
We’ve seen plenty of rough edges and challenges in working with GenAI, and Gartner predicts the hype has peaked, but there are still many amazing things happening, and the practical uses of AI and LLMs are maturing and growing. What follows are the four biggest trends I see right now in the world of data management and data analysis, and the ones we will be pushing to take as far as possible in the new year.
Data Accessibility
2024 will be the year when Large Language Models (LLMs) and Large Multimodal Models (LMMs) go from fascinating experiments and buzzword-fueled hype to deep integration with the tools we use daily. This is big news for those working with data, because data has always been complex. Tools like OpenAI’s Advanced Data Analysis and Microsoft’s demos of Copilot in Excel have teased a world where data tools become available to the masses.
While these tools have shown incredible potential, their practical usefulness hasn’t always met expectations when facing bigger or more complex real-world data: the context limits of large language models mean they work best on relatively small datasets. However, the potential for LLMs to help regular people do more with their data is now clear, and the world has taken note. The continued development of core LLMs, along with techniques such as function calling (which lets an LLM use external tools) and Retrieval Augmented Generation (RAG, which retrieves only the most relevant data and hands it to the model at question time), makes a broader set of use cases possible. These building blocks will continue to mature in 2024 in fascinating ways, allowing many more people to do things with data they could only dream about before.
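To make RAG a little more concrete, here is a minimal sketch of the idea in Python. The embed() and ask_llm() helpers are hypothetical stand-ins for whichever embedding model and LLM API you happen to use; the interesting part is the middle step, where only the most relevant documents are retrieved and packed into the prompt, so the model never has to hold the whole dataset in its context window.

```python
# Minimal RAG sketch. embed() and ask_llm() are hypothetical stand-ins
# for your embedding model and LLM API of choice.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical: return an embedding vector for `text`."""
    raise NotImplementedError

def ask_llm(prompt: str) -> str:
    """Hypothetical: send `prompt` to an LLM and return its answer."""
    raise NotImplementedError

def answer_with_rag(question: str, documents: list[str], top_k: int = 3) -> str:
    # 1. Embed the question and every candidate document.
    q = embed(question)
    vecs = [embed(d) for d in documents]

    # 2. Keep only the top_k documents most similar to the question.
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    best = sorted(range(len(documents)),
                  key=lambda i: cosine(q, vecs[i]), reverse=True)[:top_k]

    # 3. Ask the model to answer using only the retrieved context.
    context = "\n\n".join(documents[i] for i in best)
    return ask_llm(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
```

The same retrieval step works whether the “documents” are support tickets, rows summarized from a spreadsheet, or snippets from a wiki, which is exactly why RAG helps LLMs scale past their context limits.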
Data Mesh
For decades, there has been a push toward centralizing data. Centralization has a lot of advantages: having all your data in one place makes it easier to discover insights and build more straightforward predictive models. The problem is that centralization as an organizational effort always adds complexity and friction; the best-laid plans almost always end in partial success, with plenty of data silos still spread throughout the organization.
Some centralization is still valuable and encouraged. In recent years, however, the broader industry trend has shifted toward the concept of a Data Mesh, where data ownership, governance, and tools remain in the hands of the departments or teams that produce and work most closely with the data. The key to a great data mesh is interoperability. For example, the sales team’s sales data can integrate with data kept by the customer success team, which can further combine with data held by the e-commerce team. Each dataset is shared as a “data product” that other teams can consume and integrate with their own data, allowing everyone to move faster and share data more freely.
Data Composability
While Data Mesh focuses on organizational flexibility, data platform composability is the focus of many of the greatest minds in the data engineering world. Composability means that the tools that make up our data platforms can all talk to one another. Wes McKinney is the creator of some of the most critical tools in the data science world, like pandas for data analysis in Python and the Arrow format, which has become the standard way to represent data in memory for pretty much every new data tool released in recent years. Wes wrote in September about the road to composable data systems and contributed to The Composable Data Management System Manifesto.

These pieces are aimed at technical readers focused on complex data platforms, but the heart of the issue for both composability and data mesh is the same: organizations naturally spread data across platforms, departments, clouds, and more; it is an inevitable consequence of the complexity of data and organizations. Composable systems address this by allowing data to be stored, queried, processed, and analyzed across a wide range of systems using open standards.
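As a small, concrete illustration of what those open standards buy you (a sketch assuming only the pandas, pyarrow, and duckdb Python packages, not any particular platform): a table created by one tool can be handed to a completely different engine with no export or import step, because both speak Arrow.

```python
# A minimal composability sketch: one dataset, two engines, no copies or exports.
# Assumes the pandas, pyarrow, and duckdb packages are installed.
import pandas as pd
import pyarrow as pa
import duckdb

# Data produced in pandas (it could just as well come from Parquet, a
# warehouse extract, or another team's "data product").
sales = pd.DataFrame({
    "region": ["east", "west", "east", "west"],
    "revenue": [1200, 800, 950, 1100],
})

# Convert it to Apache Arrow, the shared in-memory standard.
sales_arrow = pa.Table.from_pandas(sales)

# DuckDB can query the Arrow table in place, by name, as if it were a SQL table.
summary = duckdb.sql(
    "SELECT region, SUM(revenue) AS total_revenue FROM sales_arrow GROUP BY region"
).to_df()

print(summary)
```

Each tool keeps doing what it does best; Arrow is simply the common language that lets the data move between them without friction.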
Data Activation
Accessibility, Mesh, and Composability all point to one major trend: data that has been gathering virtual dust for all these years can finally be activated. Small to mid-sized businesses that have never had resources like big tech’s data engineering and data science teams will gain capabilities that give them an edge in their niche markets. Lines of business at large enterprises without access to the organization’s central data warehouses and data teams will benefit similarly, finally able to activate and use the data locked up in their spreadsheets, SaaS platforms, siloed databases, and more.
According to Harvard Business Review’s 2023 survey of Fortune 1000 companies, “under one-quarter of executives reported that their companies have created a data-driven organization, down from 31% four years ago.” It seems safe to assume the percentage is even lower among smaller organizations, which is to say that most of us, most of the time, are making decisions without a clear understanding of our data.
So, in 2024, let’s all watch for more ways to activate our data and make better decisions.
Peter Drucker (and possibly Abraham Lincoln) said that the best way to predict the future is to create it. Here at Querri, we’re pouring everything we have into making 2024 the year that these trends unlock the enormous potential of dormant data for our customers. If you’d like to accelerate your organization’s adoption of any of these trends, give us a shout; we’d love to chat.