The Future of Data Engineering

November 6, 2021

As the world digs itself out of the blizzard of changes brought on by the COVID-19 pandemic, data engineering is emerging as an ever more critical component of the corporate and IT infrastructure. However, the industry is undergoing a shift, and the pandemic accelerated processes that were already developing. The prognosis for data engineers is bright, but looking five years out, the discipline might look quite different. At Proxet, where assisting companies with digital transformation and AI/ML projects is our everyday life, being part of that change is exciting, and it leaves us asking, “What’s next?”

In broad terms, the digital engineering services market is expected to grow strongly over the next five years. Research and Markets predicted in July 2021 that the data engineering services market would grow at a CAGR of 16.3% over 2021-2026. And that’s not including parts of IT departments that are overhauled as companies digitize and automate.
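
To put the quoted growth rate in perspective, the compounding arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name growth_multiple is hypothetical, and the sketch assumes the 16.3% rate compounds once a year across the five years from 2021 to 2026.

```python
def growth_multiple(cagr: float, years: int) -> float:
    """Return how many times larger a value is after compounding at `cagr` for `years` years."""
    return (1.0 + cagr) ** years


# Assumption: the 16.3% CAGR from the Research and Markets forecast,
# applied over five annual compounding periods (2021 to 2026).
multiple = growth_multiple(0.163, 5)
print(f"Market size multiple after 5 years: {multiple:.2f}x")
```

Under those assumptions, the market would roughly double over the forecast period.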

Moreover, the World Economic Forum stated in its Future of Jobs 2020 report that 94% of employers expected to see the increased digitization of jobs, and that half expected the further automation of jobs. Both trends point to the increased generation and use of data. And to manage that data, data engineers will be needed.

The Rise of Data Engineering

Contemporary data engineering can trace itself back to the rise of non-relational databases and big data in the early 2000s. The beginnings of today’s data engineering stack really came into play with the publication of the Google File System paper in 2003, which paved the way for both Google MapReduce and Hadoop. Hadoop and MapReduce were essential data engineering tools for over a decade, though by August 2019, Cloudera VP for Solutions Engineering Tom McCuch was openly responding to then-recent questions on LinkedIn regarding whether Hadoop was still relevant.

The next big change to the data engineering pipeline came a few months later with the onset of the COVID-19 pandemic. Though jarring, the pandemic wasn’t a black swan; a pandemic of some sort had been expected by governments for years. But big data was to play an outsized role in understanding both the plague and the humans carrying it, and data engineering best practices also took on a new meaning. Data became politicized, as in the case of Florida data scientist Rebekah Jones, who holds that she was fired by the State of Florida for refusing to manipulate Florida’s coronavirus statistics at politicians’ request.

Within data engineering, however, discoveries of a different kind were being made. Gartner wrote in its Gartner Top 10 Data Trends for 2021 article that the pandemic forced organizations to reassess their data and data analysis frameworks. And when it came to their models and historical data, everything was out of date. The ramifications were not lost on data scientists and engineers.

“It’s hard to get good data about the future, so we have to use data from the past. And if the past is no longer a guide to the future, we’re going to have a tough time doing any sort of predictive analytics.”

Thomas Davenport, PhD, Fellow at the MIT Initiative on the Digital Economy

Gartner sees future data engineering best practices as moving away from big data toward small and wide data, and calls this shift one of its top data and analytics trends for 2021.

The Latest in Data Engineering

Gartner’s top-10 trends match what other industry observers see as the future of data engineering as a whole, though many would hold that big data is here to stay. These trends fall under three broad categories: accelerating change, operationalizing business value, and distributed everything.

Underneath these trends, though, is the constant effort to find a compromise between data warehouse-centric and enterprise-centric models of data handling. The unsung hero of data engineering, then, is the data infrastructure engineer, who balances the need to protect environments such as data lakes against the need to keep seemingly ever more data-hungry workers fed. As individual data pools become smaller and even more diverse, creating the right data engineering process has become ever more important.

Data Engineering Trends 2021

Gartner’s Top-10 Data and Analytics Trends in 2021 offers just one look at these trends:

  • Smarter, responsible, scalable AI
  • Composable data and analytics
  • Data fabric is the foundation
  • From Big to Small and Wide Data
  • XOps
  • Engineering Decision Intelligence
  • D&A as a core business function
  • Graph relates everything
  • The rise of the augmented consumer
  • D&A at the Edge

Mihail Eric’s January 2021 post “We need Data Engineers, not Data Scientists” pointed to the other main issue of data engineering in 2021: what about the practitioners? The post, which gained over 75,000 views on Hacker News, made the point that there were 70% more open jobs for data engineers than for data scientists. This is despite, or perhaps because of, the prominence of the data scientist position in the public eye after the Harvard Business Review dubbed it “The Sexiest Job of the 21st Century” in 2012.

In July 2021, IBM also published its view of the future for three roles that focus on data, namely the data scientist, data engineer, and data analyst. IBM’s view of the data scientist was that the role involved more in-depth analysis as well as the creation of models. Data engineers, by contrast, were described as “managing data throughout its life cycle”. While IBM did not mention the data infrastructure engineer role directly, the data engineer job description, which included “designing, building, and maintaining data infrastructures”, certainly encompassed it.

However, a report by Deloitte on jobs in government in the year 2025 focused on data engineers. Deloitte sees them as spending up to 50% of their time on “data science development”. In short, the lines between data engineering and data science are blurring. An article in The New Stack points out that the blurring of these roles also has to do with the underlying merging of data infrastructure and data science.

In an article for InfoQ in February 2021, Chris Riccomini also put a finger on what data engineering is not, and what it is not going to become any time soon. It may seem obvious to those living and breathing data, but for somebody’s pointy-headed boss, it needs to be said:

“You don't as a data engineer want to be entering that data yourself. You do not want to return to the land of manual data stewards and data management.”

Chris Riccomini, Distinguished Software Engineer, WePay

Face it, if you’re spending more time inputting data than worrying about ETL data engineering, you’re a data entry keyer, and the WEF Future of Jobs 2020 report says that those jobs are disappearing. In fact, data entry keying as a job shrank by 40% between 2017 and 2018 alone in the U.S. Even ETL data engineering is changing, as companies such as Snowflake essentially let data engineers outsource parts of the process to the cloud. At Proxet, we’ve been watching how digitization has transformed companies and, in the process, transformed “Data Practitioners”.

“The evolution of data engineering has accelerated with the increasing power of the tools and storage available. The range of companies data engineers work with has grown accordingly, as whole industries digitize, and then in turn, as whole industries transform IT departments to fit their new-found needs. And with that comes revolutions in data acquisition, cleaning and processing. Companies will need to invest in the capacity to deal with data themselves, or find reliable partners to focus on building the data infrastructure for them.”

Vlad Medvedovsky, CEO at Proxet (ex-Rails Reactor), a company providing software development services

The Future of Data

“Data is the world’s most valuable resource!” says IBM. And while Big Data was and still is a big deal, Gartner points to the fact that small and wide data are becoming more important. So is making use of the data.

In their 2018 report “The Digitization of the World”, Seagate and IDC looked at where data is headed. The amount of data is doubling every three years at current rates, and is expected to reach 175 zettabytes by 2025. The variety of data is increasing as well. Seagate and IDC claimed that by 2025, 75% of the world’s population, or 6 billion consumers, will be interacting with data every day. As they point out, the data infrastructure to handle growing numbers of clients is going to be a customer experience differentiator.
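
The doubling claim can be restated as an annual growth rate, which is often easier to plan capacity around. The short Python sketch below does that conversion; the function name annual_rate_from_doubling is hypothetical, and the sketch assumes smooth, constant growth, which real data volumes will not follow exactly.

```python
def annual_rate_from_doubling(doubling_years: float) -> float:
    """Return the constant annual growth rate that doubles a quantity every `doubling_years` years."""
    return 2.0 ** (1.0 / doubling_years) - 1.0


# Assumption: the three-year doubling time quoted above holds steadily.
rate = annual_rate_from_doubling(3)
print(f"Implied annual growth: {rate:.1%}")
```

A three-year doubling time works out to growth of roughly 26% a year.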

Engineering future change, from small businesses to the largest enterprises, requires data and data infrastructure more than ever. At Proxet, data engineering is a large part of our day. From traditional ETL data engineering to setting up a bespoke data engineering stack, our experience with data from the smallest heartbeats to ecommerce can help your company navigate the post-COVID environment.
