Everything is data-driven these days, especially with the growing prominence of big data. As we progress over time, we also receive more and more data that we can use to help make better decisions.
Here comes an issue, however — how exactly can we utilize all the data we have? Different formats from different time periods … how can we consolidate everything to build a flexible system that gives us valuable insights?
Luckily, we have data lakes. And yes, they are also being used in healthcare to potentially save more lives.
What is a Data Lake?
First of all, what's the definition of a data lake? How is it different from, say, a database or a data warehouse?
The main difference lies in how everything is organized.
A database is perhaps one of the most basic ways to manage data. Simply put, a database is an organized structure of data. A simple example would be a telephone directory that contains the name, number, and perhaps address of a person. Another would be the records kept in a sales system.
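To make the telephone directory example concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and sample entries are all made up for illustration; the point is only that every record has to fit a structure defined up front.

```python
import sqlite3

def build_directory(rows):
    """Create an in-memory table and load (name, number, address) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE directory (name TEXT NOT NULL, number TEXT NOT NULL, address TEXT)"
    )
    conn.executemany("INSERT INTO directory VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn

def find_number(conn, name):
    """Return the phone number stored for `name`, or None if there is no entry."""
    row = conn.execute(
        "SELECT number FROM directory WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None

# Hypothetical sample entries, invented for this example.
conn = build_directory([
    ("Ada Park", "555-0100", "1 Main St"),
    ("Ben Ortiz", "555-0101", None),
])
print(find_number(conn, "Ada Park"))
```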
Popular with medium to large-sized companies, data warehouses are large storage locations for data from multiple sources. The primary use of data warehouses is business intelligence (BI), such as analytics, which can help corporations make informed decisions. By design, data warehouses are also filtered and more structured.
Take the data from various sources, organize and structure them, then use them to generate reports and analytics — you get the rough idea.
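As a rough sketch of that flow, the snippet below takes records from two hypothetical source systems, maps them onto one fixed structure, and then produces a simple report. The source names, field names, and amounts are assumptions made for illustration, not taken from any real warehouse.

```python
from collections import defaultdict

# Two hypothetical source systems that describe the same thing with different fields.
billing_rows = [
    {"dept": "cardiology", "amount_usd": "120.50"},
    {"dept": "Cardiology", "amount_usd": "80.00"},
]
scheduling_rows = [
    {"department": "radiology", "fee": 200},
]

def to_warehouse_row(raw):
    """Map a source record onto one fixed schema: department and amount."""
    department = (raw.get("dept") or raw.get("department")).strip().lower()
    amount = float(raw.get("amount_usd", raw.get("fee", 0)))
    return {"department": department, "amount": amount}

def revenue_by_department(rows):
    """Aggregate the structured rows into a per-department report."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["department"]] += row["amount"]
    return dict(totals)

# Structure first, then report: anything that does not fit the schema is left out.
warehouse = [to_warehouse_row(r) for r in billing_rows + scheduling_rows]
report = revenue_by_department(warehouse)
print(report)
```

The structuring step happens before anything is stored, which is what makes the reports precise and also what makes new kinds of data harder to add later.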
A data lake is, well, like a lake: a pool of raw data, both structured and unstructured, gathered from different sources, from which you can fish out what you need. This might not be the most detailed description, but it captures the core idea.
The existence of raw and unprocessed data also means flexibility — data scientists can build models and run any analytics their heart desires in the future based on the data. With that being said, it is also much more complex and requires a much deeper understanding of data science.
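One way to picture this is "store now, structure later". The sketch below keeps every payload untouched and applies structure only when a question is asked. The in-memory list stands in for real storage, and the source labels and record layouts are hypothetical.

```python
import json

lake = []  # stand-in for object storage; each entry keeps its payload untouched

def ingest(source, payload):
    """Store the raw payload as-is, tagged only with where it came from."""
    lake.append({"source": source, "raw": payload})

def read_heart_rates():
    """Apply structure at read time: extract heart rates from whichever formats hold them."""
    rates = []
    for item in lake:
        if item["source"] == "monitor_json":
            rates.append(json.loads(item["raw"])["hr"])
        elif item["source"] == "monitor_csv":
            # Hypothetical CSV layout: patient_id,heart_rate
            rates.append(int(item["raw"].split(",")[1]))
        # Other sources, such as free-text notes, simply stay in the lake for future analyses.
    return rates

ingest("monitor_json", '{"patient": "p1", "hr": 72}')
ingest("monitor_csv", "p2,88")
ingest("doctor_note", "Patient reports mild dizziness.")
print(read_heart_rates())
```

Nothing is rejected at ingestion, so a later analysis can come back to the doctor's note without any change to how the data was stored.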
With the meaning of data lakes out of the way, let’s look at how data lakes are used in healthcare and help save lives, shall we?
Types of Healthcare Data and Clinical Data Architecture
First of all, let’s look at the types of healthcare data being utilized.
Generally speaking, healthcare data includes medical records, device-generated data (think of all the health monitors), administrative datasets, and healthcare surveys. Of course, as we mentioned before, data in data lakes are often raw and unprocessed, which means all of these can stay in their purest form — unstructured, and possibly at the same level without hierarchy.
The chart above illustrates how these data and a clinical data architecture, when utilized correctly, can benefit patients and help improve the healthcare system.
Healthcare Data Warehouse vs Data Lake
We should also take a moment and look at data warehousing in healthcare and how it works, which can help us understand why data lakes can be superior in certain cases.
Again, as we mentioned above, data warehousing utilizes structured data, which can often produce more precise results but lacks flexibility. With a properly designed data warehouse, hospitals and healthcare providers can create analytics and projections based on historical data, supporting more accurate diagnoses.
For example, it would be easier to look at a patient’s past medical history and produce a medical profile with data warehousing, which can then be used to create predictive and prescriptive analyses.
Data warehousing does have its shortcomings, however.
Data Lakes in Healthcare
One major obstacle to using data warehousing in healthcare is the existence of large and unstructured data in the industry. Imagine all the handwritten notes from the doctors and years of files stored in the archive — it would be nearly impossible to categorize everything.
This is where the benefits of data lakes in healthcare truly shine through: they omit the step where everything has to be processed and categorized. Instead, the data can be taken as is, and the computer, through different machine learning algorithms, can process it whenever we need it, in whatever way we want.
"The rapid inclusion of new data sets would never be possible in a traditional data warehouse, with its data model–specific structures and its constraints on adding new sources or targets."
— Kelle O'Neal, founder and CEO of management consulting firm First San Francisco Partners.
The use of unstructured data can also help us discover previously unknown relations and correlations. With data warehousing, the data used are predetermined by the analysis and form of results required. But with data lakes, machines can be trained to look at the different raw data and discover patterns not seen previously.
In other words, the benefit of data lakes is that they offer a dynamic means to analyze different raw data, instead of having to build a system with a predetermined form of analysis.
"The potential of utilizing data lakes in healthcare is enormous. It allows us to see things we never noticed before, and that has a huge implication on how medical diagnosis can be made in the future."
— George Serebrennikov, COO at Proxet (formerly Rails Reactor), a custom software development services company.
Now, let’s take a look at real-world examples.
Data Lake Use Cases
Phoenix Children's Data Lake was created by Phoenix Children's Hospital in 2015 to facilitate the process and create safer solutions for patients.
"We pulled data from 40 systems — everything from the surgery system to the scanned medical records to the general ledger and payroll. Anyplace we could tap data, we did."
— David Higginson, executive vice president and chief administrative officer at Phoenix Children’s.
According to an article published by U.S. News, the hospital had three main criteria when designing the system:
- Data systems must collect and manage large quantities of information while increasing efficiency for providers.
- Data systems must also update in real-time to provide clinicians with the most current information possible on their patients.
- Each system must be fully integrated into the processes of the hospital, and provide data in an actionable format.
And here are the different ways the data helped provide better diagnoses.
Usage Examples of Data Lakes
Determining the correct dosage
In the case of Phoenix Children's Data Lake, they were able to utilize the system to help administer the dosage of medication in pediatric care.
According to the hospital, the dosage is determined by the patient's weight alongside an array of other factors, including health history and the current intake of other medications, which can complicate the calculation. If done incorrectly, the results can be fatal.
But with the system they designed, they were able to further develop their Pediatric Dose Range Checking System, using patient data to determine the recommended medication dose ranges for children.
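The source does not describe how the hospital's system works internally, so the following is only a toy sketch of what a weight-based range check might look like. The drug name and the mg/kg limits are placeholders invented for this example and are not clinical guidance. A real system would also account for the other factors mentioned above, such as health history and concurrent medications.

```python
from dataclasses import dataclass

@dataclass
class DoseRange:
    min_mg_per_kg: float
    max_mg_per_kg: float

# PLACEHOLDER limits for a fictional drug; these numbers are not clinical guidance.
RANGES = {"drug_x": DoseRange(min_mg_per_kg=5.0, max_mg_per_kg=10.0)}

def check_dose(drug, weight_kg, dose_mg):
    """Return 'below', 'within', or 'above' relative to the weight-based range."""
    if weight_kg <= 0:
        raise ValueError("weight_kg must be positive")
    rng = RANGES[drug]
    low = rng.min_mg_per_kg * weight_kg
    high = rng.max_mg_per_kg * weight_kg
    if dose_mg < low:
        return "below"
    if dose_mg > high:
        return "above"
    return "within"

# A 20 kg patient with the placeholder limits has an allowed range of 100 to 200 mg.
print(check_dose("drug_x", weight_kg=20.0, dose_mg=150.0))
```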
Preventing acute kidney injury
On the other hand, acute kidney injury (AKI) is also a major concern in administering medications: for patients who need to take numerous medications, maintaining the balance of different medications while administering the most effective dosage can be tricky.
However, with Phoenix Children's Data Lake, the hospital staff were able to analyze millions of patient records to identify patients at elevated risk of AKI, thus allowing the staff to take proactive measures.
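The hospital's actual risk model is not described in the source. Purely as an illustration of scanning records to surface patients for review, the sketch below flags anyone whose number of active medications reaches a threshold. The record layout, the threshold of five, and the rule itself are assumptions made for this example, not a validated clinical criterion.

```python
def flag_elevated_risk(records, min_medications=5):
    """Return ids of patients whose distinct active medication count meets the threshold."""
    return [
        r["patient_id"]
        for r in records
        if len(set(r["active_medications"])) >= min_medications
    ]

# Hypothetical records; medication names are placeholders.
records = [
    {"patient_id": "p1", "active_medications": ["a", "b"]},
    {"patient_id": "p2", "active_medications": ["a", "b", "c", "d", "e"]},
    {"patient_id": "p3", "active_medications": ["a", "a", "b", "b", "c"]},
]
print(flag_elevated_risk(records))
```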
Deciding on the best plan of care
Montefiore Health System is another organization that utilizes data lakes to help save lives. It launched an AI platform called the Patient-Centered Analytical Learning Machine (PALM) that incorporates a semantic data lake architecture.
"[PALM] is designed to enable healthcare systems like us to adopt AI and machine learning at a really large scale and in every corner of their operations without rebuilding their entire infrastructure and bringing an entire line of new technology into their enterprise."
— Dr. Parsa Mirhaji, director of the Center for Health Data Innovations at Montefiore Health System.
PALM is able to provide doctors and nurses with insight in the form of customized order sets proposed through best practice advisories. According to an article by Health Tech Magazine, it can also "help doctors to invoke preventive care, such as expedited ordering of sepsis panels and protocols or transfers to the ICU, or start end of life palliative care, or the provisioning of resources for mechanical ventilation in timely manner."
Who would’ve thought machines could help save lives?
The Growing Data Lake Market
As businesses develop, the need for accurate and dynamic business analysis also arises, and data lakes provide just the right solution to the growing demand for business analysis — and not just in healthcare. Because of that, the global data lake market has seen significant growth in recent years and is expected to grow at a rapid rate.
According to research carried out by Grand View Research, the global data lake market was valued at USD 7.6 billion in 2019, and it is projected to grow to USD 20.1 billion by 2024, at a compound annual growth rate (CAGR) of 20.6%.
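For readers who want to sanity-check growth figures like these, CAGR is simply the constant yearly rate that takes a starting value to an ending value over a number of years. The sketch below uses the standard formula with the numbers quoted above. Taken over the five years from 2019 to 2024, they imply a rate of roughly 21.5%, slightly above the stated 20.6%, which may come down to rounding or a different base value in the underlying report.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate: the constant yearly rate linking start to end."""
    return (end_value / start_value) ** (1 / years) - 1

def project(start_value, rate, years):
    """Value reached after compounding `rate` for `years` years."""
    return start_value * (1 + rate) ** years

# Figures as quoted in the article, in USD billions.
implied = cagr(7.6, 20.1, 5)
print(f"Implied CAGR 2019-2024: {implied:.1%}")
print(f"7.6 compounded at 20.6% for 5 years: {project(7.6, 0.206, 5):.1f}")
```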
While we focused on its application within the healthcare sector in this article, the biggest segment that utilizes the technology is still the IT sector itself, followed by banking, financial services and insurance (BFSI), then retail, the latter of which is expected to grow significantly in the coming years as data lakes could assist in targeting the right customers.
Here are some of the key players in the global data lake market that currently provide data lake solutions and foster the development of the technology.
- Amazon Web Services, Inc.
- Cloudera, Inc.
- Dremio Corporation
- Informatica Corporation
- Microsoft Corporation
- Oracle Corporation
- SAS Institute Inc.
- Snowflake Inc.
- Teradata Corporation
- Zaloni, Inc.
The forecast estimates the revenue of the global data lake market at USD 31.5 billion. Meanwhile, the North American market is expected to hold the largest market share, with the Asia Pacific region expected to see the highest growth rate, supported by major technology companies in China, India, Australia, and Japan.
Building a Professional Data Lake
Depending on your needs, a data lake can be a good long-term solution. However, it can also come with various challenges and require assistance from experts. We have only scratched the surface of data lakes in this article, as the actual implementation is much more complex.
With that being said, we can see the growing adoption of data lakes across different sectors, not just healthcare, as the need for data analysis grows. Nowadays, having the right product is only part of the equation. It is also about bringing the product to the right people, and data lakes can help do just that in the modern business environment.
Making the right business decisions can also be a challenging task, but with the right data lake solution, it is possible to look at historical data and create different scenarios and projections to determine the best course of action — in fact, enterprise data lake consulting is one of the things we at Proxet specialize in.
Proxet is a trusted leader in providing software development services with years of expertise in data science and machine learning. Should you find yourself in need of assistance in building a trusted and reliable platform, we are here to help.