
The Facts: Hadoop Big Data vs. Relational Databases

Kevin Clarke June 5, 2020

Closely comparing Hadoop Big Data with more traditional Relational Database solutions helps you to more fully understand the advantages and drawbacks of each. If you want to engage in more meaningful IT-related discussions and make more informed business decisions, knowing more about available technologies and techniques is a key first step. As Viktor Mayer-Schönberger and Kenneth Cukier put it:

Just as the telescope enabled us to comprehend the universe and the microscope allowed us to understand germs, the new techniques for collecting and analyzing huge bodies of data will help us make sense of our world in ways we are just starting to appreciate.[1]

What is Hadoop?

According to Munvo’s software partner SAS:

Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. It provides massive storage for any kind of data, enormous processing power and the ability to handle virtually limitless concurrent tasks or jobs.[2]

A more concise colleague put it this way:

Hadoop is a technology architecture that makes use of commodity hardware in a highly distributed and scalable fashion, enabling fast data retrieval at a lower cost.

Both definitions are admirably succinct, and both show how the world (and the market) is transforming the way data, in small and large amounts alike, is collected and stored. It’s time to get on board.

Hadoop Big Data Vs. Relational Databases

To see how well Hadoop Big Data stands up against solutions built on relational databases, like IBM Campaign (formerly IBM Unica), we compared the two across seven characteristics chosen at the outset. In our study, Hadoop Big Data and traditional Relational Databases went head-to-head in the following arenas:

Security
IT support
Static customer profiles
Profile integration
Unstructured data
Real-time interaction
High volume

As more organizations realize how larger, more comprehensive datasets can flesh out CRM platforms, feed current marketing solutions or enhance ongoing Business Intelligence (BI) initiatives, Big Data solutions like Hadoop are becoming very attractive.

If you look back at Figure 1, however, you’ll see that Hadoop Big Data is no cure-all. In fact, more traditional relational databases are still superior when it comes to security, IT support, static customer profiles and profile integration.

And why is that?

Schema “On Read” vs. Schema “On Write”

Hadoop Big Data and Relational Databases function in markedly different ways.

Relational databases follow a principle known as Schema “On Write.” Hadoop uses Schema “On Read.”

When data is written under Schema “On Write” (for example, in the relational databases behind IBM Campaign), its structure must be known in advance. That structure is used to define tables, joins, rules and constraints, and incoming data must conform to them. The advantage for users is clean data, because the database itself enforces those rules and structures on every write.
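To make the “On Write” idea concrete, here is a minimal sketch that uses SQLite from Python’s standard library as a stand-in for a production relational database (it is only an illustration, not the database an application like IBM Campaign would be configured against). The `customer` table, its columns and its rules are hypothetical. The structure is declared before any data arrives, and each write is checked against it.

```python
import sqlite3

# Schema "on write": structure and rules are declared before any data arrives.
# The table, columns and constraints here are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        email       TEXT NOT NULL UNIQUE,
        age         INTEGER CHECK (age BETWEEN 0 AND 130)
    )
    """
)

def try_insert(row):
    """Attempt one write; the database validates it against the schema."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "INSERT INTO customer (customer_id, email, age) VALUES (?, ?, ?)",
                row,
            )
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

results = [
    try_insert((1, "ana@example.com", 34)),  # conforms to the schema
    try_insert((2, None, 41)),               # breaks NOT NULL on email
    try_insert((3, "ana@example.com", 29)),  # breaks UNIQUE on email
    try_insert((4, "bo@example.com", 250)),  # breaks CHECK on age
]
for result in results:
    print(result)

# Only the conforming row was stored.
print(conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0])  # 1
```

Three of the four writes are refused before they ever reach the table, which is how this approach keeps stored data clean: the cost of checking is paid once, at write time.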

Hadoop, on the other hand, uses a Schema “On Read” approach: it typically “dumps” data as it arrives, effectively ignoring structure at write time, which leaves the data “unstructured.” Cleaning and interpreting that data is then left to whoever queries Hadoop during the “read.”
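For contrast, here is a minimal “On Read” sketch in plain Python. The sample records are hypothetical, and `read_customers` is a hypothetical helper standing in for a query over data stored in Hadoop. Every line is stored as-is, including a misspelled field name, numbers stored as strings and one line that is not valid JSON at all; structure is imposed only when the data is read.

```python
import json

# Schema "on read": every record is stored exactly as it arrived, unvalidated.
# These raw lines are hypothetical sample data.
raw_store = [
    '{"id": "1", "email": "ana@example.com", "age": "34"}',
    '{"id": 2, "e_mail": "bo@example.com", "age": 41}',
    '{"id": 3, "email": "cy@example.com"}',
    'not valid json at all',
]

def read_customers(lines):
    """Apply a schema at query time; every cleaning rule is the reader's choice."""
    customers, skipped = [], 0
    for line in lines:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            skipped += 1  # this reader chooses to drop unparseable lines
            continue
        # This reader decides that "e_mail" means "email" and coerces types.
        email = rec.get("email") or rec.get("e_mail")
        age = rec.get("age")
        customers.append({
            "id": int(rec["id"]),
            "email": email,
            "age": int(age) if age is not None else None,
        })
    return customers, skipped

customers, skipped = read_customers(raw_store)
print(customers)
print("skipped:", skipped)  # skipped: 1
```

Nothing was rejected at write time; all of the interpretation lives in the reader. A different analyst could just as reasonably keep the bad line, ignore `e_mail`, or treat a missing age differently, and get a different answer from the same stored data.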

Implications and Consequences

Without identifiable rules, constraints or an overall structure, it is difficult to maintain a static customer profile that is unambiguous and free of duplicate data. Relational databases are better suited to storing and maintaining clear systems of customer records, especially for critical information. Hadoop isn’t looking for a specific, single column or row; it searches for patterns, probabilities and ambiguous recurrences.
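The duplicate problem can be sketched in a few lines of Python, assuming hypothetical raw records for a single customer that were appended with no key or constraint. Grouping on the raw key yields two “customers”; normalizing the key fixes that, but the reader still has to pick a rule for which record wins, and two defensible rules produce two different profiles. A relational table with the customer ID as its primary key would instead have forced one row per customer at write time.

```python
from datetime import date

# Hypothetical raw records for one customer, appended with no key or
# constraint, the way a schema-on-read store would accept them.
raw_records = [
    {"customer": "C-100", "email": "ana@example.com", "seen": date(2020, 3, 1)},
    {"customer": "C-100", "email": "ana.r@example.com", "seen": date(2020, 5, 12)},
    {"customer": "c-100 ", "email": "ana@example.com", "seen": date(2020, 4, 2)},
]

# Grouping on the raw key treats "c-100 " as a different customer.
naive_keys = {rec["customer"] for rec in raw_records}
print(len(naive_keys))  # 2

def build_profiles(records, prefer):
    """Collapse records to one profile per normalized key.

    `prefer(new, old)` returns True when `new` should replace `old`;
    the choice of rule is left entirely to whoever runs the query.
    """
    profiles = {}
    for rec in records:
        key = rec["customer"].strip().upper()  # normalize case and whitespace
        if key not in profiles or prefer(rec, profiles[key]):
            profiles[key] = rec
    return profiles

latest = build_profiles(raw_records, lambda new, old: new["seen"] > old["seen"])
earliest = build_profiles(raw_records, lambda new, old: new["seen"] < old["seen"])

print(latest["C-100"]["email"])    # ana.r@example.com
print(earliest["C-100"]["email"])  # ana@example.com
```

Both results are internally consistent, yet they disagree on the customer’s email address. That ambiguity is exactly what a static customer profile is supposed to rule out.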

What now?

Your organization may have already invested in advanced tools, such as ETL (“Extract, Transform, Load”) tools, that do not easily transfer to Hadoop. What’s more, chances are that your organization has already built its applications, such as IBM Campaign, and perhaps its entire infrastructure on relational databases.

Through it all, it is important to remember that technologies, requirements, skillsets and objectives can, and will, change. Learn all you can and ask the right questions.

[1] Mayer-Schönberger, Viktor, and Kenneth Cukier. Big Data: A Revolution That Will Transform How We Live, Work, and Think. New York: Houghton Mifflin Harcourt, 2013, p. 7.
[2] SAS. https://www.sas.com/en_ca/insights/big-data/hadoop.html#
