Hadoop Integration Guide for IBM Campaign


Written by: Nicholas Ford

Big Data. Big Details.

Many of you who attended our webinar, Connecting Hadoop Big Data to Unica Campaign and Interact, requested that we provide more detailed technical information about how Unica Campaign was configured to work with Hadoop in the use cases we reviewed. Be forewarned! What follows is a highly technical guide to configuring this type of connection in your environment, and it is not for the faint of heart.

If you are interested in understanding how Hadoop differs from traditional relational databases from a functional Unica Campaign perspective, you may find our blog on The Facts: Hadoop Big Data vs. Relational Databases more palatable.

Step 1: Selecting a Hive ODBC Distribution

The first step to connect your Unica Campaign environment to your Apache Hive instance is to download and install a Hive ODBC driver. Unica Campaign v9.1.2 supports the DataDirect and Cloudera ODBC drivers, and v10.0 is also expected to support the Hortonworks driver.

  • DataDirect Apache Hive ODBC Driver from Progress.com
  • Cloudera ODBC Driver for Apache Hive
  • Hortonworks 64-bit ODBC Driver

Step 2: Installing the Hive ODBC Driver

1. Download the Hive ODBC driver from one of the providers listed above.

2. Upload the RPM file to the server hosting the Campaign Listener (analytics server).

3. Download the unixODBC Driver Manager, unixODBC-2.3.4, from www.unixodbc.org.

4. Upload the unixODBC-2.3.4.tar.gz file to the Campaign Listener server.

5. Extract the unixODBC-2.3.4.tar.gz file on your Campaign Listener server.

tar -xvf unixODBC-2.3.4.tar.gz

6. Install the Hive ODBC driver RPM (the Cloudera driver in this example):

rpm -ivh ClouderaHiveODBC-2.5.16.1005-1.el6.x86_64.rpm

**Note: root permissions may be required.
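The steps above leave unixODBC as an extracted source tree; it still has to be built and installed before the Hive driver can use it. The sketch below shows a typical autotools build, assuming the tarball was extracted in the current directory, that gcc and make are available, and that /usr/local is an acceptable install prefix. The build_unixodbc function and its DRY_RUN switch are hypothetical conveniences for previewing the commands; `odbcinst -j`, which ships with unixODBC, reports the configuration files the driver manager reads.

```bash
#!/usr/bin/env bash
# Sketch: build and install the unixODBC driver manager from source.
# Assumptions: the extracted unixODBC-2.3.4/ directory is in the current
# directory, gcc and make are installed, and /usr/local is the target prefix.
# DRY_RUN=1 prints each command instead of running it (hypothetical switch).
set -euo pipefail

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

build_unixodbc() {
  local src_dir="${1:-unixODBC-2.3.4}"
  local prefix="${2:-/usr/local}"

  # Subshell: the cd does not change the caller's working directory.
  (
    run cd "$src_dir"
    run ./configure --prefix="$prefix"
    run make
    run make install   # writing under /usr/local normally requires root
    run odbcinst -j    # lists the odbcinst.ini/odbc.ini files unixODBC reads
  )
}
```

Preview with `DRY_RUN=1 build_unixodbc`, then run `build_unixodbc` from a root shell; pass a different source directory or prefix as the first and second arguments if yours differ.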

Step 3: Configure the odbc.ini File

Once the ODBC driver is installed, you must configure your IBM Campaign server to connect to your Hive instance. Start with the odbc.ini file, which either ships with your Hive ODBC driver or already exists on the server for another ODBC driver. Edit the odbc.ini file and complete each setting for your Hadoop/Hive instance.
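As an illustration of the settings this step refers to, the sketch below writes a minimal DSN entry for the Cloudera driver installed in Step 2. The key names follow Cloudera's sample odbc.ini as we understand it; treat the DSN name HiveDSN, the host hive.example.com, port 10000 (the usual HiveServer2 port) and AuthMech=0 (no authentication) as placeholders, and check the driver's installation guide for the authentication settings your cluster requires.

```bash
#!/usr/bin/env bash
# Sketch: write a minimal odbc.ini with one Hive DSN for the Cloudera driver.
# All values are placeholders/assumptions: adjust the driver path, host,
# port, schema and authentication mechanism for your Hadoop/Hive instance.
set -euo pipefail

ODBC_INI_PATH="${ODBC_INI_PATH:-./odbc.ini}"

# HiveServerType=2 targets HiveServer2; AuthMech=0 means no authentication.
cat > "$ODBC_INI_PATH" <<'EOF'
[ODBC Data Sources]
HiveDSN=Cloudera ODBC Driver for Apache Hive 64-bit

[HiveDSN]
Driver=/opt/cloudera/hiveodbc/lib/64/libclouderahiveodbc64.so
HOST=hive.example.com
PORT=10000
Schema=default
HiveServerType=2
AuthMech=0
EOF

echo "Wrote DSN 'HiveDSN' to $ODBC_INI_PATH"
```

Set ODBC_INI_PATH to the location you intend to reference from setenv.sh in the next step; the DSN name in square brackets is the one the Campaign data source must use later.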

Step 4: Configure the Campaign Environment Variables

Navigate to the Campaign Home directory and then open the setenv.sh file in the /bin directory. Update the LD_LIBRARY_PATH environment variable to include the /lib/64 path from your Hive ODBC driver directory. You must also add a new variable named ODBCINI, which points to the odbc.ini file you edited in the previous step.
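A minimal sketch of the two setenv.sh changes, assuming the Cloudera driver's default RPM location (/opt/cloudera/hiveodbc) and a placeholder odbc.ini path (/opt/odbc/odbc.ini); HIVE_ODBC_HOME is a helper variable introduced here for readability, not something Campaign requires. The lines prepend the driver's lib/64 directory to LD_LIBRARY_PATH without discarding any existing value and export both variables so the listener processes inherit them. You will typically need to restart the Campaign listener before it picks up the change.

```bash
# Sketch of additions to <Campaign_Home>/bin/setenv.sh (POSIX sh compatible).
# Assumptions: Cloudera driver under /opt/cloudera/hiveodbc (RPM default) and
# the odbc.ini from Step 3 saved at /opt/odbc/odbc.ini (placeholder path).
HIVE_ODBC_HOME=/opt/cloudera/hiveodbc   # hypothetical helper variable

# Prepend the driver's 64-bit libraries; keep any existing entries.
LD_LIBRARY_PATH="${HIVE_ODBC_HOME}/lib/64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
export LD_LIBRARY_PATH

# Point ODBCINI at the odbc.ini edited in Step 3.
ODBCINI=/opt/odbc/odbc.ini
export ODBCINI
```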

Step 5: Importing the Hive Data Source Template

IBM Campaign versions 9.1.2 and later include a Hive data source template that can be imported into the configurations. Locate the file named BigDataODBCHiveTemplate.xml within the /conf directory of Campaign Home.

Navigate to the Platform Home, then to the /tools/bin directory. Inside this directory, you will find configTool.sh, which is used to edit Platform’s configuration settings.

Use the command below, replacing /mnt/ebs1/ibm/EMM/Campaign with your own Campaign Home path:

./configTool.sh -i -p "Affinium|Campaign|partitions|partition1|dataSources" -f /mnt/ebs1/ibm/EMM/Campaign/conf/BigDataODBCHiveTemplate.xml

Step 6: Configuring the Hive Data Source

Within the Unica application, navigate to the Platform configurations via Settings -> Configuration. Locate the dataSources configurations under Campaign|partitions|partition1|dataSources. When you expand dataSources, you should see the option for (BigDataODBCHiveTemplate). If this option isn’t present, the configTool command in the previous step was unsuccessful.

Enter all required properties into the template for your Hadoop/Hive instance.

**Note: Ensure the DSN property matches the name specified in the odbc.ini file.
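A quick way to catch a DSN mismatch before restarting anything is to confirm that the name entered in the template appears as a section header in odbc.ini. The dsn_defined function below is a hypothetical helper, not part of Campaign; it does an exact, whole-line match on `[DSN]`, so stray spaces or Windows line endings in the file will make it report a miss.

```bash
#!/usr/bin/env bash
# Sketch: check that a DSN name exists as a [section] in an odbc.ini file.
# dsn_defined is a hypothetical helper; it matches the header line exactly.
set -euo pipefail

dsn_defined() {
  local ini_file="$1" dsn="$2"
  # -F: brackets are literal, -x: whole-line match, -q: no output
  grep -Fxq "[${dsn}]" "$ini_file"
}
```

For example, `dsn_defined "$ODBCINI" HiveDSN && echo "DSN found"`. Once the names match, running `isql -v HiveDSN` (the command-line client that ships with unixODBC) with the same environment as setenv.sh is a useful way to test the connection outside Campaign.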

Step 7: Adding Test Data

To test your Hive integration, you’ll need to create a test table and insert some data to query. For this example, we will use a file which represents data that was taken from an online mortgage tool and transformed to a simple .csv file.

Place your test .csv file into HDFS using the hdfs dfs command.
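The hdfs dfs invocation itself isn't spelled out above, so here is one possible sketch. It assumes the file is named MRTG_TOOL.csv (as in Step 9) and uses /home/hadoop/test, the LOCATION given to the external table in Step 8, as the HDFS target; run it as a user allowed to write there. The upload_test_csv function and DRY_RUN switch are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: copy the test CSV into the HDFS directory the Hive table reads from.
# Assumptions: file name MRTG_TOOL.csv, HDFS directory /home/hadoop/test.
# DRY_RUN=1 prints the commands instead of executing them (hypothetical switch).
set -euo pipefail

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

upload_test_csv() {
  local csv="${1:-MRTG_TOOL.csv}"
  local hdfs_dir="${2:-/home/hadoop/test}"

  run hdfs dfs -mkdir -p "$hdfs_dir"        # create the directory if missing
  run hdfs dfs -put -f "$csv" "$hdfs_dir/"  # -f overwrites an earlier copy
  run hdfs dfs -ls "$hdfs_dir"              # confirm the file landed
}
```

Preview with `DRY_RUN=1 upload_test_csv`, then run `upload_test_csv` on a node with the Hadoop client configured.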

Step 8: Creating a Test Table

Next, create the Hive table using the statement below, making sure the column data types reflect the data in your test .csv file (MRTG_TOOL.csv in this example).

CREATE EXTERNAL TABLE IF NOT EXISTS
MORTGAGE_ENGAGEMENT_WL(
CookieID INT,
time STRING,
amortPeriod INT,
term INT,
rate INT,
housePrice INT,
downpayment INT,
payFreq STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/home/hadoop/test';
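Because the external table applies its schema only when data is read, a type mismatch does not fail at CREATE time; Hive typically returns NULL for a value it cannot parse as INT (for example, if a rate such as 3.5 were stored in the INT rate column). A minimal pre-flight check, assuming the file has no header row and follows the eight-column layout above, is to verify the field count and the INT columns with awk before loading. check_mortgage_csv is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: check a CSV against the MORTGAGE_ENGAGEMENT_WL layout before loading.
# Layout: 8 comma-separated fields; fields 1, 3, 4, 5, 6 and 7 are declared INT.
# Assumptions: no header row and no quoted fields containing commas.
set -euo pipefail

check_mortgage_csv() {
  awk -F',' '
    BEGIN { n = split("1 3 4 5 6 7", int_cols, " "); bad = 0 }
    NF != 8 {
      printf "line %d: expected 8 fields, found %d\n", NR, NF
      bad++
      next
    }
    {
      for (k = 1; k <= n; k++) {
        c = int_cols[k]
        if ($c !~ /^-?[0-9]+$/) {
          printf "line %d: field %d (%s) is not an INT\n", NR, c, $c
          bad++
        }
      }
    }
    END { exit (bad > 0) }
  ' "$1"
}
```

Run `check_mortgage_csv MRTG_TOOL.csv`; it prints one line per problem and returns a non-zero status if any row would not fit the table definition.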

Step 9: Adding Data to Test Table

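Next, load the data from the MRTG_TOOL.csv file into your Hive table. If Step 7 already placed the file in the table’s HDFS location (/home/hadoop/test), the external table can read it directly, and loading it again would duplicate the rows; use one approach or the other.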

LOAD DATA LOCAL INPATH '/home/hadoop/MRTG_TOOL.csv' INTO TABLE MORTGAGE_ENGAGEMENT_WL;
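To confirm the load from the command line, you can run the LOAD statement and a row count through beeline, Hive's JDBC client. This sketch assumes HiveServer2 at the placeholder hive.example.com:10000 and the default database; add `-n`/`-p` credentials if your cluster requires them. Note that with beeline, LOCAL INPATH refers to the filesystem of the HiveServer2 host, not the machine running beeline. The load_and_count function, HIVE_URL variable and DRY_RUN switch are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: load the CSV and count the rows through beeline.
# Assumptions: HiveServer2 at hive.example.com:10000 (placeholder), database
# "default". With beeline, LOCAL INPATH is read on the HiveServer2 host.
# DRY_RUN=1 prints the commands instead of executing them (hypothetical switch).
set -euo pipefail

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

load_and_count() {
  local url="${HIVE_URL:-jdbc:hive2://hive.example.com:10000/default}"
  local table="MORTGAGE_ENGAGEMENT_WL"
  local csv="/home/hadoop/MRTG_TOOL.csv"

  run beeline -u "$url" -e "LOAD DATA LOCAL INPATH '${csv}' INTO TABLE ${table};"
  run beeline -u "$url" -e "SELECT COUNT(*) FROM ${table};"
}
```

If the file has no header row and was loaded once, the count should equal the number of lines in MRTG_TOOL.csv.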

Step 10: Mapping the Hive Tables

For this step, you’ll need to map your Hive tables. This is accomplished the same way as regular user table mapping. Navigate to Settings -> Campaign Settings -> Manage table mappings.

Since Hive tables will always be User Tables as opposed to System Tables, click on Show User Tables then click New Table.
Select Base record table, then select your Hive data source that you created with the template in the Platform Configurations.
Select one of your Hive tables from the Source table drop-down menu. The Hive table fields should populate automatically. You can then change the name of the table that will appear in Campaign.

Ensure your field names match as expected, then click Next and enter the name for the table within Campaign.

You must select a pre-defined audience level for the mapped table, along with an associated Audience ID from the Hive table.

Step 11: Confirm the Table Mapping

Finally, you will test to ensure that you can access the data in our table. Open any flowchart and select the new table as the Input in a process box.

Profile any field in the table to confirm that the data is present and can be queried.
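To judge whether a profile looks right, it can help to compute the same distribution from the source file. As a sketch, assuming the comma-separated, header-free MRTG_TOOL.csv used above, the hypothetical helper below counts the distinct values of one column (payFreq is column 8); if every row was loaded exactly once, the counts should match what Campaign's Profile shows for that field.

```bash
#!/usr/bin/env bash
# Sketch: count distinct values of one CSV column for comparison with
# Campaign's Profile of the same field.
# Assumptions: comma-separated, no header row, no quoted commas.
set -euo pipefail

profile_column() {
  local csv="$1" col="$2"
  awk -F',' -v col="$col" '
    { counts[$col]++ }
    END { for (v in counts) print v "\t" counts[v] }
  ' "$csv" | sort
}
```

For example, `profile_column MRTG_TOOL.csv 8` prints each payment frequency with its row count, sorted by value.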

Still With Us?

If you are an experienced Unica Campaign administrator, you most likely noticed the commonalities with configuring traditional relational database sources – UDI (Universal Dynamic Interconnect), ODBC and flowchart process boxes remain relatively unchanged. Even so, those similarities are only skin deep, as working with Hadoop data sources typically demands a rethinking of how your campaigns can wield previously untouched, unstructured data.

For marketing campaign ideas for working with unstructured data, our webinar recording includes a summary of examples by industry to jumpstart your Hadoop project. If you are still struggling with your integration after reading the above steps, please reach out for advice and suggestions.

Nicholas Ford


Munvo Consultant

Nick is passionate about looking under the hood of complex marketing automation systems and explaining them to a diverse audience. This interest has led Nick to deliver interactive and personalized experiences for clients in the telco, financial and retail industries. Nick is motivated by helping his clients to adopt and integrate emerging technologies such as Hadoop and cloud into their marketing ecosystems.

About Munvo

Munvo is a leading consulting firm helping clients improve their marketing capabilities through a combination of technology implementations, management consulting and marketing run services. In addition, Munvo Lab develops products and connectors for Adobe, SAS, and Unica Marketing Solutions.

The Facts: Hadoop Big Data vs. Relational Databases


Closely comparing Hadoop Big Data with more traditional Relational Database solutions helps you to more fully understand the advantages and drawbacks of each. If you want to engage in more meaningful IT-related discussions and make more informed business decisions, knowing more about available technologies and techniques is a key first step. As Viktor Mayer-Schönberger and Kenneth Cukier put it:

Just as the telescope enabled us to comprehend the universe and the microscope allowed us to understand germs, the new techniques for collecting and analyzing huge bodies of data will help us make sense of our world in ways we are just starting to appreciate.[1]

What is Hadoop?

According to Munvo software partner SAS:

Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. It provides massive storage for any kind of data, enormous processing power and the ability to handle virtually limitless concurrent tasks or jobs.[2]

A more concise colleague put it this way:

Hadoop is a technology architecture that makes use of commodity hardware in a highly distributed and scalable fashion, enabling fast data retrieval at a lower cost.

Both definitions are admirably succinct, and both show how the world (and the market) is transforming the way small and large amounts of data are collected and stored. It’s time to get on board.

Hadoop Big Data Vs. Relational Databases

To see how well Hadoop Big Data stands up against the traditional relational databases behind applications such as IBM Campaign (formerly IBM Unica), we compared the two on seven characteristics defined from the outset. In our study, Hadoop Big Data and traditional Relational Databases went head-to-head in the following arenas:

  • Security
  • IT support
  • Static customer profiles
  • Profile integration
  • Unstructured data
  • Real-time interaction
  • High volume

Figure 1: Hadoop Big Data & Related Technologies vs. Traditional Relational Database

Figure 1 reveals that Hadoop has the upper hand in the last three categories: unstructured data, real-time interaction and the ability to handle high volumes of data. That’s where the “Big Data” comes in.

As more and more organizations realize the potential of larger, more comprehensive data sets to flesh out CRM platforms, feed data into current marketing solutions or enhance ongoing Business Intelligence (BI) initiatives, Big Data solutions like Hadoop are becoming very attractive.

If you look back at Figure 1, however, you’ll see that Hadoop Big Data is no cure-all. In fact, more traditional relational databases are still superior when it comes to security, IT support, static customer profiles and profile integration.

And why is that?

Schema “On Read” vs. Schema “On Write”

Hadoop Big Data and Relational Databases function in markedly different ways.

Relational databases follow a principle known as Schema “On Write.” Hadoop uses Schema “On Read.”


Figure 2: Schema On Write vs. Schema On Read

With Schema “On Write,” used by the relational databases behind applications such as IBM Campaign, the structure of the data is defined before anything is written: that structure is used to build tables, joins, rules and constraints, and every write must conform to it. This approach gives users the advantage of clean data, because the database enforces those rules and structures at write time.

Hadoop, on the other hand, uses a Schema “On Read” approach, in which it typically “dumps” data by effectively ignoring all structure when writing, resulting in “unstructured” data. As a result, cleaning and interpreting data is left to whoever is querying Hadoop during the “read.”
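To make the contrast concrete, here is a toy sketch in shell, not a description of how either system is implemented. The on-write path validates each record against a hypothetical two-field schema (id INT, name STRING) before storing it and rejects bad rows; the on-read path stores every line untouched and applies the schema only at query time, turning an unparsable id into NULL much as Hive does. The store files and function names are illustrative only.

```bash
#!/usr/bin/env bash
# Toy illustration only: not how a relational database or Hadoop is built.
# Hypothetical schema: field 1 = id (INT), field 2 = name (STRING).
set -euo pipefail

WRITE_STORE="$(mktemp)"   # schema on write: only rows that fit the schema
READ_STORE="$(mktemp)"    # schema on read: every line is stored untouched

write_with_schema() {
  local re='^-?[0-9]+,[^,]*$'
  if [[ "$1" =~ $re ]]; then
    printf '%s\n' "$1" >> "$WRITE_STORE"
  else
    printf 'rejected at write time: %s\n' "$1" >&2
    return 1
  fi
}

dump_raw() {
  printf '%s\n' "$1" >> "$READ_STORE"
}

read_with_schema() {
  # Interpret the raw lines now; an id that is not an integer becomes NULL.
  awk -F',' '{ id = ($1 ~ /^-?[0-9]+$/) ? $1 : "NULL"; print id "\t" $2 }' "$READ_STORE"
}
```

Here `write_with_schema "abc,Bob"` fails immediately, whereas `dump_raw` accepts the same line and the problem only surfaces as a NULL id when `read_with_schema` runs: the cleaning and interpreting is left to whoever reads the data.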

Implications and Consequences

The absence of identifiable rules, constraints and overall structure makes it difficult to maintain an unambiguous static customer profile that is free of duplicate data. Relational databases are better suited to storing and maintaining clear systems of customer records, especially for critical information. Hadoop isn’t looking for a specific, single column or row; it searches for patterns, probabilities and ambiguous recurrences.

What now?

Your organization may have already invested in advanced tools, such as ETL (“Extract, Transform, Load”) tools, that do not easily transfer to Hadoop. What’s more, chances are that your organization has already built its applications, such as IBM Campaign, and maybe its entire infrastructure, on relational databases.

Through it all, it is important to remember that technologies, requirements, skillsets and objectives can, and will, change. Learn all you can and ask the right questions.

Speaking of questions, if you’d like to know more about Hadoop, take a look at our webinar, Connecting Hadoop Big Data to IBM Campaign & Interact, in which we discuss how one Munvo customer is efficiently leveraging Hadoop Big Data within a marketing automation ecosystem to advance a variety of nurturing campaigns more effectively.

[1]Mayer-Schönberger, Viktor and Kenneth Cukier. Big Data: A Revolution That Will Transform How We Live, Work, and Think. New York: Houghton Mifflin Harcourt, 2013, p. 7.

[2]https://www.sas.com/en_ca/insights/big-data/hadoop.html#


Connecting Hadoop Big Data to Unica Campaign and Interact


Hadoop, Big Data and related technologies are changing the way organizations take advantage of large and unstructured data sets that were previously difficult to analyze and exploit quickly or economically.

Join Ajay Bhaga as he demystifies Hadoop and explains how Fortune 1000 companies are integrating it with Unica Campaign and Unica Interact to accelerate the relevancy and impact of their interactive marketing campaigns.

During the webinar, Ajay will share an exclusive preview of how one of our clients is leveraging Hadoop within their marketing automation ecosystem to drive their journey-based nurturing campaigns.

Below are some key points Ajay will be reviewing:

  • Why does Hadoop matter? What has changed for marketers?
  • Does Hadoop replace relational databases?
  • What types of analysis does Hadoop excel at? How?
  • What kind of marketing opportunities does Hadoop unlock?
  • How can Hadoop be integrated with Unica Campaign and Unica Interact journey campaigns?

Watch the Webinar

Join Ajay Bhaga as he walks you through the world of Hadoop and how Fortune 1000 companies are integrating it with IBM Campaign and IBM Interact to drive real-time marketing.


Webinar Video Transcript

Please note: this is an unedited version of the transcript, so the text may contain syntax and grammatical errors. Thank you for your understanding.

00:00

All right, welcome and good afternoon. This is your moderator for today speaking, Joe from Client Spectrum. I’ll be introducing our speaker today, Ajay, in just a few moments, so I just wanted to get some housekeeping items out of the way before we get started. So first of all, thanks for joining. We know it’s the middle of your busy workday, so we’re hoping to make this as valuable for you as possible. The housekeeping items are as follows: as you may have noticed, you are on mute for this call, but there is an opportunity to interact with us. Throughout the webinar, please use the Q&A feature, which is on the right-hand side of your WebEx client, to ask us questions during the speaking portion, and we’ll do our best to triage and select some of the best questions we’ve

01:00

received, and I’ll be helping Ajay throughout that process to respond to your questions. Other than that, we are recording this webinar, so if you need to pass it on to your colleagues or you’d like to review some of the material, we will be making it available to you. So with that, I’m going to flip things over to our speaker and we’ll get started on time here today.

Hi everyone, my name is Ajay Bhaga and I’m the technology lead at Client Spectrum. I’ll give you an introduction of who I am, but first, thank you for joining us today for this webinar on connecting Hadoop Big Data to IBM Campaign and IBM Interact. To start off, I wanted to talk about myself. As the technology lead at Client Spectrum, I’ve worked on over 40 Unica engagements across 25 different customers in many different countries. I have over 9 years of experience in the enterprise marketing management space; I’ve worked as a senior architect and EMM technology roadmap specialist, and also with agile teams to

01:59

develop products and add-ons for customizations within the ecosystem. I’ll tell you a little bit about Client Spectrum. Client Spectrum was founded in 2005 and we specialize in IBM, Adobe and SAS enterprise marketing management (EMM) solutions. We have over 50 full-time IBM, Adobe and SAS EMM consultants, we’ve delivered over 350,000 hours of services over the years, and we have offices in North America in Montreal, Toronto and Calgary. I wanted to go over the agenda today, so you have an idea of what we’re going to be covering. First, I’m going to go over the webinar objectives, so you understand what you’re going to take home from this webinar today. Second, I’m going to go over some of the common questions we get from our IBM Campaign and IBM Interact customers. Third, I’ll go through some unstructured data examples and the marketing opportunities we can gain from these types of data. Next, I’ll go over the common misconceptions about relational databases and Hadoop Big Data, two types of technology whose differences I want to explain. Finally, I’ll go over a use case

03:01

about Hadoop Big Data and how you can use it within IBM Campaign and Interact, walking through one of the cases with one of our current clients. Within the webinar itself, I’m going to go over marketing opportunities for Big Data Hadoop with your existing investment in IBM Campaign and IBM Interact. What I want to help you understand is this: knowing that you have invested in an IBM Campaign or IBM Interact solution, how can you maximize your investment, take advantage of Big Data Hadoop and find new marketing opportunities? I also want to demystify Hadoop Big Data and the misconceptions around it. Hadoop is touted with a lot of different ideas in the industry, and we want to talk about the ideas we see as most useful for marketing. Finally, I’m going to go through a use case example for one of our financial customers using IBM Campaign and IBM Interact. I also want to highlight that today’s Hadoop webinar is a surface-level view of Hadoop. Hadoop itself is a

04:01

diverse set of tools; it really is the Hadoop ecosystem, and today we’re using “Hadoop Big Data” as an umbrella term to cover all of it. We’re going to have a follow-up, IT-focused webinar to go through a lot of the technical details; think of that future webinar as an under-the-hood architecture review and the technical version of this one. We’ll go into the details of the other products in the Hadoop ecosystem, such as HBase, Hive, MapReduce, Apache Spark and Kafka, and how they work together to build the new marketing ecosystem you’re probably curious about. I also want to point out that about 50% of today’s audience is business people, so I wanted to make sure the messages from today’s webinar could be understood by everyone. Next, I’m going to go over some common questions from our IBM Campaign and Interact customers related to Hadoop and Big Data. First, customers are asking us what the difference is between traditional

05:01

relational databases and Big Data Hadoop. Is there really a difference between the two? Is one better than the other? Customers are also wondering what this new Hadoop Big Data functionality enables for them: can they run new types of marketing campaigns? And finally, they want to understand how their industry peers, the other customers out there with IBM Campaign and IBM Interact, are using Hadoop Big Data: is it difficult to integrate, and what does the deployment look like? I’m sure you’re curious about these same challenges yourself; many of our customers are asking the same questions about how to use Hadoop Big Data to bring value to their organizations. The next thing I want to talk about is what changes by having Hadoop Big Data. There are two main concepts that we at Client Spectrum really want to drive home for you today, and I’m going to go through them next. So, what has changed with Hadoop Big Data? I think the two big points that come out of Hadoop and Big Data are, first, this concept of unstructured

06:01

data, which is event data about your customers or users. Having new access to large amounts of customer event data provides a lot of marketing opportunities, but exploiting this information to generate those opportunities is a bit of a challenge, and we want to talk about some of those challenges. Second, now that we have unstructured event data about users, we have high volumes of data, and with high volumes of data we can do different things. For example, with high volumes of user or customer information, we can detect fuzzy patterns that you normally could not see, and by using systems like Hadoop, we can detect them across large amounts of data over long periods of time. If we contrast this with how analytics is done today, with aggregates within a relational database, it can be done, but it can be very challenging and very cost- or processing-intensive. So next, I’m going to focus

07:00

again on unstructured data, and I’m going to give you some examples from different industries of how this unstructured data can be useful in marketing. To start off, let’s take the retail industry. Here are three types of unstructured data available to the retail industry today, and I’m going to share some of the value you can gain from each, starting with shopping cart information. Shopping cart information tells you what your customer is doing within a retail store. One of the differences with Big Data is that we now have large amounts of shopping cart data and large amounts of customer behavior information. Using this information, we can gain insights for building better versions of an abandonment campaign or better ways to do basket analysis. We also have geolocation history, which tells you where your customer is going. This can be useful in retail for understanding why your customer is going to a competitor’s store, and if you know that, you can build a

07:59

system that detects when a customer is going to a competitor’s store and sends them a win-back campaign to bring them back to us. There’s also clickstream and web body language data. This data is really the digital footprint of a customer: how your customer is interacting with different devices and with the internet. It lets us understand the state of the customer, and where this becomes important is, for example, if you’re a retailer and you have a customer shopping in a particular product category, like baby products. Take a demographic of parents who have newly had a child: they’re probably really interested in buying a lot of baby products. What we know about customers is that when they’re in a certain state, a certain mindset, they’re looking for a certain type of product and only that product, but eventually, when they’ve exhausted that product category, they tend to look for the next thing. So if you’re a parent and you recently had a child, you’re going to buy a lot of baby products, but eventually you’ll have everything you need, and then you’re going

09:00

to turn around, look at your home and say, “Hey, we need some new tables, we need some new dishes,” and you’re going to start looking at household products. So there’s a new product category that the customer is turning to. The reason this is relevant is that big data gives us a vast amount of information about customers’ behavior and interests, from which we can determine the next best product category and build in this predictive behavior. That’s the true value you get from having big data and being able to process it so quickly. Next, in the finance industry, there are other types of data available, for example from online tools such as a mortgage calculator on a bank’s website. I used a mortgage calculator myself when I was buying a lavish condo in Toronto, and, as you know, Toronto has great prices for condos; they’re really affordable. (That was a joke; my sarcasm doesn’t come across so well.) But the point is that using a mortgage calculator is an interesting concept

10:01

because when you use a mortgage calculator on a website, if I’m the bank, I want to know which customers are actually using it. There’s a big difference between a user who is just playing around with the mortgage calculator on their first visit to the website and someone who is actively going back and forth to that website trying to calculate a mortgage; I know I did exactly that when I was looking for a mortgage. Understanding that customer behavior is something we can do with big data, using tools like Hadoop to process large amounts of that information. Another type of information available concerns customers who are about to leave a bank. One of the telltale signs of a customer leaving a bank is when that customer turns off their employer direct deposit. By identifying which customers are doing this and looking at each customer’s activity over a period of time, we’re able to detect this, and we can use that information to determine whether there is an attrition risk. By using this information to build marketing strategies for retention, we

11:01

could really change how we deal with our customers. Next, let’s look at the health industry. Within the health industry, there are other new and interesting pieces of data. Take the Internet of Things: the new evolution of the Internet, which ties devices directly to the Internet, and with this new realm of information we get new types of data. For example, have you ever seen a Fitbit, or one of the scales that’s an IoT device? This type of information can be used to track how individuals are adhering to a wellness program and, if they’re not adhering, whether their premium should be modified. We can also consider doctors’ and nurses’ notes. When you deal with a healthcare provider, one of the pieces of information that’s often lost today is those handwritten notes. By using natural language processing on those handwritten notes and converting them to a digitized form, you could build out a set of rules and tools to be able

12:01

to understand whether customers are actually following these guidelines or rules, and when they are not. Another useful type of information is prescription refill history. For example, if customers are given a certain prescription and they’re not adhering to it, we can say that this customer does not really care about abiding by their medical plan, and by understanding that, we can adjust, at least in terms of allocation and marketing, how we target those customers. Next, travel and loyalty. Within travel and loyalty, we have various pieces of information. One example is survey results, which are interesting because they allow you to calculate things like a Net Promoter Score. Where this could be interesting is, for example, if you have a loyalty program and you want to understand which of your customers aren’t as engaged with the loyalty program, and which customers would actually recommend it to one of their friends. You can gather this type of information from surveys, but it’s

13:01

unstructured data, so tools like Hadoop allow you to process this information and identify these trends. Social media activity can be used to understand the customer’s behavior. More importantly, in the travel industry, consider an airline that’s giving bad service. If customers are writing about that airline and its services, and about where they didn’t have a good time, this information really shapes the sentiment around your brand. Sentiment analysis is a new topic within the industry, and it’s very important to understand how your customers really feel about you. Another area that could be very interesting is understanding how customers are engaging with platforms such as Facebook. For example, Facebook has groups, and certain groups attract different types of people. If you have affluent customers who are drawn to certain affluent groups, it would probably make more sense to target premium products to those customers. So you can see that by using these kinds of tools and pattern matching across this amount of event data, you’re able to really match the

13:59

customer with better products. Next, let’s look at the insurance industry. Within the insurance industry, let’s consider two examples: video content and viewing statistics, and inbound call center recordings. For the first one, imagine having video content about different products and services on the insurance provider’s website. If you see, for example, a customer who watches one of the videos about life insurance products end to end, fully engaged for the entire viewing, there’s a much higher likelihood that this customer cares about that product than another individual does. So it probably makes sense to target them with more life insurance products. When we talk about inbound call recordings, it’s useful to think of the call center agent as having a direct connection to the customer. While that call is being recorded, one of the interesting pieces of information that can come from it is customer mood. You can understand how customers feel about

15:00

their current experience. Think about why this is vital in the insurance industry: if the first insurance claim you ever make is a bad experience, you will not want to continue that relationship. So it’s very important for call center recordings to be transcribed and understood, so that we can see how customers feel about the experiences that we as companies are giving them. Next, I’m going to talk about some of the common misconceptions about Hadoop versus relational databases. In this diagram, we’ve plotted Hadoop on the left side and traditional relational databases on the right side, and down the page you can see the different areas within your organization where we think these two technologies, Hadoop being the emerging one, map out. To describe this better, let’s consider the bottom of the diagram first. At the bottom left, we can see unstructured data, real-time interaction, and high-volume processing. These are all areas where Hadoop really takes off, and I’ve gone over this

16:00

before and I’m going to go over it further in the presentation. Where traditional relational databases and Hadoop really vary is in the other categories. If we consider security, IT support, and static customer profile integration, these are areas where the traditional relational database serves better, and I’m going to explain why. First, consider your organization. Traditional relational databases have been around for a long time, and within your organization you have people, processes, and applications that have all been built around them. That means you have also invested time and energy in building tools around all of this. So if you want to change and adopt an emerging technology such as Hadoop, you need to consider the integration cost of that transformation. While Hadoop can do a lot of the processing we’re talking about that traditional relational databases can do, we see the two as complementary tools, and I’m going to explain further why that is. Specifically, traditional

17:01

relational databases really play their role when you need one-source-of-truth information, while Hadoop is most effective when working with fuzzy data and large amounts of data. So on one side, Hadoop is good for large amounts of data and for data that is fuzzy, whereas traditional relational databases are good for structured, finite sets of data. The other point I wanted to make relates to the static customer profile. The static customer profile is probably one of the most important pieces of information you have about your customer: it tells you the one source of truth about that customer. We know the real home for that type of information is still probably going to be a traditional relational database, and I’ll explain further why, but that doesn’t mean Hadoop doesn’t have its place. Hadoop itself is great for detecting and capturing a lot of this information, but in many cases the final store of that information would probably be a traditional relational database.

18:01

Let me share with you a few examples of questions that are challenging for Hadoop. The first question is: does the customer have a credit card? Can I market to this customer? I see the customer has two debits and one credit on their account: what is the customer’s current balance? Should I send my communication in English, French, or Spanish? What products or services does this customer currently have with us? One trend you can see here is that all of these questions are factual: each has one answer, not multiple answers. It’s a single yes-or-no reply, or a single response, and this is where relational databases do a much better job than Hadoop. While Hadoop can process this kind of information, it’s not made for it. By its nature, it’s a distributed computing system made for vast amounts of data, for detecting trends in fuzzy data, and for working with unstructured data. There’s another concept I wanted to quickly go over: schema on read versus schema on write. So,

19:00

to really understand the true nature of relational databases versus Hadoop and big data, it’s important to understand that relational databases work on the premise of schema on write, which means the structure of the data is built into the database itself. The rules, constraints, and cleansing are all applied at the time the information is stored, that is, when it is written. As those of you on the phone are probably aware, your corporation has already invested the time and capital in training DBAs and modelers to understand the data and to structure it in an unambiguous way, and that is an investment you already have. So consider that relational databases maintain that structure and that investment. Where Hadoop is different is that it uses schema on read: we read unstructured data, and the developer determines its meaning as it is read. That really puts the onus on the developer of the processing unit within Hadoop to

20:00

understand what the data is supposed to be used for. Next, I’m going to go over a use case related to the high-volume processing I brought up earlier, given the sheer amount of data within big data. This use case comes from a real-life customer of ours, a North American financial institution. For the sake of confidentiality we’ve hidden the name and are calling it Acme Bank. I want to walk you through a visitor’s journey at Acme Bank, show how we used Hadoop within this process, and explain the technologies involved. Let’s start with the visitor’s journey. First, consider an anonymous user who visits the Acme Bank website. The user is browsing the website, finds the mortgage calculator tool, and we can see there’s high engagement with that tool. At that point, we think, okay, the next best action is to target that user with a product

21:00

or offer related to a mortgage product or service. Now let’s assume the user was on the website during their lunch hour and closed the website and left, so our banner didn’t really help us: the offer failed. But we’re going to try again. This same user, who was using the mortgage tool, checks the weather on their break before their commute home to see how their afternoon commute is going to be, and while they do that we can serve a third-party ad through the weather website. That’s what we’ve done in this particular example: the visitor leaves our website, and while they’re browsing the weather website we target them with a third-party ad related to that mortgage product. Fortunately, the user clicks on that display ad and is redirected to a landing page.

22:00

That landing page then takes the user to a new page where the user is asked for their email, and we incentivize the user to give it to us. At that point we’re able to target that user with an email related to that product. So you can see in this journey that the first targeting doesn’t always work, but what does help us is that we can plan out a journey and use different mechanisms to work with the customer and make sure we’re effectively catching every opportunity we get. Next, I want to talk about some of the technology involved, walking through each of these items. The first item I wanted to go over is item one, the anonymous visitor. There, we use Hadoop to process web logs from the Acme Bank website, and then we use the integration that’s built into IBM Campaign, which uses Hive to connect Hadoop to IBM Campaign. This is beneficial because now we’re able to use Hadoop data within IBM

 

23:00

 

Campaign to filter on behavior, and then push that information to IBM Interact, which is our inbound offer delivery service. By using Interact to send out that message, we’re able to deliver it easily through a channel that already exists. The next interesting thing to mention is event patterns. We’re using IBM event patterns here for a particular reason: one of the challenges of detecting customer behavior is that sometimes we want to know what a customer has done on a website, but sometimes we want to know what a customer is doing on a website right now. When we want to know what a customer is doing right now, this is where IBM event patterns works very well. If we consider Hadoop at the first node, it processes that data periodically or in near real time, whereas the third node processes it in real time, so we know immediately what’s going on. So, for

24:00

example, when the visitor is targeted with a banner and does not respond to it, right after that we can use what’s labeled TM here, which is IBM Triggered Messaging. Triggered Messaging lets you take an event that occurs within the system and trigger another message or another event. We’re using Triggered Messaging along with UBX, IBM’s Universal Behavior Exchange, and by using these tools together we’re able to communicate with Adobe Audience Manager. Adobe Audience Manager is a data management platform that’s quite popular in the industry, and the most important thing to communicate here is where it works very well: communicating with publishers that are working on the bid-ask prices of ads. So the important part is that this provides connectivity from your marketing system to these third-party channels, and we’re using Triggered Messaging and UBX to do it. Next, at node six, the user clicks on the

25:01

display ad and is redirected to a landing page. This is a great opportunity for us, because we’ve integrated Interact into that landing page, and having Interact there gives us the ability to understand what the customer is doing. More importantly, at the point at which the user has provided us with an email, we can connect the user’s activity to IBM Campaign and IBM Marketing Cloud to send out an email related to this mortgage product. More specifically, what I wanted to point out here is TM and UBX, Triggered Messaging and UBX: where they really show lots of value is when you want to move from an inbound channel to an outbound channel communication. This pairing works very well and is intrinsically built into the system. To bring the focus back to the Hadoop portion of this whole flow: I know I talked about Hadoop, Hive, IBM Campaign, IBM Interact, and event patterns, but I really want to call out

25:59

the fact that the integration built in here between Hadoop, IBM Campaign, Interact, and event patterns is tight. What’s interesting to understand is that the new functionality within IBM Campaign and Hive provides marketers with access they did not have before, and I’ll go further into why this is. Next, here’s an actual screenshot of the IBM Campaign integration with Hadoop and Hive. From the look and feel, you might say, okay, this looks very similar to what I’ve seen before: the same process boxes, the same point-and-click interface. The difference is this: the Select process at the top left is actually reading data from Hadoop, and the process box at the bottom is an Interact List process that writes data to a whitelist within IBM Interact. Why this is interesting is that it’s

27:01

showing a direct connection of data, a data flow, between Hadoop and Interact. That gives us the capability to process big data and filter it. In this particular example, we’re looking for customers who have a high down payment with a certain engagement score within a certain period of time, here about 21 days, and we want that data filtered so those customers can be translated into offers sent to them. You can also see from this interface that the table mappings are very similar to what you’ve seen before if you’ve used IBM Campaign or Unica, and this is useful because it shows that when you’re tying data together, it’s the same whether you’re working with big data in IBM Campaign or with a relational database. The other point I want to make is that as a marketer, you’re given a point-and-click interface here, and you’re empowered to connect

28:00

with Hadoop and use that information to build out a valuable tool for marketing. Before I finish today, the last point I wanted to go over is a tip for troubleshooting Hadoop and IBM Campaign. These new emerging technologies are great and exciting, but I wanted to share a tidbit of information we’ve learned from our own practice. Hadoop is an ecosystem made up of a lot of tools, and it’s important to understand that it’s a set of layered technologies working together. Integration testing is very important, but more importantly, look at things as layers as you layer on new technology. For example, in our situation you have the Hadoop cluster, you have Hive and HBase as the database, you have the ODBC driver, and you have IBM Campaign. Each of these is a different component and a different layer of the infrastructure you’re building, and it’s important to test along the way, because these are no longer just architectural changes; there are data

29:00

implications too. To really understand what you’re doing and keep it clear, and to save time and energy given that integration testing is so expensive, make sure you test every layer as you go. That’s the approach I would take: a layered approach.
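The layered approach described above can be sketched as a bottom-up check runner that stops at the first unhealthy layer, so you know where to look before moving up the stack. This is a minimal illustration under stated assumptions, not part of IBM Campaign or any Hadoop tool: the function name `first_failing_layer` and the placeholder checks are hypothetical. In a real environment each check would do something concrete, for example running a test query against the Hive DSN with unixODBC’s `isql` tool before opening a Campaign flowchart.

```python
from typing import Callable, List, Optional, Tuple

# A check returns True if its layer is healthy. These are hypothetical
# placeholders; real checks would query the cluster, Hive, the ODBC DSN, etc.
Check = Callable[[], bool]


def first_failing_layer(layers: List[Tuple[str, Check]]) -> Optional[str]:
    """Run checks in order (bottom layer first) and return the first failing layer."""
    for name, check in layers:
        try:
            ok = check()
        except Exception:
            ok = False  # an exception while checking counts as a failed layer
        if not ok:
            return name  # stop here: higher layers depend on this one
    return None  # every layer passed


if __name__ == "__main__":
    # Hypothetical stack from the webinar, with the ODBC layer "broken" on purpose.
    stack = [
        ("Hadoop cluster", lambda: True),
        ("Hive / HBase", lambda: True),
        ("Hive ODBC driver + DSN", lambda: False),
        ("IBM Campaign", lambda: True),
    ]
    failing = first_failing_layer(stack)
    print(f"First failing layer: {failing}" if failing else "All layers passed")
```

Checking bottom-up means a failure in Campaign is only investigated once the cluster, Hive, and the ODBC driver are already known to work.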

Okay, that concludes the speaking portion of our webinar, so we’re going to transition into Q&A. Again, the question box is on the right-hand side of the client. We’ve got some questions already. For those of you who can’t stay for the question-and-answer period, thanks again for your participation. As we’ve already mentioned, there’s an IT-focused follow-up webinar planned to cover this topic; as you can see, it’s quite a big topic, so we’re definitely looking at this as a series, and we welcome your feedback on what you’d like to see included. The best way to reach us is marketing@clientspectrum.com.

30:01

Alternatively, we will be in Vegas, Sin City, and as for what happens in Vegas, we hope you tell us so we can include it in future webinars. We will be there March 20th through the 22nd, so please reach out if you’d like an opportunity to connect there. I do have some questions here, Ajay. It looks like we’ve got one that is obviously coming from someone who knows IBM Campaign quite well: whether Hadoop is something this customer should consider using for their IBM Campaign system tables. Maybe you could describe what the system tables are and whether Hadoop is appropriate for them or not? Yeah, sure. The important thing to consider first is that Unica does not support using Hadoop for your system tables, and while that sounds like

31:00

a compatibility issue, it’s not actually a compatibility issue. It’s really based on the design of the product, and the reason is this: the Unica system tables are a set of tables that store the metadata about your marketing campaign structure, along with the configuration and security information of your enterprise platform. The important thing to think about is that this type of data is one-source-of-truth data: it’s a single source, and we want it to be structured so it’s fast. So a relational database is the best home for that kind of data. That doesn’t mean certain types of data, such as contact history, couldn’t in the future be stored in some transformed form within Hadoop. But today, it makes the most sense for system tables to be in a relational database, and I’m sure the applications will expand with time.

Okay, great, so I think

32:03

the next question is a fairly technical one, so it may be better addressed in the IT follow-up, but it’s a question about Hive and its relationship to the ODBC drivers. Maybe you could share a bit more about how that works. The question is specifically about where temp tables are generated when you’re using it. Is there something you could share there, or is it something we should cover in the IT-focused webinar? Yeah, sure. The important point here is that the power of Hive is that it acts as an abstraction layer that lets SQL-based tools communicate with Hadoop. Think of Hive as a translator of information for

33:00

Hadoop, which in our case is connecting to HBase as the Hadoop database: Hive takes SQL and converts it into a form Hadoop can process. The important point is that with Hive integrated into IBM Campaign and IBM Interact, IBM Campaign speaks SQL, that SQL is translated through Hive, and you get a direct relationship to the Hadoop data, which is processed the way you want. That doesn’t always translate into the fastest operation. As for temp tables, the important thing to understand is that sometimes you’re going to want temp tables and sometimes you’re not, so the answer depends more on the use case than on temp tables themselves. I hope that answers the question.
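To make the “translator” idea concrete: classically, Hive compiles a HiveQL query such as a `GROUP BY` into MapReduce stages (newer versions can also run on Tez or Spark). The sketch below imitates those map, shuffle, and reduce steps in plain Python on one machine for a query like `SELECT page, COUNT(*) FROM web_logs GROUP BY page`. The table `web_logs` and its fields are hypothetical, and this is only an illustration of the idea, not how Hive is implemented internally.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical raw web-log rows: (visitor_id, page)
Row = Tuple[str, str]


def map_phase(rows: Iterable[Row]) -> Iterator[Tuple[str, int]]:
    """Emit (key, 1) for each row, keyed on the GROUP BY column (page)."""
    for _visitor, page in rows:
        yield page, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Aggregate each group; COUNT(*) becomes a sum of the emitted 1s."""
    return {key: sum(values) for key, values in groups.items()}


def group_by_count(rows: Iterable[Row]) -> Dict[str, int]:
    """Equivalent of: SELECT page, COUNT(*) FROM web_logs GROUP BY page."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    web_logs = [
        ("v1", "/mortgage-calculator"),
        ("v2", "/home"),
        ("v1", "/mortgage-calculator"),
        ("v3", "/mortgage-calculator"),
    ]
    print(group_by_count(web_logs))  # {'/mortgage-calculator': 3, '/home': 1}
```

On a real cluster the map and reduce steps run in parallel across many nodes, which is why the same SQL that is instant on a small relational table can take noticeably longer through Hive.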

It looks like there are a few more parts to that question, and those are great questions. Thank you for that; we will include more

34:01

detail on that in the IT follow-up webinar. A more general guidance question just came in. It’s a big question: to paraphrase, when a marketer is looking at their marketing campaigns, how much should be based on the big data sources we’ve covered, versus how much on traditional relational databases? Is there any general guidance? The approach we took in this webinar was to go through examples, so perhaps you could enlighten us on why you picked those examples. What is it about the nature of that data? Yeah, sure. The difference in the data itself is a big one between the two

35:00

technologies. If we think about big data versus traditional marketing data, the difference is really unstructured data. When we talk about unstructured data, we’re talking about event data, and event data means a lot of events and a lot of data. When the volume of data increases, that puts you in a position to understand patterns and trends and to determine things that are typically fuzzy. For example, think about the mortgage tool: if someone went to the mortgage tool once, clicked a few knobs, changed the values, and left, that’s one behavior, one data point. Versus, let’s say, Joe, who goes to the mortgage calculator tool on the website 20 times in a week and tries all different variations, such as the length of the mortgage and fixed versus variable. Trying all these different variables within the tool gives you a different picture of what someone like Joe is going through. So on one level, it’s important to understand

36:01

that there are certain cases where you want unstructured data and certain cases where you don’t. The point of a relational database is to use it when you want structure and want to remove ambiguity, and certain types of data, the one-source-of-truth data, need that ability. Yeah, that’s a great answer. Going back to the use case, if you think about the mortgage calculator, the reason we picked that one is, could we say, that Hadoop was aligned with what it’s good at, which is being a catalyst, the starting point for a campaign: a detection role, right? Yeah, I would say that’s a good point, Joe. In terms of Hadoop itself, it is kind of the calculator: it’s able to generate data and process vast amounts of data, but it is used early on in the process, and the point

37:00

is that once it processes that data and you generate some sort of analytics from it, you have the choice of storing that information in a structured form or not, and that’s where the relational question comes into play.
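Joe’s example, one visit versus 20 visits in a week, also illustrates schema on read: nothing validates the raw event lines when they are written, and the meaning (which field is the visitor, which is the page, what counts as engagement) is applied only when the data is read. The sketch below assumes a hypothetical tab-separated log format and an arbitrary threshold of 10 visits in a 7-day window; neither comes from the Acme Bank implementation. Its output is the kind of small, structured result that could then be stored in a relational table.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical raw lines: "ISO-timestamp<TAB>visitor_id<TAB>page".
# Nothing checked them on write, so malformed lines may be stored too.


def parse_line(line: str) -> Optional[Tuple[datetime, str, str]]:
    """Schema on read: impose structure now, and skip lines that don't fit."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 3:
        return None
    try:
        ts = datetime.fromisoformat(parts[0])
    except ValueError:
        return None
    return ts, parts[1], parts[2]


def engaged_visitors(
    lines: Iterable[str],
    page: str,
    as_of: datetime,
    window: timedelta = timedelta(days=7),  # assumed window, not from the source
    min_visits: int = 10,  # assumed engagement threshold
) -> Dict[str, int]:
    """Return {visitor_id: visit_count} for visitors at or above the threshold."""
    counts: Counter = Counter()
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue  # the reader, not the store, decides what is valid
        ts, visitor, visited = record
        if visited == page and as_of - window <= ts <= as_of:
            counts[visitor] += 1
    return {v: n for v, n in counts.items() if n >= min_visits}


if __name__ == "__main__":
    logs = ["2018-03-01T12:00:00\tjoe\t/mortgage-calculator"] * 20
    logs += ["2018-03-01T12:05:00\tann\t/mortgage-calculator", "garbage line"]
    print(engaged_visitors(logs, "/mortgage-calculator", datetime(2018, 3, 2)))
    # {'joe': 20}
```

A one-time visitor like "ann" drops out, while the repeated behavior surfaces as a single structured row per engaged visitor.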

Great. I know this is probably an IT-focused webinar question, but maybe at a high level: we’ve got a question asking what the heck Spark is versus Hadoop. Apache Spark seems to be coming up in many circles, so if you had to explain the differences in a nutshell, or what people should care about as marketers, what would you say? The important thing about Apache Spark is, first, that it’s very interesting, and all of you should keep it on your radar. One thing we should point out is that anything to do with Hadoop today is typically not related to a real-time scenario. What I mean by that is, if you have vast amounts of unstructured event data, you’re going to

38:00

have to process that data. While Hadoop is a distributed computing system and can process the data, the challenge is the time delay in doing that; that’s how Hadoop works, but there are situations where Hadoop is not the best tool, and Apache Spark is the answer for those. Apache Spark does in-memory processing on a distributed data grid, and the benefit is that you can get real-time processing, not just near real-time. This allows incredible things to happen: good examples are Netflix and Uber, who use these technologies today to recommend movies to you or to recommend routes. That type of real-time, very high-complexity processing that you want to do in memory and very quickly is the best case for Apache Spark. To bring that back to Hadoop, all of this is within the Hadoop ecosystem, and that’s the important thing to bring home: these all integrate very well with Hadoop technologies. Okay, so we’ve got another question

39:01

on integration with Campaign: at this time, is Hive the only option, or are there other options out there? I would say Hive is your best bet right now. Are there other options? Yes: Hadoop works with unstructured data, so any data you put in and out of Campaign could effectively be unstructured if you keep it that way. As we mentioned before, big data relies on the concept of schema on read: you create the meaning when you read the data. So the point to take home here is, first, that the benefit of using Hive is you make use of an enterprise-grade tool that translates SQL into Hadoop-style processing. With any other alternative or a custom solution, it may be possible to build your own integration, but I would think you’d waste a lot more time trying to build that integration than

40:01

relying on Hive. Hive is a well-known product, and it does what it does best. If you’re really trying to work out whether Hive is the best option, I think your effort is better spent reconsidering the use case you’re trying to build; maybe you’re not processing the data in the best way. Like Joe mentioned before, sometimes you should be processing the Hadoop data in a certain way, and maybe what you’re trying to do is not the best application; maybe you should be looking at other tools to process that information in a staged format or something else. There are many things to consider, but I would say if you’re going to use Hadoop with IBM Campaign, use Hive. It makes sense, and otherwise I would reconsider what you’re trying to build. Okay, great. Doing a quick check of the time, it looks like we’ve pretty much exhausted our time for the Q&A section, so thanks very much to those of you who participated. Just to reiterate, we are going to follow up, and there will be

41:00

an opportunity to have the recording of this webinar made available to you. Again, we’d love to see you in person if you’re attending; otherwise, reach us at marketing@clientspectrum.com, and our apologies for the typo there. As you can imagine, this is a fast-paced topic, and sometimes some of these details go by the wayside when we’re trying to get the most recent information about these technologies to you, so our apologies. Once again, thanks for taking time out of your work schedule today. We hope it was valuable for you, and we look forward to receiving your feedback on this webinar and having you involved in planning future webinars. All right, thanks very much, and have a great day, everyone.