
THE DATA ECONOMY PODCAST

HOSTED BY MICHAEL KRIGSMAN


Use Real-Time Analytics, Streaming, IoT, and AI to Modernize Apps and Increase Business Value

Norm Judah, Former Enterprise CTO / Microsoft
Mike Gualtieri, VP & Principal Analyst / Forrester


“The processes that were ‘slow’ can be made much faster if you can get the right data at the right speed and process it accordingly”

Norm Judah
Former Enterprise CTO / Microsoft

Norm Judah, Former CTO of Microsoft, and Mike Gualtieri, VP Analyst at Forrester, discuss the challenges and potential in using real-time data and AI for application modernization. In this episode, we’ll simplify the issues, discuss the technology, and talk about strategies that leaders can use to accelerate innovation and deliver for customers.

As Former CTO of Microsoft, Norm developed and drove technical strategies for customers around the world. After 40 years in the industry, he retired and is currently an Evangelist and Strategy & Leadership Advisor, serving as a key voice in the areas of innovation and digital transformation. Norm is also a board advisor to EPAM Systems Inc. and Model9. Prior to Microsoft, he developed leading-edge transaction processing and real-time distributed systems at Imperial Oil.

As VP Analyst at Forrester, Mike’s research focuses on AI technologies, platforms, and practices that enable technology professionals to deliver applications that lead to prescient digital experiences and breakthrough operational efficiency. He’s authored more than 130 research reports and is a recipient of the Forrester Courage Award for making bold calls that inspire Forrester clients to make great business and technology decisions.


Transcript

MICHAEL KRIGSMAN: We’re discussing real-time data and machine learning with Norm Judah, who is the former enterprise CTO of Microsoft together with Mike Gualtieri, who is an industry analyst at Forrester Research. 

Thank you so much to Redis for making this conversation possible. I'm very grateful to Redis that we can do this. For business leaders, the questions around real-time data, the use of machine learning, analyzing that data, and supporting innovation and great results for customers remain complex.

In this conversation, we will simplify the issues. We’ll talk about the technology, we’ll talk about the data, we’ll talk about machine learning, and we’ll bring it all together for business people, for business leaders to make informed decisions about these topics. And about how to use data to innovate, and to drive great results for your customers. 

So Norm, tell us about your areas of focus. I know that you were the enterprise CTO of Microsoft. You have a pretty broad background. So welcome, and I’m excited to hear about the things you’re working on. 

NORM JUDAH: Hi, Michael. Thank you. To frame it for me: I retired about four years ago. I thought I'd be doing nothing, but I'm not. I'm actually pretty active in the space. The 40 years before that were spent really in two big buckets: about 12 years working for Exxon doing real-time systems in refinery control, and then 28 years at Microsoft, starting with the breakout of the enterprise business and going all the way through AI. And so I have this broad background, primarily in the enterprise space and also in consumer.

MICHAEL KRIGSMAN: Great. And Mike Gualtieri, it's a thrill to speak with you. Tell us about the areas of focus in which you work at Forrester.

MIKE GUALTIERI: Well, I'm an industry analyst, so my job is to help our enterprise clients understand the technology landscape, the use cases, and the best practices. My particular focus area is real-time data, the intersection of real-time data and artificial intelligence.

And I think it was about maybe 12 years ago that I did the first Forrester Wave on what was then called CEP, complex event processing, which was a hot new technology at the time, focused on analyzing real-time data, mostly financial and trading data. Since then, and I think we're about to talk about it, streaming data is everywhere; real-time data is everywhere.

MICHAEL KRIGSMAN: So Norm, when we talk about real-time data, or what is sometimes called high-speed data, what do we actually mean?

NORM JUDAH: Yeah, there's a subtle difference, or not so subtle, between the two of them. The place to start for me, interestingly enough, is: if you have this information available, what can you do with it? There's the open-loop control system part of it, which is you get the data, you process it in some way, and then you take some action, or you advise an action.

The other end of the spectrum is closed loop, where you're getting real-time data and making real-time decisions on it. And that's a very different view: the open-loop advisory versus the closed-loop execution. And the speed of the data and the speed of the decision define, and are coupled to, the business process.

So if you have a slow process, for example mortgage origination, that's a fairly slow system. The decision making that happens there is slow. It might be in hours; more likely today it's in days. The data that you're getting does not have to be very fast, and the decisions are not very fast. The other end of the spectrum is a safety shutdown system somewhere, in an autonomous car or a refinery, where as an event occurs, you have to make that decision super quickly in order to prevent something.

And so there's this spectrum of speed that is actually tied to the half-life of the business process. A slow process can use slow data; a high-speed process needs fast data. Where I see mistakes being made is people looking at slow processes and insisting that they need super high-speed data when the decision is in hours, or minutes, or days. And so understanding the nature of the data, and the decisions to be made with it, is where you actually end up starting.

We coined this phrase about real enough time. The time of your execution has to match the half-life of the process. And so in that mode, real-time data is data coming out of an executing process, which could be automation, but could in fact be a website that's getting a lot of data in quote, "real-time". And then how you process it, execute it, ingest it, store it, analyze it, and do all the things you will with it starts to determine, to some degree, the characterization of the data and the decisions made with it.

You can have high-speed data, which could be video data, which is super high speed, but you don't have to make real-time decisions on that data unless you're actually editing in real-time, which could happen. And so data streams, and how you deal with them, characterize the business process and the decision making that happens with them accordingly. How you store the data and process it, that's really the conversation that we'd love to have today.

MIKE GUALTIERI: Yeah, and I completely agree with that, Norm. What I often tell people is that it's business time, right? As you put it, in the context of the business process. So Wall Street trading algorithms: microseconds. Uber, standing on a corner waiting for an update that the car is going to come around the corner: maybe a handful of seconds. Very, very different time frames, but from a business standpoint I would characterize both as real-time use cases.

NORM JUDAH: I think what's interesting there is the perception of the user in the definition of real-time. Your Uber example is really interesting because five or 10 years ago, if I got a response in 10 minutes, that would have been great.

[LAUGHING] 

Far better than a taxi. Now, I'm looking at seconds because it's a real-time decision I have to make. And even the mortgage origination example: if a company could actually analyze your mortgage while you're online, and say yes or no in seconds, they would have an incredible business advantage today. So I think the interesting consequence of this is that the processes that were quote, "slow" can be made much faster if you can get the right data at the right speed and process it accordingly.

MIKE GUALTIERI: Yeah, and Michael, in the conversations I used to have with enterprise architects, we'd always end up debating what time frame counts as real-time. We're not having that conversation anymore, so I think it's now generally understood that it is business time. And since we're talking about terminology, high speed, real-time: a lot of the questions that I get as an analyst come in as streaming.

We want to talk about a streaming use case. And sometimes that means, like Norm said, it's high-speed data, but we're not doing anything with it right away. But other times a streaming use case means it is real-time data and we're going to analyze it or make a decision on it immediately. So I'd say streaming is one of these overlapping terms as well.

NORM JUDAH: One of the interesting things, to stick with real-time for a second: if I bring that data stream into memory, and I've got it in memory, do I have to keep the history, or do I only keep the current value? What is the useful data that I'm working with? In some cases it's the current value. I don't care how you got here, but I care what the measure is now.

In other cases, you're very interested in the data stream that led to the current position, in which case the storage requirements are actually quite different in nature. Because if I'm storing a lot of data in real-time, should I store everything? Do I store every second, or every two seconds? Do I store an average? Do I actually store all the instances? And that, again, comes down to the analyst having a deeper understanding of the data and its usage scenario to be able to determine that. Because in many cases, I'm just interested in the current value. I'm interested in–

MICHAEL KRIGSMAN: Give us some examples of that. What's an example of storing the current value as distinct from storing its history, how you got there? And what are the implications of that as well?

NORM JUDAH: Take as an example an emergency shutdown system of some kind in an industrial environment. Stick with IoT for a moment. There's a transmitter that looks to see if a valve is open or not. And every second it sends you a message that says, I'm open, I'm open, I'm open, I'm open, I'm open. And then comes a signal that says, I'm closed, for whatever reason. Do you care that it's been open for an hour and a half, or do you care that it's closed at a particular point in time? I don't need to store all the open data that is there; I just need to know that the state is the same, and that at a particular benchmark time it exists.

The second that I actually see it– or the instant, sorry, not the second, but the instant that I see it's closed, now I actually might have to take an action. And so I may or may not want to store the whole data stream. That depends on the actual business problem being solved. But there are many scenarios like that where you're interested in the instant and not the history. In other cases, I actually don't care about the current measure; I care how you got there, in which case you are interested in the full stream.

MIKE GUALTIERI: Yeah, another example of that, another IoT example, is a sensor that's bleeping out a temperature. Same thing: 90 degrees, 90 degrees, 90 degrees. And you're not concerned necessarily that it goes to 100 and back to 90, but rather with how fast it got there, the rate of change. If all of a sudden it went from 90 degrees to 100 degrees, well, maybe it's that emergency shutdown scenario. So that is streaming analytics. And that is the idea of storing state. It's stateful; it means you're keeping that history.
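To make that storage distinction concrete, here is a minimal Python sketch of both patterns, with invented sensor names and thresholds: the valve monitor keeps only state changes, while the temperature monitor keeps a short stateful window to catch the rate of change.

```python
from collections import deque

class ValveMonitor:
    """Store only state transitions, not every repeated heartbeat."""
    def __init__(self):
        self.state = None
        self.changes = []  # (timestamp, state) transitions: tiny next to the raw stream

    def on_event(self, ts, state):
        if state != self.state:  # ignore "I'm open, I'm open" repeats
            self.state = state
            self.changes.append((ts, state))
            if state == "closed":
                print(f"{ts}: valve closed, take action")

class TemperatureMonitor:
    """Stateful rate-of-change check over the last few readings."""
    def __init__(self, window=5, max_delta=10.0):
        self.readings = deque(maxlen=window)
        self.max_delta = max_delta

    def on_event(self, ts, temp):
        self.readings.append(temp)
        if max(self.readings) - min(self.readings) > self.max_delta:
            print(f"{ts}: temperature jumped more than {self.max_delta} degrees")
```

An hour of identical "open" heartbeats adds almost nothing to storage, while a 90-to-100-degree jump inside one window triggers the alert immediately.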

And that brings me to another very important example with real-time data, which is sometimes called complex event processing. Here, what you're interested in is a sequence of events that occur within a certain time frame. So if within one minute this machine's temperature spikes and the vibration level on this machine drops, and that happens within a one-minute time period, we're interested, right?

And you can imagine financial scenarios too. If this stock price goes up, and this stock price goes down within this time frame, we're interested. So in order to deal with that sort of situation, you have to store the results. It has to be a stateful analytic that you're keeping on real-time data.
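Here is a hedged sketch of that kind of windowed pattern detection in plain Python; the event names, the 60-second window, and the single-alert reset are illustrative choices rather than any particular CEP product's semantics.

```python
WINDOW_SECONDS = 60
recent = {}  # machine_id -> {event_type: last_seen_timestamp}

def on_event(machine_id, event_type, ts):
    """event_type is, say, 'temp_spike' or 'vibration_drop' (hypothetical names)."""
    events = recent.setdefault(machine_id, {})
    events[event_type] = ts
    # stateful check: have we seen both event types within the window?
    if {"temp_spike", "vibration_drop"} <= events.keys():
        if abs(events["temp_spike"] - events["vibration_drop"]) <= WINDOW_SECONDS:
            print(f"machine {machine_id}: correlated anomaly at {ts}")
            events.clear()  # reset so the pattern alerts once

on_event("m7", "temp_spike", 100)
on_event("m7", "vibration_drop", 130)  # within 60s of the spike: alert fires
```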

MICHAEL KRIGSMAN: So you’re then looking at the volume of data, and the speed of data, and obviously this is going to have implications for the way you construct your system, and how that data is consumed, and stored, and so forth. 

NORM JUDAH: Yeah, imagine you're the data architect or the storage architect in that scenario, the one that Mike just described. But I don't know the example; I don't know the use case. I know I'm getting a lot of data in, and I've got to make decisions about how I store and what I store. You actually don't really know all the possible end-user cases, so the default decision that everybody makes is: I'm going to store everything forever, and then I'll build the thing afterwards. Which may be the wrong answer.

So the data analyst, or the storage architect, has this terrible dilemma of trying to understand the possible scenarios, and what they need in order to ultimately be able to switch it on. Which says: I'm not going to store every instance today; I'm only going to store state changes. But I have the ability to switch on a system that would then store the trajectory of the data to that point. I can switch that on and off if I want to, without having to change my overall storage architecture, because redoing the way you're storing it would be incredibly complicated after the fact.

MICHAEL KRIGSMAN: And Mike, you were just jumping in to say something.

MIKE GUALTIERI: So I mean, say there were only two use cases for real-time data. One is where a single event occurs. There's a single piece of data, and you care about that single piece of data. Norm represented this as the first use case. But there's a second, more complicated use case, which we've just been talking about, where there's analytics on this real-time data.

There's data that comes in a sequence, and it feeds some analytic, or some pattern that you're detecting in time. And it's very easy for businesses to think of the first use case: this happens, I do this; this happens, I do this. It's very difficult for them to imagine the latter use case with streaming, but that's where there's tons of opportunity.

NORM JUDAH: I think the interesting scenario that flows out of that, and we'll touch on this later, is: let's say you've got a machine learning model that can actually do anomaly detection. Anomaly detection on a single event is really easy: the valve opened, I've got an anomaly, do something. But if you have multiple streams of multiple measures, in an IoT scenario, or a banking scenario, or a trading scenario, you've got multiple input streams coming in.

And you're feeding this into this wonderful ML engine that some guy wrote somewhere, trained somewhere, and suddenly it throws up and says, I've got an anomaly. Understanding the anomaly, the causal interaction of all of these variables that led to it, starts to be pretty interesting, and it's not intuitive for a human to actually see the anomaly. It takes a machine of that complexity to recognize an anomaly.

But the human has to decide what to do with it, and that means understanding deeply what's happening, the dynamics of the system, to be able to take that anomaly and do something with it. That's actually one of the most interesting problems, because invariably the machine learning system is going to throw out something that you're not expecting, and now what do you do with it? Understanding the corrective action, the next best action you can take from it, is a really interesting problem that might have incredible business value if you can interpret it correctly. Or the consequence, which is: don't do anything, and the building burns down.

MIKE GUALTIERI: There are some amazing opportunities in use cases like Norm and I have been describing. But there are some more mundane ones that enterprises are very interested in with real-time data. Look at a large enterprise, look at the portfolio of applications. I talked to a large bank; they have 3,000 applications. Why? They bought all these companies, and now they have all these different applications. Even their wealth management division may have six different applications, or travel and hospitality.

So to many enterprises, real-time data just means that something happens in one system, and I need to get it to another system. I'll give you a very simple example. Travel company, airline. A reservation is made on the reservation system. The loyalty system is over there, right? And they've built an app for the customer. So the customer has an expectation that when they make the reservation, and they go to the loyalty system, they'll see that the reservation is there, and they're going to get the points.

Well, that happened with a big batch process before. It's like: we made the reservation, we ran the job, we dumped the file, and then we batch processed it and updated the loyalty system. Insufficient to create a great customer experience. The expectations are higher.

So to some enterprises, real-time simply means: when we make that reservation, we want to send it immediately and update the loyalty system. Look at a very popular open source project, Kafka. Its core use case is that delivery use case. Many enterprises are starting with that use case, and that in itself is providing a lot of value.
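As a rough sketch of that delivery pattern, assuming the kafka-python client, a broker on localhost, and invented topic and field names for the airline example: the reservation system publishes the event the moment the booking happens, and the loyalty system consumes it immediately instead of waiting for a nightly batch file.

```python
import json
from kafka import KafkaProducer, KafkaConsumer

# Reservation system: publish the event as soon as the booking is made.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # assumed broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("reservations", {"member": "FF123", "flight": "UA100", "status": "confirmed"})
producer.flush()

# Loyalty system: consume the same topic and credit points right away.
consumer = KafkaConsumer(
    "reservations",
    bootstrap_servers="localhost:9092",
    group_id="loyalty-service",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for message in consumer:
    r = message.value
    print(f"crediting points to {r['member']} for flight {r['flight']}")
```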

NORM JUDAH: I wanted just– Mike, sorry. I want to just touch on that a little bit, because there's a really interesting consequence of that, which is schema understanding. Actually making sure that the piece of information I'm sending to you is interpreted by you the same way I interpreted it.

Because reservation might mean something different in your two systems. A confirmed reservation versus a held reservation: as you start to qualify it, reservation starts to mean different things. And one of the places where I see this is with disparate organizations, either inside one company or across companies, like the loyalty systems you talk about, where the points system is potentially in a third-party system.

The lack of interoperability between those two systems is a huge problem, because when I send you data, I mean something, and you interpret it as something else. And so the speed of the data is a real problem, but understanding the data is actually just as important. So there are interesting dilemmas about self-describing data: when I move the packet over, I not only move the data point, I move a description of the data.

That's now an agreed-to description. And you can go back to EDI of many years ago, which attempted to do this with self-describing data. We laugh about EDI, but they actually did some great things in that time frame to ensure that real-time data is self-describing, and can go across certain boundaries and still be meaningful.

MICHAEL KRIGSMAN: So we’ve just discussed the nature of real-time data, and some of the applications. What about the technology challenges associated with creating business systems that can handle and manage this kind of data? 

MIKE GUALTIERI: Well, the first thing I always like to say about real-time data is: what data does not originate in real-time? Right? From a system standpoint it materializes instantly, in real-time, and something has to be done with that data. Maybe not from a business standpoint, though more often than not it matters there too, but something technical has to happen to that data.

It either has to be stored, moved, or sent to another system. So we have a framework we use to describe three scenarios of real-time data, and I alluded to one of them, which is delivery, right? Data originates in some system and needs to be delivered somewhere else immediately. Point A to B, C, D, E, right? It might not just be point to point; it could be broadcast. But it has to be delivered, and that's all you're trying to do with it. You might do some minor enrichment the way Norm talked about. You may do something with the schema to broadcast it in some standardized way that everyone understands.

The second one is also a scenario we talked about, which is the analytical one, and that's a stateful query. You're looking at this stream of real-time data, and you're analyzing it in real-time. And the analysis in real-time is going to determine an action that you take. That analysis could be a machine learning model, it could be a simple average, or it could be some kind of complex sequential pattern detection. It could be geolocation: something entered a different zone. So there's delivery, there's analytics.

And then the last use case is what I call processing of real-time data. That's where it's necessary to transform that data in real-time and deliver it, usually to some bigger repository that's used on an ad hoc basis. It might be a data warehouse, it might be a data lake, but people are querying it and need to query it, so it needs to be updated on a real-time basis. It may not be queried for an hour, but it could be queried right away. So those are the three use cases, and the three of them require very different technologies, which we can talk about.

NORM JUDAH: I just want to go back to delivery, because there's another core element of delivery, particularly for IoT, where sequence counts: guaranteed delivery, once only, in sequence. It's easy to say, but much harder to do, because the sequence of the data is incredibly important, and you have to get that sort of transactional delivery and ensure it happens coherently. What triggered my thought was when Mike talked about fan-out: when I have a data source that's being sent to multiple places, you want to make sure that they all get the same data in the same sequence for the right application.

It's not for everybody, but in the right sequence; trading is actually one of those examples. And so the underlying network to be able to do that on the delivery, as we talked about with some of the technology, becomes a choice that you make as you start to deploy and understand these systems. But the higher-up analytics might be totally dependent on the sequence. And if you've got the correct data, but in the wrong sequence, you can absolutely make the wrong decision.

MIKE GUALTIERI: Yeah, and an addition to that is that some events have a timestamp and some don't. And if they don't, then the timestamp is when you got it, when a particular technology got it. But if it has a timestamp, some of the technology can deal with out-of-order events to some extent, right? If it's within a certain time window, it can reprocess the analytic based upon the late-arriving event. So it can get complicated.
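A minimal sketch of that late-arrival handling, assuming a simple watermark scheme and a five-second allowed lateness; production stream processors implement this far more robustly, but the buffering idea is the same.

```python
import heapq

ALLOWED_LATENESS = 5.0  # seconds an event may arrive late and still be reordered
buffer = []             # min-heap ordered by event timestamp
watermark = 0.0         # highest arrival time seen so far

def process_in_order(ts, data):
    print(f"processing {data} with event time {ts}")

def on_event(event_ts, payload, arrival_ts):
    """Buffer the event, then release anything old enough to be safely in order."""
    global watermark
    heapq.heappush(buffer, (event_ts, payload))
    watermark = max(watermark, arrival_ts)
    while buffer and buffer[0][0] <= watermark - ALLOWED_LATENESS:
        ts, data = heapq.heappop(buffer)
        process_in_order(ts, data)

on_event(10.0, "reading-a", arrival_ts=10.2)
on_event(9.5, "reading-b", arrival_ts=10.4)   # late, but inside the lateness window
on_event(20.0, "reading-c", arrival_ts=20.1)  # advances the watermark; b then a emerge in order
```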

MICHAEL KRIGSMAN: The database is certainly a very important part of all of this. Are there special characteristics of databases that are suitable for real-time data? 

MIKE GUALTIERI: Oh, well, I mean, there's so much happening in the database world. So first of all, absolutely: a database is already a real-time instrument, right? If you think of a transactional database, that's what it is. We're updating this record; we're doing a transaction that happens in real-time, or it should. But the notion of a database dealing with streaming data is a reality now as well, right?

Because a lot of what you would think of as transactions coming in aren't necessarily applications making those transactions through an API. It could actually be event driven, meaning there's just real-time data coming in, and it needs to get into that database. So increasingly we're seeing database technology being able to accommodate that ingestion. And the technical challenge there is– well, the technical challenge with databases is always handling these different workloads without affecting the other workloads, right?

That's why, at least historically, you have transactional databases that are very good at transactions, and databases that are very good at data warehousing, like full-scan queries. Streaming adds another workload that can saturate network bandwidth and do a whole bunch of other nasty things that can affect other workloads. So the database vendors are responding by accommodating these workloads, but managing them in such a way that balances the performance of the different requirements.

NORM JUDAH: I think, Michael, if you actually break the quote "database" problem into a couple of fragments, one of them is about storage, keeping the stuff for a longer time. Another part of the database is making that information available to applications to process. And there's an interesting challenge around those two images and the synchronicity between them. Because on average they're actually the same, but at any instant they might not be. Over a day, the integral over time, they would be identical, but at any particular instant they're not.

And so how you look at the database, and set up your business requirements for storage, whether it's archival storage or not, and then for availability to applications: you need to separate those layers from one another and understand the usage scenarios, and they may all be different. You might not need that ultimate storage, but you do need the high-speed availability of information to an API, for example. So database is a very broad term there. You have to start breaking it apart, looking at the subsystems, and getting a better understanding of how you need those subsystems, and how you get the data, consume it, ingest it, publish it, and so on, to be able to make the right selections in your architecture.

MICHAEL KRIGSMAN: So you’ve both used the term machine learning. How does machine learning intersect with real-time data? Mike, you want to jump in to take that one? 

MIKE GUALTIERI: Yes. It can be a complicated question, but let me answer it an easy way first. A machine learning model, which has been trained somewhere else on historical data, is a model asset. It takes inputs; it's going to produce an output, right? So at its simplest level, a machine learning model can be a service that is called on incoming real-time data to make a decision. It can look at an event, and it can say, yep, do this, don't do that, or it can even enhance it. So at the simplest level, it's very, very simple to use machine learning in real-time.
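A minimal sketch of that model-as-a-service idea: a small scikit-learn model stands in for one trained elsewhere on historical data, and every feature name here is hypothetical.

```python
from sklearn.linear_model import LogisticRegression

# Stand-in for a model trained elsewhere on historical data (synthetic here).
X_history = [[250.0, 30, 2], [20.0, 900, 40], [400.0, 10, 1], [15.0, 1200, 55]]
y_history = [1, 0, 1, 0]  # 1 = act on it, 0 = leave it alone
model = LogisticRegression().fit(X_history, y_history)

def score_event(event):
    """Map an incoming real-time event onto the model's inputs and decide."""
    row = [[event["amount"], event["account_age_days"], event["num_logins"]]]
    return "do this" if model.predict(row)[0] == 1 else "don't do that"

print(score_event({"amount": 300.0, "account_age_days": 25, "num_logins": 3}))
```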

Now, training and creating that model may have been very difficult, and it may have been done on a different platform. Continuing with this thought about using a machine learning model: from a developer standpoint, it's like, well, what are the parameters in? What am I getting out? But sometimes real-time streams of data are not very rich. It may be a device ID, and that's all. But the machine learning model needs three other variables. It needs to know the device type. It needs to know some other stuff about that device in order to make the prediction.

So the challenge becomes enriching the stream with data. And this is where the use of machine learning can break down: the enrichment of that data, because it can become an entire project in itself just to get those three additional data pieces together. They may exist somewhere in another database, so you end up caching them. So that's number one.

MICHAEL KRIGSMAN: Give us an example. 

MIKE GUALTIERI: So we keep using IoT examples, and I was trying to think of a different one, but here's an IoT example. Let's just say it's a delivery truck, one of the big delivery truck services. That vehicle has an ID, so every time it stops, it's streaming the ID of that vehicle. And it wants to predict where it should go next, right? It doesn't have a pre-programmed route. Every time it stops, it's going to decide where it goes next.

So the real-time data is that device ID, but the machine learning model needs a lot more. It needs to know the driver. It needs to know the current location, which it may get as GPS coordinates, but that's not good enough, so it needs the street. So you see, there's all this lookup information that the model is going to need to actually decide the next stop. To pull in that reference data, normally you'd have to keep it in memory, in an in-memory database or an in-memory cache of reference data.
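Here is a hedged sketch of that enrichment step using the redis-py client against a local in-memory cache; the key layout and the reference fields are inventions for the truck example.

```python
import redis

cache = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Reference data loaded once from the system of record (hypothetical fields).
cache.hset("vehicle:TRUCK42", mapping={"driver": "J. Smith", "depot": "Newark", "street": "Main St"})

def enrich(event):
    """The stream carries only the vehicle ID and position; look up the rest."""
    ref = cache.hgetall(f"vehicle:{event['vehicle_id']}")
    return {**event, **ref}  # the merged record now has what the model needs

print(enrich({"vehicle_id": "TRUCK42", "lat": 40.73, "lon": -74.17}))
```

Keeping the reference data in memory is what makes the lookup cheap enough to do on every single event.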

NORM JUDAH: Actually, I'll give you another example that some of you might be familiar with. Let's say you have an electric car with a certain battery capacity, and on your display it says that you've used x and you have 237 miles left. There's some exact number that's shown.

And then you're driving along merrily, and you see the numbers start to decrease. And then you hit a mountain pass, and you go up the mountain pass, and it's cold at the top. And suddenly, where you thought you had 50 miles to go ahead, the system says, no, no, no, you need to go back, because the temperature conditions and the altitude have actually changed the characteristics of the battery.

In that case, I'm not only interested in the geolocation, I'm actually interested in the look ahead: you know where I'm going, so you know my target. You know the path that I'm taking, and you could be much more predictive about the effect the environment is going to have on the battery. So my speed and location are necessary, but way insufficient, to make the right decision of whether I turn around to go charge my car or not.

MIKE GUALTIERI: And Michael, here's one more. Think of a chatbot or digital assistant on an e-commerce website, right? That's real-time, right? You're asking a question about a product or a service, and you expect a response. Now, if that response is coming from a bot, you need to process that in real-time, and it's a machine learning model that's doing it.

And a bot that's not sophisticated won't be stateful. It won't remember the elements of the conversation to refer back to; it'll just take each question independently. But a smarter bot will be able to refer back, and will see the stream of the conversation, all of it powered by machine learning. And all of it in the real-time of the user having this conversation.

MICHAEL KRIGSMAN: And where are we in the state of business using machine learning models in practice at this point in time? 

MIKE GUALTIERI: Well, I– 

MICHAEL KRIGSMAN: When it comes to real-time data. 

MIKE GUALTIERI: When it comes to real-time data. Well, let me give you some data points that we have. For years we've surveyed large global enterprises and asked, hey, to what extent are you doing AI? And we don't define it. We just ask, what is your company doing? Right now, the response rate for 2020 was, I think, 64%, OK. In 2019 it was around 56%. And in the years before that, it was going up too. So we had a very, very big spike.

Now, they're not doing a ton unless they're internet natives; an old-school company that's doing a lot might have 200 use cases. Most are doing a half dozen to a dozen. But when we asked those same companies that are doing it to what extent it has had a positive impact, 73% say that it has a positive impact. And we also know from our survey data, I haven't memorized the numbers, that it's now strategic. It's not experimental. Companies are moving forward. They want to do the use cases.

Now, you asked about the real-time nature of those use cases. Increasingly, they are becoming real-time use cases; we've been talking about those. There are, though, batch use cases for machine learning, just like there are batch uses for data. But we like to think that AI, that intelligence, is needed in real-time. And most of those use cases are real-time not because AI is inherently real-time, but because enterprises are becoming more real-time, and so AI is by virtue of that.

NORM JUDAH: Michael, I think if you actually look at these very large models, models trained on massive data sets, both private and public, those models are huge and the resulting system is huge. And you need large processing power, in some cases very large processing power, to run the model. The other end of the spectrum is a model that is very narrowly focused on a single use case with very simple data. It's a much smaller model that can execute much more quickly in a lower profile.

And so you could actually see IoT scenarios where there's a model that's trained in the cloud but downloaded to the edge, and it runs right at the edge, doing AI that is very successful and high value, but it's not the bigger global problem that could be done further away in the network. And so you have ML running at the edge, ML running in the network, and ML running in the cloud.

And those are all different scales of both complexity and execution environment. So if you go back to the question about where these things are happening: a lot of the stuff that I see with AI is actually happening in the cloud. There are big models, with people experimenting, in the cloud. Where you actually see very focused work is at the edge; it's actually a little bit easier to do if you have the right compute environment at the edge.

Camera recognition is one of those, actually recognizing somebody at your front door. We've seen that being commercialized like crazy right now, but if you think about what's happening in your camera, that's pretty interesting compared to where we were even five years ago. So you are seeing machine learning being deployed very broadly where you don't realize that it's happening.

On the other side of it, complex problems in the cloud: we see those particularly in banks, in manufacturing, and in consumer products, being developed in the cloud with large teams and huge value. And so your question needs to be looked at across the spectrum of what's there. I want to add one more– sorry.

MICHAEL KRIGSMAN: Please, go ahead. I didn't mean to interrupt.

NORM JUDAH: So I want to add one more dimension to this, which is the nature of the companies doing the work, and to some degree the market opportunity. Because the scenarios that we're talking about generally involve companies that are relatively sophisticated from an IT perspective, who can undertake this task either on their own or by paying somebody to do it for them in some fashion.

And they've been experimenting for a while. Mike talked about those numbers going up in terms of strategy and execution. But there's an incredible opportunity in medium-sized companies who don't have the wherewithal to do it. And so the consequence of that, at least the way I think about it, is that there will be a marketplace for models aimed at medium-sized businesses.

I'm going to be able to go out and buy a model that might not be super great, but it's good enough to move me forward. And I think you'll see that in the quote, "slower" environment. But I think you're also going to see it in the more real-time environment, where you'll be able to buy models, and deploy and execute them quickly, because they come wrapped in a sort of completed runtime environment, pre-trained if you like. And they might not be perfect, but they're good enough to add value.

MICHAEL KRIGSMAN: All of this raises the question: how should business leaders think about using real-time data to support their business, their business model, their customers? What's the right way to approach it for a business person? Mike, you want to try that?

MIKE GUALTIERI: Sure. So this advice is actually not only for real-time data, but also for AI, because they both apply to the same process, which is: look at your business process. Just forget about real-time, forget about the words AI, and walk through a business process, on a whiteboard if you have to. Walk through each step of that business process, and as you analyze all of those steps, ask yourself two questions. Is there something I could predict here to make this a more intelligent process, bypass a step for example, or make a better automated decision? And is there something I could do to make this process quicker?

When you do that, and when you ask the questions in that manner, leaving the technology out, you're quickly going to have a half dozen opportunities for improving that process. And those opportunities are going to map to AI, real-time data, or both.

Now, here's the thing, though, about investing in either one of those. There's a cost associated with doing it, right? And you're not worried about that at first, right? So then you're going to have to bring some technologists in to give a gut-feel assessment of whether we can do this, because you're going to have to prioritize those use cases. And the nasty thing about machine learning and investment is that you don't know if it works until you try it.

Because you actually have to try to train a model with the data that you have, if you're doing a custom model. I mean, Norm made a great point about pre-trained models, right? Those are fully baked, and you may be able to plug one of those in. But you're going to have to invest in machine learning use cases in a similar way that a VC invests in companies.

They do their due diligence, they believe they'll all be successful, but probabilistically there are going to be two great successes. So it's a different way of applying investment to this as well. And to some extent, the same applies to streaming, because there's going to be some cost associated with acquiring and using that streaming data in that same process.

MICHAEL KRIGSMAN: Norm, that sounds kind of ugly. If I'm a business person, I'm not a VC. I want predictability. I want to get a team, get the technology, and know it's going to work.

NORM JUDAH: I love that you brought up the quote, "business leader" in this, because the experiments that I've seen with AI have mainly been the techie guys having a wonderful time deciding which engine they should use, or what data they've got, with no real business outcome. They experiment, they do something, and you see this trapezoid: you see interest and development, you see experimentation, and then it just drops off because nothing happens.

And so, to Mike's point, I think it's incredibly important that the business leader, the VP of sales or the VP of marketing, actually is the sponsor of the activity. And they define the hypothesis. It's no use the IT guys defining the business hypothesis. Somebody has to define the business hypothesis of what we're trying to accelerate, or what predictive capability we can actually execute on.

And you need to time-box it. You need to say: right, we're going to give you six to eight weeks to do this, and at the end there's going to be an exit criterion for this experiment. And we can decide whether the experiment is successful or not to take us to the next stage. And back to your VC model, we'll go to Series A funding after we've gone through the experiment.

And so it's the notion that it's a business process you're trying to figure out. Business buy-in is not enough; it's leadership from the business. This is not a technical problem; there's actually lots of technology, maybe too much, to solve it right now. Business engagement is essential. And without it, you might as well not do it, because you'll have a lot of fun, but you won't actually see the business benefit that comes out of it.

MIKE GUALTIERI: Yes, Michael, so rather than it being an ugly process, it's a beautiful process.

MICHAEL KRIGSMAN: OK, I'm the VP of sales, and I'm hearing you talk about this, and intellectually I get it. But I'm starting to have a hard time breathing, and I feel heart palpitations.

MIKE GUALTIERI: No, no. 

MICHAEL KRIGSMAN: And I'm thinking, how do I even manage this kind of team?

MIKE GUALTIERI: OK, so Michael, as the VP of sales: stop thinking about AI, stop thinking about machine learning, and instead tell me what you would like to predict. I'll tell you what you'd like to predict. You'd probably like to predict which salespeople to assign to which accounts. You'd probably like to predict their ability to achieve quota, and you'd probably like an update on that every day. Now–

MICHAEL KRIGSMAN: Yeah, you’re hired. You’re hired. OK good. 

MIKE GUALTIERI: I'll build the model for you. So it's questions about what you'd want to predict on the machine learning side, and I like the way Norm put it. And then I'd also ask you what processes you would like to accelerate. I know the answer to that: sales, in general, but we would break that down a little bit more.

NORM JUDAH: So there's the sales forecast, but let me give you an example on financial forecasting. And this is actually a real use case. Imagine you're a large company, globally distributed, and every quarter you have to roll up both the actuals and the forecast for the next quarter. Every big company has that.

And there's this incredibly complex, burdensome application where in every country the VP of sales has to give a prediction. It goes to the controller, who puts some adjustment on it; then the GM of the local country puts a judgment on it; then it goes to the region, which puts some judgment on it; it rolls up to HQ, and you come up with this forecast.

But if you actually have the data that shows you forecasts and actuals for the last 10 years in every country, and you run a model, it would be interesting to see how accurate the model's forecast would be. Now, it's complicated, because you have to understand the economics going on in a country, the dynamics of what's happening there, and events like COVID that you can't predict.

But if you could produce a model, then instead of having seven people in every country touch the data, it's only one or two. And you can create a forecast that is more accurate, because you've got not only the forecast but a history of actuals against forecast. You can change that process completely. And this particular company has actually done that: they have changed the way they do forecasting and pulled it out of the hands of an awful lot of people. And because they've got actuals and forecasts, they can actually be realistic.
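As a toy illustration of that idea, here is a sketch using scikit-learn on synthetic numbers; a real version would train on the ten years of per-country forecasts and actuals Norm describes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Columns: submitted forecast, prior-quarter actual (hypothetical features).
X = np.array([[100, 95], [120, 110], [90, 100], [130, 125], [110, 105]])
y = np.array([97, 113, 96, 127, 107])  # what actually happened each quarter

model = LinearRegression().fit(X, y)  # learns how forecasts relate to actuals
next_quarter = np.array([[115, 108]])
print(f"model-adjusted forecast: {model.predict(next_quarter)[0]:.1f}")
```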

MICHAEL KRIGSMAN: I like that. So I can make my decisions faster, I can be more efficient in terms of the use of people, and I'm going to get greater accuracy about predicting which customers may buy, and how to match the sales reps up, as Mike was describing. But one thing still makes me nervous, and I don't like this obviously: is there some type of culture shift that needs to take place inside my team around thinking about the use of data in this way?

NORM JUDAH: Well, I don't know if it's actually a shift in thinking, because AI is just another way of helping you make a prediction. It's very complex, it uses lots of history, it has all sorts of models in it, but it's just another analytical tool that you have available to allow you to be more predictive. The value of it is that it has this massive ability to deal with complex data in ways that are not humanly recognizable.

The things that models can do, humans would struggle with, at least today, in that sense of being able to get there. So it's just another way of looking at analytics. Go back to the CRM example from earlier on: I look at a pipeline in a CRM system today, and as the sales manager I'm looking at that pipeline, and I know that because my conversion rate is 30%, my pipeline needs to be 3x my target.

Why? Why should your pipeline need to be 3x, and somebody else's pipeline 2x? The ability of AI to help you with that, dealing with those complex scenarios, is where it comes in. So it's no different from what you're doing today. I think the real difference is that as the VP of sales, what you were doing before was intuitive. It was the way you think, the way you run; you interpreted the information. Now you've got this way more sophisticated tool to give you additional inputs on which to apply your judgment.

The interesting challenge is what happens when the ML system, the AI system, gives you a recommendation that is not intuitive. Do you as a human say, no, no, no, I know better, or do you actually take the machine's word for it? That's where the judgment part of it becomes super interesting.

MIKE GUALTIERI: Yeah, and Michael, your concern about the black-box effect of a model is widespread. But the machine learning community, the vendors, and the open source community have done a ton of work in the last two years on explainability. And so now you've got all of these sophisticated explainability techniques that are consumable, and designed for business people to say, well, here are the variables; here's why it's making the decision it's making. Now, some models are easier to explain than others, but there's a whole movement toward explainability to help address some of those issues.
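For instance, here is a sketch using SHAP, one common open source explainability library, on a synthetic model; the variable names are made up, and the output is the per-variable contribution to a single prediction.

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                      # three made-up input variables
y = 2 * X[:, 0] - X[:, 1] + rng.normal(size=200)   # var_c is deliberately irrelevant

model = RandomForestRegressor(random_state=0).fit(X, y)
explainer = shap.TreeExplainer(model)
contributions = explainer.shap_values(X[:1])[0]    # one prediction's breakdown
print(dict(zip(["var_a", "var_b", "var_c"], contributions)))
```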

NORM JUDAH: So Michael, I think my recommendation for those business leaders is: you need to go to AI school for business, the same way you might have gone to data warehousing school for business. Not the depths of reinforcement learning and how the engines actually work, but understanding the scenarios and what it brings to you. Because if you don't do it, all your competitors are going to, and they're going to eat your lunch because of the additional advantage they've got.

MICHAEL KRIGSMAN: As we finish up, let me ask each of you the same question. Mike, let me start with you. What advice do you have for business leaders on using real-time data to innovate, and to accomplish great things for their customers? 

MIKE GUALTIERI: Start with streaming flow. It's the easiest possible use case: data that originates in real-time in one system can immediately be valuable in another system. And in fact, this is the biggest use case we see for real-time right now. That's the lowest-hanging fruit.

And the decomposition of that type of problem is just what we said before. It could be based upon a user experience where you want things to be updated more quickly, or a business process that needs to be quicker. So that's my key advice, and that's going to keep your enterprise architects and your solution architects busy for a long time. So that's number one. I'll give you a number two.

Number two: really understand this whole notion of stateful real-time analytics; streaming analytics is what we call it. And use that as an innovation strategy in some of your key business processes. Because it's the hardest thing to understand and use for many companies, which means it's hard for your competitors to understand and use. So I think you'll really be able to find some innovation if you understand those concepts as well.

MICHAEL KRIGSMAN: And Norm, it looks like you're going to get the last word. Your advice for business leaders on using real-time data to support their business, their customers, and innovation.

NORM JUDAH: So the speed of the process is in the eyes of the beholder; the interpretation of how real-time the process is depends on how you see it. And so I think one of the core recommendations, a question you should be asking as a business leader, is: we believe that this process is as fast as it can be, but what happens if we could make it faster?

What could we do to convert this process from being real enough time at a one-day cycle to real enough time in minutes and hours? And what could that do to our business? So the idea is: how could you speed your processes up? And do you have the data available to do that? Don't look at what you have today, but at what you could have. Rethinking the process, and then being able to process it in real-time: huge, huge opportunities. So it's essentially about rethinking the process.

I think the other one is the thing we just talked about, which is that business leaders need to step up and actually deeply understand the nature of these systems and what they can do. Whether it's AI, or analytics, or streaming, or real-time, or real-time at the edge, real-time at the center, real-time in the network, and so on, business leaders are going to have to have that understanding. And I would encourage them to engage with their CTOs, with their technology leaders, but also with their peers.

Understand what's happening with your peers in the market, because we're at a discontinuity in the things that are possible today. Those who can do it, and do it, are super successful. And those who watch are going to be watching for a long time.

MICHAEL KRIGSMAN: Great advice from both of you on the importance of understanding the capabilities that are now possible. And talking with people inside and outside your organization to embrace those new capabilities to support the innovations that you want so badly. Norm Judah and Mike Gualtieri thank you so much. 

NORM JUDAH: Thank you.

MIKE GUALTIERI: Thank you.
