So just about every organization is looking for ways to hold fast in today’s challenging and uncertain economic environment. AI can help on lots of dimensions from reducing costs to enhancing customer engagement to better managing business operations.
But a lot of organizations struggle to take AI from proof of concept to the enterprise and then to production. There are some good reasons for that. However, companies had better figure out how to overcome those challenges, or risk getting left behind.
In this article, we’ll explore a coordinated approach to infusing AI throughout your business. Having the right platform is important, but so is having the right structures in teams and addressing deployment issues from the start.
The use cases and advice here come from Justin Emerson, a modern data architect at Pure Storage, and Tony Paikeday, Director of Product Marketing for artificial intelligence and deep learning at Nvidia.
- Practical on-going AI use case for Enterprise
- Why aren’t more enterprises immediately successful in AI?
- What is the right basis to build platform for AI at scale?
- How Teams must think about infrastructure for AI?
- Making your AI Center of Excellence (CoE)
Every business has three real imperatives that it needs to address. It’s important to recognize that we are clearly in a challenging environment.
Many of you today might find yourself facing unprecedented pressures, and even some maybe existential threats that you see on the horizon. But for seasoned leaders, this may not be that new.
If you look back many decades, you’ll see similar inflection points that challenged businesses. Well-managed organizations did three things consistently well in each of these prior episodes. First, they intently drew customers closer and solidified relationships, customer experience and loyalty.
Second, they intently focused inward on ways to streamline processes and save costs everywhere possible. Third, they positioned themselves for the upturn, using the current situation to invest in better agility to outmaneuver the competition and create an economic moat.
Practical on-going AI use case for Enterprise
So, the remarkable thing here is that AI is well poised to help in all three of these areas. Many of you have probably been inundated with the AI hype cycle: stories of robotic butlers, flying cars and whatnot.
But the reality is that there are a lot of practical AI use cases that are perfect for surviving and even thriving in turbulent times.
Now is a great time to implement recommender systems that deliver more personalized experiences for your customers, and to offer tailored support with natural language processing and chatbots. Now’s also a great time to reduce inventory levels and improve forecast accuracy to save money.
And if you’re a manufacturer with large industrial facilities, you might be amongst those saving potentially hundreds of millions of dollars every year in site inspection costs by switching to AI guided drones.
So, now’s a great time to also anticipate customer behavior and intercept return. So, in reality, AI is adept at finding the needle in the haystack, and distilling oceans of data into actionable insights that can put you ahead of your competitors.
So, let’s dive into some examples of what we just discussed, starting with natural language processing and chatbots. If your organization runs a customer service operation, you know how important your agents are to protecting customer goodwill and loyalty. If you’ve called a support number since the pandemic started, you’ve probably noticed increasingly longer hold times.
So, AI trained on advanced language models that deliver superhuman levels of language understanding can help you scale the number of inbound customer questions that you can address.
We see it happening very successfully in every sector, especially personal banking. Companies like Clinc are helping financial service providers like Turkey’s Isbank reach more customers faster through a mobile interface, which means their human agents are able to spend more time on more complex issues with their clients.
Other companies, like Deepgram, are using AI-based call transcription to convert 100% of their recorded call center audio with over 99% accuracy. They’re using this data to help train agents for better outcomes, where they previously could only transcribe maybe 4 to 5% of the calls with at best 20% accuracy.
Now, we talked about manufacturing environments. BMW is no stranger to automation: its plant in Munich is probably about 99% automated today, and BMW is getting its facilities to 100% automation by focusing on the logistical processes within the factory.
They’ve trained their AI models to handle everything from transporting materials to organizing parts. This helps ensure that the right materials get to the right part of the plant just in time, ensuring a more streamlined operation that results in higher production volumes, and lower downtime, ultimately helping drive down the cost per finished vehicle.
Another great example is Walmart. Walmart has been using machine learning for some time now to manage inventory levels and forecast demand. As we all know, in the current time it’s extremely important to keep track of inventory levels and demand. We’ve all seen instances where we’ve gone to the store in our mask and gloves, and the thing that we always buy just isn’t on the shelf.
And that can have a major impact on customer loyalty, it can have an impact on customer satisfaction. But moreover, it also has an impact on the bottom line. So, supply chain optimization is extremely important in this current environment where we want to make sure that we are handling strained supply lines, we’re handling challenges around just in time delivery.
So, using machine learning for these kinds of activities was important before the pandemic and is even more important in the current environment.
Similar to the natural language processing examples outlined above, another company, Global Response, runs contact centers for large organizations.
It has been implementing its own brand-new contact center software and infusing it from the ground up with AI, from call transcription to sentiment analysis.
I’m sure we’ve all been on a phone call with a customer service representative and been asked to rate how we felt about the call on a scale of one to five. I can tell you that the response rate for those questions is very bad.
Most people don’t bother to answer them. But with sentiment analysis, using those call recordings, you can make a very good estimate of how effective that customer service rep was. Did the person have their problem solved? How did they feel about the call, just from the tone of their voice? So, there’s really revolutionary stuff going on in this space.
It’s important because more and more interactions between organizations and the people that they serve are moving online or over the phone. Before, that might have been a minority or a very slim majority of customer interactions.
Obviously, it has now moved almost exclusively to interactions that are not face to face. And we want to make sure that customers don’t leave and go somewhere else because another company has much better tools at its disposal to serve their needs.
So, while all these things were important, before the current situation, the current situation has emphasized how important these are to business.
Why aren’t more enterprises immediately successful in AI?
So why isn’t this the norm? First off, for those who are not familiar, AI models designed to solve a business problem aren’t built or deployed like conventional software.
The team building them includes individuals whose expertise is in algorithms and running experiments, and certainly not in engineering code.
And they’re not experts in things like scaling or security, or IT disciplines. With machine learning there isn’t a simple process of taking a finished AI model and deploying it in production.
It’s resource intensive work done in a highly iterative manner, by data science artisans, people who are really hard to find, expensive to hire, difficult to retain and whose output is largely opaque to anyone who wants to explain how these AI models actually work.
So, when you look at the sweat equity that’s plowed into all that work, organizations are incurring a growing amount of what I’d call model debt, in terms of the investment in resources sunk into undeployed, or even under deployed models.
Model development involves a complex process that has multiple pipelines for data preparation, model prototyping, training, and inference. And you’re not just building a single app. It’s a model, it’s a web service. And it’s the integration of these things.
Assessing a model in production isn’t a simple pass or fail determination. It needs a data scientist involved to continually evaluate its performance, which can degrade more rapidly than that of conventional software. So, monitoring and retraining the model on a continual basis is key.
Enterprises must improve the end-to-end AI lifecycle from development to deployment, moving from something that, as I mentioned, is highly artisanal today to one that’s industrialized, accelerated and integrated into a more familiar, standard enterprise IT operation. To get there, we need some teams to work together.
Ultimately, in order to overcome these problems, you need to bring together two worlds that are currently very much disconnected and worlds apart.
This is really about enabling data scientists to focus on what they’re great at and enabling IT and data engineering teams to focus on the right platform that supports the end-to-end lifecycle of AI, from development to deployment.
We do not want data science teams wrestling with code or building platforms and infrastructure.
We want them creatively experimenting, iterating as fast as possible to get to a model that offers the highest predictive accuracy for a business problem to be solved.
Conversely, we want data engineers who can build pipelines that enable effortless mobility of data sets across the development workflow, and who work with their infrastructure team to deliver the resources and performance that speed iteration cycles and are manageable within an IT DevOps kind of setting.
Now, the happy confluence of these worlds is what is collectively known as ML Ops (machine learning plus IT operations). It will ultimately enable more of those great models to get realized in production rather than stalled at a pilot stage. ML Ops includes the right hardware infrastructure that delivers the right resources for each job, whether that is data ingest, manipulation, training or inference, and that speeds the iteration cycle so data scientists aren’t waiting for results from an experiment.
ML Ops also includes workflow management tools that let enterprises manage users, data sets and experiments, so that a standard process can be implemented that takes models from prototype to production.
The net effect of this is the ability to streamline the handoff between these teams, while ensuring manageability and accountability, and creating a cyclical life cycle on which models can be continually evaluated for drift and retrained with new data on an ongoing basis.
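To make "evaluated for drift" a little more concrete, here is a minimal sketch of one way such a check could look. It uses the Population Stability Index (PSI), a common drift statistic chosen here purely for illustration; the speakers do not prescribe a specific method. The function names, the 10-bucket split and the 0.2 threshold are all assumptions.

```python
import math
from typing import Sequence


def population_stability_index(
    baseline: Sequence[float], recent: Sequence[float], bins: int = 10
) -> float:
    """PSI between two samples of a numeric feature or model score."""
    lo = min(min(baseline), min(recent))
    hi = max(max(baseline), max(recent))
    # Fall back to a width of 1.0 when every value is identical.
    width = (hi - lo) / bins or 1.0

    def bucket_shares(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
        # A small floor keeps log() defined for empty buckets.
        return [max(c / len(values), 1e-6) for c in counts]

    b = bucket_shares(baseline)
    r = bucket_shares(recent)
    return sum((ri - bi) * math.log(ri / bi) for bi, ri in zip(b, r))


def needs_retraining(
    baseline: Sequence[float], recent: Sequence[float], threshold: float = 0.2
) -> bool:
    """Flag the model for retraining when drift exceeds the (assumed) threshold."""
    return population_stability_index(baseline, recent) > threshold


if __name__ == "__main__":
    training_scores = [i / 100 for i in range(100)]          # hypothetical baseline
    production_scores = [0.5 + i / 200 for i in range(100)]  # hypothetical shifted data
    print(needs_retraining(training_scores, production_scores))
```

In an ML Ops setting, a check like this would typically run on a schedule against fresh production data, with a positive result triggering the retraining pipeline rather than a manual investigation.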
Now, with this in mind, it’s probably helpful to do a walkthrough of what deployment looks like in practice, and where workloads tend to land.
The experiments that your data scientists run might be more suited to on-premises infrastructure. Obviously, you would want to own and manage those environments.
But the truth is that most customers, and most companies will have infrastructure in both places, which is true of most things, really. So that doesn’t make this unique.
Similarly, the public cloud is a great place to get started with experimentation, because its capital requirements are extremely low. But as companies begin to build out more and more models, and operationalize more and more different algorithms, they may find that it starts to make more sense to own the infrastructure rather than rent it.
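The own-versus-rent question can be framed as a simple break-even calculation. The sketch below is a deliberately simplified illustration: the function name is hypothetical, every dollar figure is made up for the example, and a real comparison would also account for utilization, staffing, depreciation and hardware refresh cycles.

```python
from typing import Optional


def breakeven_months(
    upfront_cost: float, monthly_owned_cost: float, monthly_rented_cost: float
) -> Optional[float]:
    """Months until owning becomes cheaper than renting, or None if it never does."""
    monthly_saving = monthly_rented_cost - monthly_owned_cost
    if monthly_saving <= 0:
        # Renting is no more expensive per month, so the upfront cost never pays back.
        return None
    return upfront_cost / monthly_saving


if __name__ == "__main__":
    # Hypothetical numbers: 300,000 upfront, 5,000/month to run owned systems,
    # 30,000/month for an equivalent rented footprint.
    print(breakeven_months(300_000, 5_000, 30_000))
```

The more models a company trains and serves, the higher the rented monthly cost tends to climb, which shortens the break-even period and is the reason the balance can shift toward ownership over time.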
But the one constant across all of these is where is your data? I like to say the data has a gravity, and it takes a lot of inertia to escape that gravity.
If your data set is living in the public cloud, it can be very helpful to do that training in the public cloud. So, you don’t have to lift and shift or maintain multiple copies of your data.
Similarly, if the data you want to train on is on premises, maybe not only for legacy reasons but also for compliance or governance reasons, then it makes a lot more sense to train on premises (on your own servers).
So, a common question that I get is, if I want to implement ML Ops, or if I want to build an infrastructure to support these kinds of workloads, where should I build it? Should I build it in the cloud?
Should I build it in my own data center? Should I build it in someone else’s data center? And the answer is really all of the above. In many instances, different parts of that ML Ops pipeline may run in different places.
If your needs are extremely elastic, say your inference workloads are very bursty and happen only at certain times of the day, those might be great workloads for the public cloud.
But the goal is to enable all those individuals who are part of that ML Ops workflow to work as quickly and as effectively as possible, and to reduce the speed bumps in the handoffs between each of those different individuals.
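The placement guidance above (follow the data, keep regulated data where it is, and send bursty workloads to the cloud) can be sketched as a small rule set. This is only an illustration of the reasoning, not a tool the speakers describe; the `Workload` fields, the function name and the order of the rules are assumptions, and real decisions involve many more factors.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    data_location: str                    # "cloud" or "on_prem": where the data set lives today
    bursty: bool                          # demand spikes only at certain times of day
    data_must_stay_on_prem: bool = False  # compliance or governance constraint


def suggest_placement(w: Workload) -> str:
    """Return "cloud" or "on_prem" using the simplified rules from the text."""
    if w.data_must_stay_on_prem:
        return "on_prem"     # governance outweighs everything else
    if w.bursty:
        return "cloud"       # elastic demand suits rented capacity
    return w.data_location   # otherwise follow data gravity


if __name__ == "__main__":
    jobs = [
        Workload("nightly-training", data_location="on_prem", bursty=False),
        Workload("peak-hour-inference", data_location="cloud", bursty=True),
        Workload("regulated-training", data_location="cloud", bursty=True,
                 data_must_stay_on_prem=True),
    ]
    for job in jobs:
        print(job.name, "->", suggest_placement(job))
```

Because different stages get different answers, a single pipeline can legitimately span both environments, which is the "all of the above" answer in practice.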
And what’s important to understand is that you need to make these decisions as a business leader now, because if you don’t, they will be made for you.
I’m sure we all remember the stories of procurement departments asking business units, “Hey, why are you buying so many books at Amazon?”, when it turned out that they were really just renting cloud infrastructure, right? 🙂 That anecdote is from maybe almost a decade ago now.
And we’re seeing similar things today in the space of machine learning and artificial intelligence. Why? Because AI infrastructure is fundamentally different. The applications and algorithms are fundamentally different.
If IT does not evolve to support the needs of the business, the business will go around it. And this is not a good thing, right? It creates silos of infrastructure that are outside of the IT department’s sight.
That’s bad for data governance, that’s bad for compliance. That’s bad for security.
Moreover, it’s a waste of budget. If one line of business is doing its own thing, and another line of business is doing its own thing, there’s a common set of services and infrastructure that they’re probably not sharing and are duplicating somewhere.
It also creates silos of knowledge within an organization, because one line of business may have figured it out while the others are still struggling.
If they’re not aware of each other’s efforts, or they’re not sharing information and infrastructure to help solve these problems together, then you end up with speed bumps or slowdowns that could have been avoided had these challenges been addressed centrally.
How to avoid those speed bumps and what is the right basis to build platform for AI at scale?
You must consider what goes into not only building but also training, deploying and operationalizing a machine learning or deep learning algorithm.
There may be areas where you’re ingesting data, there may be areas where you’re storing metadata about the algorithms that you’re training, there may be databases where you store performance data, there may be monitoring information that IT needs to determine how healthy the entire pipeline is.
And that’s before we even get to the data sets that you’re working on: the inputs, the data that you’re curating in order to build a model, and the processed data set, which is the data after it has gone through data hygiene, processing and other kinds of transformation to prepare it for training.
And then finally, there are the actual results of inference, which is the process of introducing new data to these algorithms. Each of the different steps along the pipeline carries its own set of data.
If you’re not careful, you end up passing data from one data source to another, creating multiple copies and time delays as these massive data sets move through the system.
That only gets exacerbated the more of these models you deploy, because each model is going to contain this pipeline.
One of the common threads here is storage. The question is not only how do I make sure, across my entire enterprise, that I’m reducing duplication of effort and avoiding shadow AI,
but also how do I reduce duplication of effort within each individual model or project, both from a storage standpoint and from a processing standpoint.
It’s imperative to build infrastructure to support these algorithms that doesn’t create all of these silos of data, because data takes time to move and takes time to process. The more we can do to centralize that workload, the faster everything is going to be from end to end.
But it’s also not just about the storage, it’s also about the computing effort.
How Teams must think about infrastructure for AI?
If you took a look at a typical data center built on what I’d call legacy compute or traditional CPU based computing, you’ll easily find three silos of servers.
Each silo is designed and scaled to attack one kind of computational problem, whether that is analytics, training or inference. That means three different kinds of servers, each with its own architecture, resources and planning considerations, and each scaled independently of the others.
Each silo capable of running only what it was intended to run, and nothing else.
This inflexibility is driving up capital and operating costs inside the enterprise data center.
So we’ve worked intently on solving this inflexibility with a new platform and architecture that offers enterprises a universal building block for the new AI data center. It is fully optimized for the end-to-end lifecycle of AI, from prototyping to development to deployment, and from analytics to training to inference.
Enterprises can now enjoy a single universal platform: homogeneous infrastructure that supports heterogeneous workloads and can run any AI job in the lifecycle on any system at any time.
Every system can be tasked to tackle the workload required and to run that workload with the right amount of compute resources, right-sized to the job at hand.
With one system type for the entire data center, we’ve not only consolidated the silos, we’ve made capacity planning incredibly simple. So now building your AI infrastructure on an optimized platform means having, if you will, your own elastic AI infrastructure, your own private AI cloud.
Making your AI Center of Excellence (CoE)
So where do you take this? Forward-leaning organizations and enterprises that want to infuse AI into their endeavors are taking a new look at infrastructure. They realize that running up OpEx in the cloud and on shadow AI is not really making sense.
And they need to understand that the right infrastructure strategy can be an enabler for consolidating AI expertise across the organization, standardizing best practices and accelerating the time to solution on the most pressing AI opportunities.
This is what we call the AI Center of Excellence, and many teams, maybe like yours, have used it to democratize access to AI development resources. Centralizing development lets your team pool previously disparate resources, platforms and personnel, so that you can eliminate innovation silos and streamline and accelerate the AI development workflow.
And remember what I mentioned earlier on the artisanal nature of data science development, and how it’s hard to attract and retain talent. Well, when you build a centralized shared infrastructure for development, you’re also creating a platform that benefits your people in two ways.
First, your center of excellence, or CoE, attracts the world’s best talent. We’ve seen it happen over and over again in businesses like yours. These data science artisans all want to do their life’s most important work using the best tools that are out there.
A CoE demonstrates that you’re a forward-leaning organization with respect to AI, and that you have the tools and resources they can use to build incredible things.
Second, the experts who can build your best AI applications may already be working for you.
They’re inside your business units, and they know your problems and data better than anyone. Many of them want to evolve into data scientists but they need mentoring, and an environment where they can learn valuable skills while shadowing other experts in your organization.
The center of excellence creates that environment and lets you groom and scale citizen data science expertise from within, saving you a lot of money versus hiring from outside.
In conclusion, decision making around platforms and standards starts with the CIO and their infrastructure team. And if you’re the CIO, you may be facing an existential crisis of sorts.
In tough times nobody wants to be seen as purely a cost center, but rather they want to be seen as an enabler of business transformation. The reality is that IT can actually lead the strategy to accelerate business transformation with the power of AI.
But first, you need an infrastructure standard that can actually industrialize the modern AI development workflow, instead of letting researchers run rampant, running up OpEx and building do-it-yourself platforms, or shadow AI, as we’ve been calling it.
CIOs need to get in front of all of that and drive architectural choices that will centralize AI compute infrastructure, and consolidate people, process and technology.
The result will be amplifying the expertise that exists within your organization, while enabling a robust talent development pipeline, and it’ll shorten the deployment timeframe, and let your business see more of their innovative models make it from prototype to production.
And certainly, in challenging economic times, it’ll help your organization save costs by consolidating infrastructure and reducing spend, thanks to higher, more efficient utilization of resources.