Tech Brew caught up with Babak Hodjat, Cognizant’s CTO of AI, who’s credited with helping to invent the NLP tech that paved the way for Siri, about what his new San Francisco-based lab is up to and why he thinks multi-agent architecture could be the next cloud revolution for enterprise.
This conversation, the second of two parts, has been edited for length and clarity. (Part one is here.)
It feels like we’re hearing more and more about agents, especially in the past few months. What do you think the future is there?
Well, first, let’s level-set on what an agent is. If you consider an agent to be a large language model wrapper around some functionality or some data, or some microservice or API, then that transition to an agent-based architecture is already kind of happening without us even realizing it. For us at Cognizant, we have a very large intranet called One Cognizant. All the apps and everything are there. It’s very, very useful. We all use it internally. And so immediately after generative AI systems came out, people were thinking, “Hey, why do we have this clunky search interface to this list? Let’s have it be like a ChatGPT interface into this at the top level.” At the same time, this is an intranet, so you have an HR app and you have a finance app, and all these apps are under there, and all these teams that are responsible for these different functions are also saying, “Hey, for this HR app, let’s replace the clunky search box with a ChatGPT-like interface.” So what ends up happening is you have the ChatGPT-like interface at the top level, and then you have all these apps having their own ChatGPT-like interfaces, and it’s silly for you to be handed off from one to the other. So you come to the top level and you say, “Hey, I want to fill out my timesheet,” and it takes you to the second level. And now you have to type “I want to fill out my timesheet” again, into the timesheet app? That doesn’t make sense. So very naturally, people start thinking, “Wait, I can actually have the LLM representing the timesheet app talk to the LLM representing the top level of my intranet app in natural language, so that I don’t have to force my user to do that. They kind of touch base with one another, and they handle my query.” Immediately, when you start thinking that way, you’re starting to think about an agent-based architecture.
And there’s a ton of really interesting things that come out of that. For starters, that communication between the two nodes is itself going to be in natural language because that’s what large language models understand, and it’s intent-based, which means that you have a separation between the intention that’s going back and forth and the actual format of the API call or SQL to a database, or whatever else it is down there. And so what happens there is that you have a much more robust system. You can yank out a system and upgrade it and change it, or just redefine the API, and you’re still fine, because these nodes are talking to each other in natural language. The other thing you can do is you can actually have each node, in servicing the queries that are coming to it, do some reasoning. So it can log its reasoning. You know, “The user wants this. I think I should do this first, and then this, and then this. Oh, the user says, ‘No, don’t do that. Do this other thing.’” So now I have a log, and it’s not just a log of API calls, this dry, formatted, “here’s what happened.” It’s a reasoning log that I can interrogate. And I could check to see, for example, does it fit my ethics standards? You know, from a responsible AI perspective, is it safe? Those sorts of things can be overlaid on top of this. So it has a lot of benefits beyond just the entry point, because this kind of sneaking in of the agent-based perspective starts with, “I want a natural language interface to everything,” but it ends up with a whole bunch of other goodies that just make sense. And so that’s what I’m seeing. I think not too many people right now realize that they’re on this journey already, but I think the few that do can see that there is a whole migration that’s going to happen incrementally into this world of agent-based architectures.
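As a rough illustration of the pattern Hodjat describes, here is a minimal Python sketch in which a top-level agent hands a plain-language request to an app-level agent, and each node keeps a reasoning log. Everything in it is hypothetical: the class names, the keyword matching (a stand-in for a real LLM deciding which agent should respond), and the `timesheet_api_v1`/`timesheet_api_v2` functions are assumptions for illustration, not Cognizant’s implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Agent:
    """A node that accepts plain-language requests and logs its reasoning."""
    name: str
    keywords: List[str]            # hypothetical stand-in for LLM intent matching
    backend: Callable[[str], str]  # the wrapped API, hidden from callers
    reasoning_log: List[str] = field(default_factory=list)

    def can_handle(self, request: str) -> bool:
        text = request.lower()
        return any(word in text for word in self.keywords)

    def handle(self, request: str) -> str:
        self.reasoning_log.append(f"{self.name}: received '{request}'")
        self.reasoning_log.append(f"{self.name}: calling my backend")
        return self.backend(request)


@dataclass
class TopLevelAgent:
    """Entry point that forwards the user's own words to a child agent."""
    children: List[Agent]
    reasoning_log: List[str] = field(default_factory=list)

    def handle(self, request: str) -> str:
        for child in self.children:
            if child.can_handle(request):
                self.reasoning_log.append(f"top: forwarding to {child.name}")
                return child.handle(request)
        self.reasoning_log.append("top: no agent claimed the request")
        return "Sorry, I could not find an app for that."


def timesheet_api_v1(request: str) -> str:
    return "timesheet opened (v1)"


def timesheet_api_v2(request: str) -> str:
    return "timesheet opened (v2)"


timesheet = Agent("timesheet", ["timesheet", "hours"], timesheet_api_v1)
hr = Agent("hr", ["vacation", "benefits"], lambda request: "HR portal opened")
intranet = TopLevelAgent([timesheet, hr])

first = intranet.handle("I want to fill out my timesheet")

# Swap the backend behind the timesheet agent; the caller's request is unchanged.
timesheet.backend = timesheet_api_v2
second = intranet.handle("I want to fill out my timesheet")
```

The user types the request once, the top-level node passes the same sentence down, and replacing the backend does not change what callers say. Both logs stay readable as plain sentences rather than formatted API traces.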
So the fact that these nodes are speaking with natural language to each other improves explainability, even if maybe the reasoning of the agent itself is a bit harder to figure out?
The large language model itself isn’t inherently explainable, that’s very true. It’s a deep learning-based black box, obviously…but it can explain its reasoning. And, in fact, it actually first authors its reasoning and then runs it. If you’ve played around with ChatGPT, for example, you can see how it actually lists 1, 2, 3, “Let’s do these things,” and then starts doing them. That’s how these systems operate. And so it’s that part that is going to be transparent. There’s some transparency in the workflow. So when you think about it, as humans, we’re kind of similar. We’re black boxes (like, who knows what’s happening in our brains?), but if we’re told to log our reasoning, or the steps that we need to take, or we’re told what steps to take and then, if there’s an exception, to note that, then within the workflow that we’re operating in, we’re introducing more and more transparency. So yes, the worker nodes themselves aren’t transparent. But in the job description for each node, we can actually infuse transparency versus the alternative, which would be, “Hey, I’m going to ask this AI system, which is one huge, monolithic model, to do something for me,” and then I expect the output to just reflect what it’s done for me. Which is the wrong way to use these systems. Like, the reason why everybody is afraid of hallucinations with ChatGPT is exactly because they’re counting on it to pull from its black-box knowledge that it trained on to come up with some answer without telling it where to look, how to look for it, how to actually list what it’s found, and why it’s doing certain things. If you do that, then it becomes more and more transparent and less open to those sorts of risks.
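The “author the steps first, then run them, and note any exception” workflow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `run_with_plan`, the example plan, and the `tools` mapping are hypothetical names, and in a real system the plan would be written by the model itself rather than hard-coded.

```python
from typing import Callable, Dict, List, Tuple


def run_with_plan(
    plan: List[str], tools: Dict[str, Callable[[], str]]
) -> Tuple[List[str], List[str]]:
    """Log the whole plan up front, then execute it step by step."""
    log: List[str] = []
    results: List[str] = []

    # First, record the intended steps before anything runs.
    for number, step in enumerate(plan, start=1):
        log.append(f"plan {number}: {step}")

    # Then carry out each step, noting exceptions instead of hiding them.
    for step in plan:
        tool = tools.get(step)
        if tool is None:
            log.append(f"exception: no tool for '{step}'")
            continue
        result = tool()
        log.append(f"done: {step} -> {result}")
        results.append(result)

    return results, log


plan = ["look up policy", "summarize policy", "email manager"]
tools = {
    "look up policy": lambda: "policy found",
    "summarize policy": lambda: "summary written",
}
results, log = run_with_plan(plan, tools)
```

Here the log shows what the node intended to do, what it did, and where it could not proceed, which is the kind of record a reviewer can interrogate afterward.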
Cognizant just launched an AI lab in San Francisco in March. Is this the kind of work you’re doing there?
So as the CTO of AI for Cognizant, I run our AI R&D lab here in San Francisco. It’s quite unique; we actually push the envelope on the science of AI itself. Our KPI is really to publish papers in peer-reviewed journals and conferences. We currently have 53 issued US patents on core AI technology, and we’re set up in a way that the cohering principle around our research is agent-based decision making. We believe most data and analytics and AI is ultimately in the service of someone making a decision. So how can we actually enable that, augment that, improve that, using AI itself? And so any breakthroughs that we have, any inventions that we have through research, end up in a platform…and that platform actually is in the service of our clients making decisions.
Are you able to say anything about which clients you’re working with, or which clients are using this kind of system?
Many clients are. First of all, the use of generative AI is almost pervasive. It’s not even down to a single client using it now; it’s all divisions within a client wanting to use it. So I get in front of the same client five times, and I’m talking to different people every time, because they have these disparate generative AI use cases. So that’s one thing, and then the second is, some of the early-adopter clients, some of whom you’ve probably heard of from when they were early adopters, for example, with ChatGPT and generative AI itself last year, when OpenAI made their announcements, some of those guys are actually now the early adopters when it comes to agent-based architecture…It’s a much more incremental and smooth transition than, for example, migrating to the cloud. Migrating to the cloud, first of all, had this resistance built in. Like, 20 years ago, people were like, “Are you kidding me? Like, my data, my apps, going on the cloud? I need to know they’re secure”…back then, there was that reluctance. And then there was this whole big investment that they had to make to take everything and put it on the cloud. With an agent-based architecture, you don’t have to do that; it’s very incremental. You can pick your battles, go where you get the most productivity.
You mentioned before the spring and winter cycle in AI investment. Do you think that this current spring is going to last a while?
Who knows? I’ve given up on predicting the future. I think there is some disillusionment sinking in a little bit, where people have been expecting the world of a single, large language model. For some, that might be disappointing…But as I said, I think this time around is different because these systems are so widely applicable and so powerful, and because, based on what I see so far, we’re just scratching the surface of their applications. So I think that investment is not going to dry up.