Answering Financial Advisor Questions

Morgan Stanley isn’t using GPT-4 in production yet, but its experimentation with the technology is far along. It has identified over 100,000 of its own documents containing content that its more than 16,000 financial advisors (FAs) might have questions about: investment recommendations (what is our research organization’s view on Alphabet stock, and what’s the bull and bear case for its future performance?), general business questions (who are the five major competitors to IBM?), and process questions (how do I include an IRA in an irrevocable trust?). It has “fine-tune trained” GPT-4 on these topics, using the 100,000 documents as a training corpus.

If you’re not familiar with fine-tune training, you may want to explore it. It’s a way to combine the amazingly humanlike dialog abilities of GPT-4 or ChatGPT with the detailed knowledge residing within a particular company. I once did a lot of work in the knowledge management space, but that field has lost popularity because of the challenge of embedding knowledge effectively into technology. Generative technologies are likely to rejuvenate knowledge management.
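To make the idea concrete, here is a minimal sketch of what fine-tune training on a curated corpus can look like using OpenAI’s fine-tuning API. Everything in it (the file name, the example Q&A pair, the model choice) is an illustrative assumption rather than Morgan Stanley’s actual pipeline, and note that OpenAI’s publicly available fine-tuning endpoint has supported models such as gpt-3.5-turbo rather than GPT-4 itself; the pattern is the same either way.

```python
# Minimal sketch of fine-tune training on a curated corpus via
# OpenAI's fine-tuning API. File name, example content, and model
# choice are illustrative assumptions, not Morgan Stanley's pipeline.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Chat-format fine-tuning data: one JSONL record per curated example.
examples = [
    {
        "messages": [
            {"role": "system",
             "content": "Answer financial advisors' questions from curated firm research."},
            {"role": "user",
             "content": "How do I include an IRA in an irrevocable trust?"},
            {"role": "assistant",
             "content": "<curated answer drawn from the firm's process documents>"},
        ]
    },
]
with open("curated_examples.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Upload the corpus and start a fine-tuning job.
upload = client.files.create(file=open("curated_examples.jsonl", "rb"),
                             purpose="fine-tune")
job = client.fine_tuning.jobs.create(training_file=upload.id,
                                     model="gpt-3.5-turbo")
print(job.id, job.status)
```

The result is a model ID that can be queried like any other chat model, with the company’s knowledge baked in.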

In a large-scale test of the use case, 300 Morgan Stanley FAs are trying it out when they have questions on these topics. When they get an answer, they can give it a thumbs-up or thumbs-down, or provide more detailed feedback if needed. Thus far they like it. McMillan is obsessed with measuring his innovations, so his organization is looking carefully at how the FAs use the new system, what problems and successes they have with it, and ultimately how it affects client engagement.
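A feedback loop like that is straightforward to instrument. The sketch below shows one hypothetical shape for it; all the names are illustrative, and a production system would persist feedback to a database rather than an in-memory list.

```python
# Hypothetical sketch of an advisor feedback loop: each answer gets a
# thumbs up/down plus optional free-text comments, logged for analysis.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AnswerFeedback:
    advisor_id: str
    question: str
    answer_id: str
    thumbs_up: bool
    comment: str = ""
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

feedback_log: list[AnswerFeedback] = []

def record_feedback(advisor_id: str, question: str, answer_id: str,
                    thumbs_up: bool, comment: str = "") -> None:
    feedback_log.append(
        AnswerFeedback(advisor_id, question, answer_id, thumbs_up, comment))

# A simple usage metric of the kind a measurement-minded team might track.
def approval_rate() -> float:
    return sum(f.thumbs_up for f in feedback_log) / max(len(feedback_log), 1)
```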

Beyond Hallucinations to Ensuring Currency

I asked McMillan whether the added training they had done with GPT-4 on their own documents eliminated the “hallucinations” problem of GPT-4 and other large language models: their tendency to go off the rails and confidently express something wacky. He wasn’t ready to declare it a non-issue, but said that thus far it hadn’t been a problem. The documents on which they fine-tune trained the system were carefully curated, not just for this application but as part of their normal knowledge management process, and their investment research is reviewed by compliance officers. The document review process leaves them with no concerns about inaccurate advice coming from those sources.

McMillan also said that Morgan Stanley is limiting the types of prompts that FAs can enter into the system. OpenAI has found that extended conversations with the GPT systems tend to cause many of the reported hallucination problems, so those aren’t allowed (just as they are no longer allowed in the GPT-4-based Bing chatbot). Morgan Stanley also restricts prompt topics to business-relevant issues, which largely ensures that the outputs will come from its own curated documents. In addition, McMillan supported OpenAI’s claim that GPT-4 is generally less prone to hallucinated responses than GPT-3 was.
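One way to implement that kind of restriction, sketched below purely as an illustration and not as Morgan Stanley’s implementation, is to classify each prompt’s topic before it reaches the answering model, reject off-topic prompts, and cap the number of turns so extended conversations never happen. The topic labels and turn limit are assumptions.

```python
# Illustrative guardrail: classify each prompt's topic before it
# reaches the model, reject off-topic prompts, and cap conversation
# length so extended multi-turn chats are disallowed.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

ALLOWED_TOPICS = {"investment_research", "general_business", "process"}
MAX_TURNS = 1  # single-shot prompts only

def classify_topic(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": (
                "Classify the question as one of: investment_research, "
                "general_business, process, other. Reply with the label only.")},
            {"role": "user", "content": prompt},
        ],
    )
    return resp.choices[0].message.content.strip()

def guarded_prompt(prompt: str, turn_count: int) -> str | None:
    """Return the prompt if it may proceed, or None if it is blocked."""
    if turn_count >= MAX_TURNS:
        return None  # extended conversations are disallowed
    if classify_topic(prompt) not in ALLOWED_TOPICS:
        return None  # off-topic prompts never reach the answer model
    return prompt
```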

Morgan Stanley also has a lot of built-in checks on accuracy. Every week they submit a set of “golden questions” to the system to ensure that the answers it gives are correct. In everyday use, if an answer seems incorrect, an FA can consult a reason code that points back to the underlying document from which the content was extracted. That’s not true of most LLMs, and it’s a big advantage in creating trust through transparency.
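A golden-question check can be as simple as a scheduled regression test. The sketch below assumes a deployed ask() function and grades answers by expected key phrases; the questions, phrases, and matching approach are all illustrative, and a real harness would grade answers more robustly.

```python
# Sketch of a weekly "golden questions" regression check. The question
# set and phrase-matching approach are illustrative assumptions.
from typing import Callable

GOLDEN_QUESTIONS = [
    # (question, phrases the correct answer is expected to contain)
    ("How do I include an IRA in an irrevocable trust?",
     ["trustee", "beneficiary"]),
]

def run_golden_checks(ask: Callable[[str], str]) -> list[str]:
    """Return the questions whose answers are missing expected phrases."""
    failures = []
    for question, expected_phrases in GOLDEN_QUESTIONS:
        answer = ask(question)
        if not all(p.lower() in answer.lower() for p in expected_phrases):
            failures.append(question)
    return failures

# Run on a weekly schedule: a nonempty list signals the system drifted.
```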

What has been more of a challenge for Morgan Stanley is ensuring the currency of the answers returned by a prompt. If, for example, an FA asks for an analyst recommendation on a particular company’s stock, they want the most recent one available. The model has to be trained to fetch content from the most current source for each type of content. If, on the other hand, the FA is asking how best to structure a client trust, that information is not likely to change much over time, and currency is less of a concern.
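One hypothetical way to handle that distinction is to treat recency as a property of the content type: for time-sensitive types, make only the newest source eligible; for stable types, ignore publication dates. The Document fields and content-type rules below are assumptions for illustration.

```python
# Illustrative recency handling: for time-sensitive content types
# (e.g. analyst recommendations), use only the newest source; for
# stable content (e.g. trust structuring), any curated version will do.
from dataclasses import dataclass
from datetime import date

@dataclass
class Document:
    content_type: str   # e.g. "analyst_recommendation", "process_guide"
    published: date
    text: str

TIME_SENSITIVE = {"analyst_recommendation"}

def select_sources(candidates: list[Document], content_type: str) -> list[Document]:
    docs = [d for d in candidates if d.content_type == content_type]
    if content_type in TIME_SENSITIVE:
        # Only the most recently published document is eligible.
        return sorted(docs, key=lambda d: d.published, reverse=True)[:1]
    return docs  # stable content: recency is not a concern
```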

Multiple Underlying Models

Indeed, Morgan Stanley has found that different types of content require different training and result in different underlying models. McMillan said, “We’ve had to spend a lot of time thinking about different content sets and optimizing the results for each type.” An overall model may be fine-tuned on 100,000 Morgan Stanley documents, but different models are sub-fine-tuned on, for example, only investment research recommendation content. When an FA enters a prompt, the system determines that the topic is investment research and then uses the relevant model for that type. McMillan suspects (as do I) that there will eventually be only a few foundation LLMs, because of the expense and energy required to train them, but many, many different fine-tuned models at different levels of granularity.
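Routing a prompt to the right sub-fine-tuned model can be a thin dispatch layer, as in the sketch below. The model IDs are placeholders, and classify_topic() is the illustrative classifier from the guardrail sketch earlier; nothing here is Morgan Stanley’s actual architecture.

```python
# Sketch of routing a prompt to a topic-specific fine-tuned model.
# Model IDs are placeholders; classify_topic() is the illustrative
# classifier defined in the earlier guardrail sketch.
from openai import OpenAI

client = OpenAI()

MODEL_BY_TOPIC = {  # hypothetical sub-fine-tuned model per content type
    "investment_research": "ft:gpt-3.5-turbo:acme:research:abc123",
    "general_business": "ft:gpt-3.5-turbo:acme:business:def456",
    "process": "ft:gpt-3.5-turbo:acme:process:ghi789",
}

def answer(prompt: str) -> str:
    topic = classify_topic(prompt)              # e.g. "investment_research"
    model = MODEL_BY_TOPIC.get(topic, "gpt-4")  # fall back to the base model
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```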

Content Quality and Organization Prevail

If you are thinking that generative AI is going to make your company’s data quality and organization irrelevant, Morgan Stanley has bad news for you. McMillan noted,

Curating 100K documents with high accuracy was not a problem for us because we have best-in-class operational controls over our content. All the boring stuff that nobody cares about is what makes this successful. We have a single repository for important content, managed by a single team. LLMs do not solve the problem of disparate data sources spread across a company.

Knowledge management may have disappeared at many companies, but it didn’t diminish at all at Morgan Stanley. GPT-4 and related technologies are powerful integrators and describers of well-organized back-end content.

Morgan Stanley also has existing systems to distribute content to clients. Its “Next Best Action” system is one of the best in the industry for identifying (with machine learning) personalized investment ideas and distributing (with its CRM system) ideas and messages of interest to particular clients. McMillan wouldn’t speculate on how the generative AI capabilities would be integrated with those systems, but he did say that the “push” approach of Next Best Action might pair well with the “pull” approach of GPT-based answers to prompts.

Where Else Would This Work?

Morgan Stanley has over 80,000 employees and offices in more than 40 countries. It’s easy to see how the intellectual capital created across such a vast organization might sometimes be difficult to access. But with a disciplined knowledge management process and generative AI, all the knowledge from those people could be made accessible to the entire firm and its clients.

This situation is not unique to financial services. There are plenty of other people-intensive services businesses that run on distilled intellectual capital. Consulting, audit, tax, and law firms, for example, all share the same characteristics. I predict it won’t be long before they follow Morgan Stanley’s lead.