Tuesday, February 14, 2023

ChatGPT for Industry Research: Not Ready for Prime Time

ChatGPT: Four Errors in a Single Sentence
Since its unveiling in November 2022, there has been a lot of buzz around ChatGPT, OpenAI’s generative AI chatbot. Although there are similar projects, such as Google’s Bard, ChatGPT got an enormous boost because it was free, easy to use, and could be applied in a variety of ways: answering questions, writing essays, poems, and song lyrics, coming up with recipes, and even writing software code, all from simple prompts from the user.

One inevitable question, of course, was: to what extent would ChatGPT threaten jobs?

For me, it’s an important question. I direct a team of IT and industry research analysts at Avasant. Is ChatGPT a disruptive threat to our business? Would competitors with less-skilled workers be able to generate research as good as ours, or in greater quantities than we could produce? Would clients even need research firms like ours at all if they could generate their own reports using these tools? Or could we leverage a tool like ChatGPT to make us more productive?

Putting ChatGPT to the Test

To find out, I did a series of tests to see if ChatGPT could write a short research note. 

One of my tests was to ask ChatGPT to write about something I am intimately familiar with: the history of the IT research firm Computer Economics, which my partner and I acquired in 2005 and which I sold to Avasant in 2020. (I still manage this business as part of Avasant Research.)

Here is the transcript of my prompts and the responses from ChatGPT, lightly edited for length, along with my evaluation of ChatGPT. 

Prompt: “Provide a history of the IT research firm Computer Economics.”

ChatGPT responded: 

Computer Economics is a research and consulting firm specializing in the field of information technology (IT). The company was founded in 1982 in Newport Beach, California, by John L_________, who currently serves as the firm's President and CEO. 

There are four factual errors in the second sentence of this opening paragraph! Computer Economics was founded in 1979, not 1982, as documented in this Wikipedia entry. (Wikipedia is a primary source for ChatGPT's training data, so how did it miss this?) Moreover, the firm was founded in New York City, not Newport Beach, by Bruno Bassi, not John L., and John L. was never the company president. He was our VP of Research, from 2007 until he retired in 2017. ChatGPT just made this all up. 

But wait, there's more! A couple of paragraphs later, ChatGPT writes: 

Computer Economics … has received several awards for its work, including recognition from Forrester Research as a “Leading IT Research and Advisory Firm” and from Gartner as a “Top IT Research and Consulting Firm.” 

But neither Forrester nor Gartner has any such award program. In fact, it would be hilarious if they did since they are both IT research and consulting/advisory firms themselves! ChatGPT just made up that part. So, I gave ChatGPT another prompt and a chance to correct itself. 

Prompt: “Please provide more information about Gartner's award program for Top IT Research and Consulting firms. When was this program established and who were some of the most recent winners of this award?” 

ChatGPT responded (screenshot below):

[Screenshot: ChatGPT hallucinations]

Apparently, ChatGPT is not aware of the First Law of Holes: When you find yourself in one, stop digging. 

My prompt asked who some recent award winners were. Now it says the winners are not publicly available. What kind of award keeps the winners secret? Moreover, if the winners are secret, how does it know Computer Economics was one of them? At the same time, the winners must not be secret, because they “can be found in Gartner’s annual report on the market for IT research and consulting services” (which, of course, does not exist).

Risks in the Use of ChatGPT for Research

In summary, here are some observations on the risks of using ChatGPT as a virtual research analyst.  

  1. Fiction parading as fact. As shown above, ChatGPT is prone to simply making things up. When it does, it declares them with confidence, a behavior some have called hallucination. Whatever savings a research firm might gain in analyst productivity, it might lose in fact-checking, since you can’t trust anything it says. If ChatGPT says the sun rises in the east, you might want to go outside tomorrow morning to double-check it.
  2. Lack of citations. Fiction parading as fact might not be so bad if ChatGPT would cite its sources, but it refuses to say where it got its information, even when asked to do so. In AI terms, it violates the four principles of explainability.
  3. Risk of plagiarism. Lack of citations means you can never be sure if ChatGPT is committing plagiarism. It never uses direct quotes, so it most likely is paraphrasing from one or multiple sources. But this can be difficult to spot. More concerning, it might be copying an original idea or insight from some other author, opening the door to the misappropriation of copyrighted material. 

Possible Limited Uses for ChatGPT

We are still in the early days of generative AI, and it will no doubt get better in the coming years. So there may be some limited uses for ChatGPT in writing research. Here are two ideas.

The first use might be simply to help overcome writer’s block. We all know what it’s like to start with a blank sheet of paper. ChatGPT might be able to offer a starting point for a blog post or research note, especially for the introduction, which the analyst could then refine. 

A second use case might be to have ChatGPT come up with a structure for a research note. To test this, I thought about writing a blog post on the recent layoffs in the tech industry. I had some ideas on what to write but wanted to see if ChatGPT could come up with a coherent structure. So, I gave it a list of tech companies that had recently announced layoffs. Then I gave it some additional prompts:

  • What do these companies have in common? Or are the reasons for the layoffs different for some of them? 
  • As a counterpoint, include some examples of tech companies that are hiring.
  • Talk about how these layoffs go against the concept of a company being a family. Families do not lay off family members when times are tight. 
  • Point out that many employees in the tech industry have never experienced a downturn and this is something that they are not used to dealing with.

The result was not bad. With a little editing, rearranging, and rewriting, it could make a passable piece of news analysis. As noted earlier, however, the results would need to be carefully fact-checked, and citations might need to be added.
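As an aside, this kind of prompt sequence could also be scripted against OpenAI’s API rather than typed into the chat window. Below is a minimal sketch in Python, assuming the openai package (v1.x) is installed and an API key is set in the OPENAI_API_KEY environment variable; the model name and the placeholder company list are illustrative, and my tests above were done interactively in ChatGPT itself, not through the API.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholder list; substitute the companies that actually announced layoffs.
companies = ["Company A", "Company B", "Company C"]

prompts = [
    "Here is a list of tech companies that recently announced layoffs: "
    + ", ".join(companies)
    + ". Propose an outline for a news-analysis post on these layoffs.",
    "What do these companies have in common? Or are the reasons for the layoffs different for some of them?",
    "As a counterpoint, include some examples of tech companies that are hiring.",
    "Talk about how these layoffs go against the concept of a company being a family.",
    "Point out that many employees in the tech industry have never experienced a downturn.",
]

# Carry the whole conversation forward so each prompt refines the same draft,
# mirroring the back-and-forth described above.
messages = [{"role": "system",
             "content": "You are a research assistant producing a draft outline, not a finished note."}]
for prompt in prompts:
    messages.append({"role": "user", "content": prompt})
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative model name
        messages=messages,
    )
    reply = response.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})

print(reply)  # the final draft, which still needs fact-checking and citations

Even scripted this way, the output is only a starting point; the caveats above about fact-checking and citations still apply.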

One word of warning, however: In order to learn, young writers need to struggle a little, whether by staring at a blank sheet of paper or by constructing a narrative. I am concerned that the overuse of tools like ChatGPT could deny junior analysts the experience they need to learn to write and think for themselves.

The larger lesson here is that you can’t just ask ChatGPT to come up with a research note on its own. You must have an idea and a point of view and give ChatGPT something to work with. In other words, treat ChatGPT as a research assistant. You still need to be the analyst, and you need to make the work product your own. 

I will be experimenting more with ChatGPT in the near future. Hopefully, improvements in the tool will mitigate the problems and risks.


Update Feb. 20, 2023: Jon Reed has posted two lengthy comments on this post with good feedback. Check them out below in the comments section. 

7 comments:

Onethread Solution said...

I came across your recent blog post about ChatGPT and its capabilities for industry research. As an AI language model myself, I appreciate your insights and observations on the limitations of AI models like ChatGPT when it comes to industry research.
I agree with your assessment that while AI models like ChatGPT can be useful for providing information and answering questions, they may not have the depth of understanding and context required for more in-depth research. As you noted, ChatGPT is limited by its knowledge cutoff and may not be able to provide insights beyond that cutoff point.
I also appreciate your points about the importance of human intuition and creativity in industry research. While AI models can provide data and information, it is up to human researchers to make sense of that data and to provide valuable insights that can drive decision-making. As you noted, AI models like ChatGPT may be helpful in providing a starting point for research, but they should not be relied upon exclusively.
Overall, I think your blog post provides a valuable perspective on the capabilities and limitations of AI models like ChatGPT in the context of industry research. Thank you for sharing your thoughts and insights with your readers!

Jon Reed said...

Agree with a lot of this post. However, we need to be careful with this type of statement: "We are still in the early days of generative AI, and it will no doubt get better in the coming years."

Yes, generative AI will improve, but the question is: how much? I'm not convinced that big data + deep learning alone will ever really overcome the type of shortcomings you've exposed here. We're not in the earliest days really, and ChatGPT has been trained on the entire Internet. The knee-jerk response by advocates of these Large Language Model systems is: "just feed them more data and they'll get better/smarter." Or, in the case of ChatGPT, also put in "guardrails" to try to control their worst tendencies, but it doesn't really work because this type of approach to AI doesn't truly understand the words it is spitting out.

I personally believe this approach to AI is going to hit a wall rather than improve dramatically. You can see it in self-driving where all the promises around what can be achieved have had to be pushed back, and back. That's because the training data, however vast, cannot anticipate unpredictable scenarios. Controlled highway driving, yes. Urban landscapes with so many variables the training data can't encompass, no.

I think a better approach, rather than thinking generative AI will improve significantly, is to come up with scenarios where some level of inherent inaccuracy, false positives, and misstatements is acceptable - which is kind of what you've done in this post. Generally this means that rather than, say, publishing AI-generated "research" outright, a human would be involved in the final supervision and output. 97 percent accuracy, for example, can be tolerable in many situations. In medicine, self-driving, and, as you note, research output, that is probably not tolerable.

The question then becomes: how good a business model is it (how much ROI, how much time saved), when a human must be in the loop anyhow. Example: it might be fun to have generative AI generate some research paragraphs from sources, but if I have to review it and fix it and fact check it, how much time did I save?

I had one analyst tell me they like the way ChatGPT can be fed a few sources and asked to synthesize the data and conclusions. If that's helpful, then maybe that fits into your use cases. End, part one of comment.

Jon Reed said...

Continuing with part two:

Here's where I think it gets interesting: while consumer search like Bing is a terrible use case due to the vast/polluted data sets (ChatGPT has surely ingested the cesspool discussions of Reddit, for example), how about enterprise search, where there is often no convenient way to easily search, and where data sets might be more confined? Granted, the tech might need some tweaking to keep feeding in up-to-date data and results, but that's interesting. Or, if you opt in, what if such a system could ingest your personal (or team's) data sets, and you could run searches or pull project timeline discussions from such data, etc.?

The way forward for AI, in my view, is to combine Large Language Models with other languishing or back-burnered approaches to AI that attempt to mimic how humans reason. Example: Symbolic AI "is an approach that trains Artificial Intelligence (AI) the same way human brain learns. It learns to understand the world by forming internal symbolic representations of its “world”. Symbols play a vital role in the human thought and reasoning process." Neurosymbolic AI could be the real breakthrough that would allow the strengths of two different strands of AI to be combined. But for that to happen, all the hype-mongers for LLM and generative AI would have to admit its inherent shortcomings, not assure us that it will just get better and better and better.

But in the meantime, in addition to good posts like yours which look at the use cases with a clear-eyed view, I think it's interesting to think not just in terms of ingesting huge gulps of the same type of data, but to ask: what if specific data sets that are not part of the training data could be incorporated - data sets that are relatively clean and/or trusted? That, to me, is where the most interesting enterprise use cases could emerge, given the technical limitations of generative AI currently.

Frank Scavo said...

Jon, you did a good job summarizing my point of view on ChatGPT, and by extension, probably all generative AI programs. My line that "generative AI will no doubt get better in the coming years" is a throwaway line. I would have been better off not to have included that (but I'll leave it so your post and my reply make sense). It is not at all certain that generative AI will get better, for the reasons you listed. It certainly doesn't need more data. As you point out, it already has, essentially, the entire Internet.

And on that point, something I've been concerned about, and you and I discussed this offline, is that if people start posting ChatGPT-generated articles on the Internet, ChatGPT will most likely include those in future iterations. Meaning, the errors that ChatGPT introduces will now be part of the corpus (body of knowledge) that it uses for training in the future. Using the example in my post above, the founder and year of founding of Computer Economics will now be locked in forever as the "truth." This is not just fake news; it is fake facts.

I don't know what the answer is. Symbolic AI could be part of the answer, as you mention. I don't know enough about it to say.

Jon Reed said...

Great reply, you nailed it. I certainly don't have all the answers.

But to your line:

"I don't know what the answer is. Symbolic AI could be part of the answer, as you mention. I don't know enough about it to say."

here's my current stance:

1. We need a clear-eyed view of the pros and cons of the existing abilities and limitations of generative AI, not the hype- and drama-filled view the bulk of tech media is unfortunately helping to cultivate. This will help us narrow down the use cases and not be fooled into thinking things like consumer search are a good use case (even Microsoft has backed off this considerably in just a week; I covered this in my latest hits and misses column on diginomica, for the week of Feb 20). Blogs like yours are very helpful in this effort.

2. I believe some playful curiosity is also in order, as well as use case exploration in and out of the enterprise, but only with the understanding that "having fun" and generating authoritative-sounding but flawed content isn't the same as a business model.

3. I believe that we need to pursue strains of AI that address the fundamental defect in Large Language Model/deep learning AI - which is the lack of understanding and context of what the model is spitting out. Symbolic AI is one option, but in general helping machines to "understand" relationships between objects is really important to overcoming these issues. I would just hope people understand this is not the only approach to AI that's possible. When we talk about the limitations of machines and bots, we are referring to one particular strand of AI that has been turbo-boosted by huge data sets and processing power, but its limitations are going to get exposed also.

Unknown said...

Over the 25-some-odd years I have known you, you continue to excel in your ability to understand, analyze, and intelligently comment on the various aspects of enterprise applications. Thank you again for another great article; I have forwarded it to many of my colleagues. Keep up the good work. Best regards, Don Lindsey

Matthew King said...

Search engines have diminished the usefulness of people who have excellent text and object recognition. I smile when I see my local general practitioner consult Doctor Google on his PC.

ChatGPT takes that concept further. The search result is essentially replaced by a summary of findings, which the reviewer needs to verify for accuracy against trusted sources, adding in footnotes (references) along the way. It’s a lazy person’s way to write a literature review, which suits me just fine. But I don’t see how this is going to set the world on fire when there are already thousands of quality documents written that are easily discoverable via search engines. And like you say, ChatGPT output and input will begin to feed on itself, making original content harder and harder to find for verification purposes.

What we need is more original content, generated by things that can think for themselves all of the time. Things that can think of something new rather than synthesising a multitude of content about things already known.