Framework to Design an AI Finance & Accounting Assistant

In recent weeks, I've suggested to several of my accounting and finance peers that they pump the brakes on the victory celebration following our recent win over the robots.

When ChatGPT floundered on a test designed for accountants, skeptics rejoiced, declaring that the AI takeover had been stymied. But in the shadows of this fleeting victory, there is a clear path to developing a tool that could forever change the nature of our work.

A recent paper published by researchers at BYU (in partnership with 186 other universities) asked ChatGPT more than 20,000 accounting questions and compared the large language model's results to those of accounting students. The paper, The ChatGPT Artificial Intelligence Chatbot: How Well Does It Answer Accounting Assessment Questions?, by Wood et al. in Issues in Accounting Education, evaluated the performance of ChatGPT in answering accounting assessment questions and found that human students generally outperformed the bot. ChatGPT correctly answered 47.4% of the questions, compared to 76.7% for human respondents. ChatGPT was particularly bad at math questions compared to human exam takers.

There are many reasons ChatGPT underperformed on an accounting exam, but chief among them is that it was the wrong tool for the job. ChatGPT is a large language model (LLM) that was designed to understand and generate human-like text by (in short) predicting the most likely next word based on billions of documents on which it was trained.

LLMs are great at answering questions, interacting with humans, generating content, and even doing more complex tasks like brainstorming and problem-solving. But they're not great at understanding complex or domain-specific topics (like accounting, law, or medicine), performing complex math problems, or dealing with ambiguity. In those fields, humans still have the upper hand... for now.

ChatGPT and most other LLMs are generalists that lack the in-depth expertise a human expert would possess. They don't have access to real-time information, they're not great at context and nuance, and they aren't great at differentiating between sources. This is to say that LLMs are kind of like your weird uncle who gets all of his news from Facebook and some weird GeoCities website that hasn't been updated since 2002. To an LLM, text from a Reddit or Quora response has no more or less credibility than data obtained from a Wall Street Journal or New York Times article.

But this is far from the end of the road for generative AI and its implications to finance and accounting.

Rather, it is the foundation on which many new tools will be built. Specially trained tools that know the ins and outs of ASC, IASB, and PCAOB are but a training generation away. And we are but a public release away from receiving ChatGPT plug-ins that solve many of the shortcomings of the current model (e.g. math and memory).

These AI upgrades aren't dependent on another radical shift in the base technology. These are just minor, incremental changes to an already awesomely powerful model. And many of them are already being tested at this very moment.

Despite the poor performance of the out-of-the-box chatbot on this comprehensive accounting exam, it is not difficult to see how a fine-tuned model could far exceed ChatGPT's basic accounting skills. While the potential of AI to replace human workers has never been clearer, there are a number of steps that need to be taken in order for an AI-powered assistant to outperform finance and accounting experts.

By leveraging fine-tuned LLMs and intelligent agents that can read and analyze specific documents and spreadsheets, and even outsource data analysis tasks to other platforms, domain-specific chatbots could rewrite the narrative on AI's role in the accounting profession. So, hold off on the victory dance, because the AI revolution in accounting has only just begun.

Scope

Rather than throwing a standard LLM at a domain-specific challenge and expecting it to perform better than trained and credentialed experts, we should ask ourselves: what would an AI-powered finance and accounting expert need to be able to do?

Mission: To create an always-on AI assistant that could answer finance and accounting questions, analyze financial statements, ratios, and other KPIs, and offer insights and recommendations for decision-making.

  • This tool should be able to generate financial statements and help navigate complex regulatory environments, providing guidance on compliance with financial reporting standards, auditing requirements, and tax laws.
  • It should be able to help with budgeting and forecasting, fraud detection, risk management, and tax planning and filing.
  • Powered by state-of-the-art machine learning algorithms and data, it would be able to analyze vast amounts of transactional data in real-time to identify potential fraud, financial risks, and compliance issues, which would help mitigate risks more quickly and effectively than human teams.
  • A properly trained AI assistant would offer speed and efficiency, processing large volumes of data quickly and accurately. This would enable faster financial analysis, report generation, and decision-making.
  • The tool should be scalable to meet expanding needs of a growing business.
  • It should be further trained on and able to interact with internal company data, offering customized recommendations based on specific financial needs, goals, and risk tolerance.

Fine Tuning a Model for Finance & Accounting

While context windows are expanding rapidly, and a recent paper has already outlined how a transformer's memory could be extended to include up to a million tokens, fine-tuning is still a crucial step in building a reliable accounting and finance expert. (The internet at large, in case you haven't seen it, is full of contradictory and confusing information.) To increase a model's trustworthiness, it would be important to control the data on which it was trained.

With a large and diverse enough dataset (e.g. SEC filings, GAAP rules, IFRS, FINRA, etc.), you could fine-tune the model to become an expert in finance and accounting, capable of handling tasks such as financial statement analysis, risk assessment, forecasting, and compliance with accounting standards.

The fine-tuning set would include:

  • GAAP (Generally Accepted Accounting Principles) documentation, including Financial Accounting Standards Board (FASB) Accounting Standards Codification (ASC) and related guidance documents;
  • International Financial Reporting Standards (IFRS): IFRS Standards, IAS Standards, IFRIC Interpretations, and SIC Interpretations;
  • Financial Industry Regulatory Authority (FINRA) Rules, Notices to Members, Regulatory Notices, and Interpretive Guidance;
  • Securities and Exchange Commission (SEC) regulations, including the Securities Act of 1933, Securities Exchange Act of 1934, Investment Company Act of 1940, Investment Advisers Act of 1940, Sarbanes-Oxley Act of 2002, Dodd-Frank Wall Street Reform and Consumer Protection Act of 2010, SEC Regulations (Regulation S-X, S-K, etc.), and SEC Staff Accounting Bulletins;
  • Public Company Accounting Oversight Board (PCAOB) standards, which govern the audits of public companies in the United States;
  • AICPA (American Institute of Certified Public Accountants) guidance, such as Statements on Auditing Standards (SAS), Statements on Standards for Attestation Engagements (SSAE), and Statements on Standards for Accounting and Review Services (SSARS);
  • Industry-specific accounting guidance, such as guidance for banking, insurance, real estate, and other specialized industries;
  • Country-specific accounting standards and regulations for jurisdictions outside the United States, such as the UK’s FRC (Financial Reporting Council) standards or Canada’s ASPE (Accounting Standards for Private Enterprises);
  • Certification exams like those for Certified Public Accountants (CPAs), Chartered Financial Analysts (CFAs), Chartered Global Management Accountants (CGMAs), etc.;
  • Public SEC filings, including public companies' quarterly and annual reports, mutual fund reports, and other financial filings;
  • Public websites that answer finance and accounting questions like Investopedia, Corporate Finance Institute, Forbes, Bloomberg, etc.

Trust in the model's output could be greatly increased by fine-tuning it on the very rules and regulations the model needed to comprehend.
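As a rough illustration of what assembling such a corpus might look like in practice, here is a minimal sketch that converts a folder of question-and-answer pairs drawn from authoritative guidance into the JSONL prompt/completion format many fine-tuning APIs expect. The directory name, file layout, and field names are assumptions for illustration, and the exact schema depends on the provider and model being fine-tuned.

```python
import json
from pathlib import Path

# Hypothetical source: a folder of JSON files, each holding Q&A pairs
# extracted from authoritative guidance (ASC, IFRS, SEC rules, etc.).
SOURCE_DIR = Path("guidance_qa")              # assumed layout
OUTPUT_FILE = Path("finance_finetune.jsonl")  # assumed output name

def build_finetune_file() -> int:
    """Write prompt/completion pairs in the JSONL format used by many
    fine-tuning APIs. Returns the number of examples written."""
    count = 0
    with OUTPUT_FILE.open("w", encoding="utf-8") as out:
        for path in sorted(SOURCE_DIR.glob("*.json")):
            records = json.loads(path.read_text(encoding="utf-8"))
            for record in records:
                example = {
                    "prompt": record["question"].strip() + "\n\n###\n\n",
                    "completion": " " + record["answer"].strip() + " END",
                }
                out.write(json.dumps(example) + "\n")
                count += 1
    return count

if __name__ == "__main__":
    print(f"Wrote {build_finetune_file()} training examples to {OUTPUT_FILE}")
```

The separator tokens ("###", "END") are a common convention for prompt/completion-style fine-tuning, but the right delimiters and example structure should follow whatever the chosen platform documents.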

But even a fine-tuned language model is still just a language model. Massive models like GPT have picked up some basic math in the course of their training, but when they do get a math problem right, they are often reciting memorized answers rather than working through the calculation with any real understanding. Again, that's because these models weren't designed to do math.

Some other reasons LLMs aren't good at math:

  • Sequential Processing: LLMs are "feed-forward" models that process text in sequential order. That ordering is not always ideal for math problems, which often require a hierarchical understanding of expressions, such as solving equations or performing calculus.
  • Limited Working Memory: LLMs have a limited context window, which restricts their ability to remember and manipulate large amounts of information simultaneously. This limitation can make it challenging for LLMs to handle complex calculations, especially those that involve multiple steps or require holding on to intermediate results.
  • Model Design: The T in GPT refers to the model's transformer architecture. These transformers are not specifically optimized for mathematical problem-solving.
  • Error Correction: LLMs do not have built-in error-correction mechanisms, which are essential for precise mathematical computations. As a result, they may generate incorrect results or fail to recognize when a given solution is incorrect.

But this isn't the end of the world when it comes to building a powerful finance and accounting machine. It just means we need to incorporate other tools into the application: tools that were designed to handle the bits that LLMs can't.

Outsourcing Tasks Through the Power of Agents

At times, LLMs can seem almost omniscient in the wide range of tasks they are able to perform. But they are not the right tool for every job. Asking an LLM to provide guidance on corporate tax filings is akin to asking a poet to solve a Diophantine equation. Nothing against poets, but they are the wrong resource for the job.

But because LLMs are so great at generating human-like text, they can be directed to serve as something of a generic interface that links other AI models. Think of a properly prompted LLM as a router, directing traffic to the appropriate tool.

The trick is to find a way to get LLMs to interact with publicly available tools like Wolfram Alpha, which answers factual queries by computing answers from externally sourced data; repositories like Hugging Face, which is a hub for myriad machine learning algorithms; or with proprietary or private applications or databases.
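To make the "LLM as router" idea concrete, here is a minimal sketch of a dispatcher that sends pure arithmetic to a symbolic math library and everything else to the language model. The ask_llm function is a stand-in for whichever LLM API an application actually uses; only the routing logic is the point.

```python
import re
from sympy import sympify  # deterministic math engine; one of many possible tools

# Prompts consisting only of digits, operators, and parentheses are treated as math.
MATH_PATTERN = re.compile(r"^[\d\s.+\-*/()^]+$")

def ask_llm(prompt: str) -> str:
    """Placeholder for a call to whichever LLM API the application uses."""
    return f"[LLM would answer: {prompt!r}]"

def route(prompt: str) -> str:
    """Send pure arithmetic to a math engine; everything else to the LLM."""
    candidate = prompt.strip().rstrip("?").strip()
    if MATH_PATTERN.match(candidate):
        # sympy evaluates the expression exactly, then we convert to a decimal value
        return str(sympify(candidate.replace("^", "**")).evalf())
    return ask_llm(prompt)

if __name__ == "__main__":
    print(route("(1250000 - 975000) / 975000 * 100"))                    # handled by sympy
    print(route("How is goodwill impairment tested under ASC 350?"))     # handled by the LLM
```

A production router would likely let the LLM itself classify the request rather than relying on a regular expression, but the division of labor is the same.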

Several open-source applications are already working to extend LLMs' functionality by motivating prompts through rules established in computer code, which could make it easier for LLMs to perform more complex, multi-step tasks with less human prompting.

GPT on Steroids

AutoGPT is an experimental open-source tool that aspires to make GPT fully autonomous by creating AI agents that interpret human prompts and iterate through them until a task is complete. Similarly, BabyAGI uses AI agents to create tasks based on the result of previous tasks and a predefined objective. Another new but potentially powerful tool is HuggingGPT, which connects LLMs with the ML community. (These applications also possess a long-term and short-term memory system that enables them to store and recall information as needed to complete assigned tasks.)

They do this by creating and deploying AI agents to work behind the scenes to complete complex tasks and extend LLMs' range beyond the chat window.

"Let every eye negotiate for itself and trust no agent."

Save maybe for these.

AI Agents are computer programs that can operate autonomously (or in concert with other agents through natural language). Built on the backs of LLMs, these agents should be able to perceive dynamic environmental conditions (e.g. changing human prompts and their subsequent results), act to affect those conditions, then use reason to interpret the outcomes and solve problems by drawing inferences to determine their next actions and potential outcomes.

(Probably important here to note that humans are also agents.)

That sounds like a tall order for a bot, right?

But this is where the power of LLMs can be harnessed to great effect. Agents use the LLM itself to break down a user prompt into discrete steps and then carry them out. But some level of guardrails has to be put in place to keep the agents from going completely off track.

AutoGPT, for example, breaks down the agents' activities into three categories: thoughts, reasoning, and criticism. When a user enters a prompt, the agent's understanding of that prompt is shown in thoughts. The reasoning phase is where the agent determines what to do based on the prompt. Exposing this reasoning reduces the "black box" element of what's going on under the hood, which could enable better tuning of the agents' functions and outputs. Finally, the criticism phase asks the AI to be somewhat self-aware and identify problems with its reasoning.
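A stripped-down version of that loop might look like the sketch below. The llm function and the JSON structure it returns are stand-ins, loosely modeled on AutoGPT's thoughts/reasoning/criticism output, and the iteration cap is the guardrail that keeps the agent from looping forever.

```python
import json

def llm(prompt: str) -> str:
    """Placeholder for an LLM call that returns a JSON plan.
    A real implementation would call an actual model here."""
    return json.dumps({
        "thoughts": "The user wants last quarter's gross margin.",
        "reasoning": "Pull revenue and COGS from the ledger, then compute the ratio.",
        "criticism": "Confirm both figures cover the same reporting period.",
        "next_action": "finish",
        "answer": "Gross margin was 41.2% (illustrative placeholder).",
    })

def run_agent(objective: str, max_steps: int = 5) -> str:
    """Iterate until the agent declares the task finished or the step cap is hit."""
    context = objective
    for step in range(max_steps):
        plan = json.loads(llm(context))
        print(f"step {step}: thoughts={plan['thoughts']}")
        print(f"         reasoning={plan['reasoning']}")
        print(f"         criticism={plan['criticism']}")
        if plan["next_action"] == "finish":
            return plan["answer"]
        context += f"\nPrevious step result: {plan['next_action']}"
    return "Stopped: step limit reached before the task was completed."

if __name__ == "__main__":
    print(run_agent("Report last quarter's gross margin."))
```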

With an understanding of the role and capabilities of AI agents, it is easy to imagine an environment where an LLM could receive a prompt that is outside of its abilities and rather than trying to answer the question itself, it routes or outsources the question to a tool that is designed to answer that type of question.

In the context of creating a trustworthy tool that could answer important finance and accounting questions, it would be important to limit the range in which it could operate. For example, I wouldn't want my AI accounting expert to scour Reddit for an answer to an important question on accounting guidance.

This brings us back to the importance of fine-tuning our model on accounting-specific data. It is key to remember, however, that these models are not trained on live data, and if accounting rules and regulations have changed (e.g. lease treatment) since the time the model was trained, the tool wouldn't inherently have the correct answer. Perhaps one of the safety measures for our AI agent, then, could be to check for updated guidance.
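One way to approximate that safety measure is to keep a small metadata table of when each standard was last amended and flag any answer that leans on guidance newer than the model's training cutoff. The topic names, dates, and cutoff below are placeholders for illustration only.

```python
from datetime import date

# Assumed metadata: when relevant guidance was last amended (illustrative dates only).
GUIDANCE_LAST_AMENDED = {
    "ASC 842 (Leases)": date(2023, 3, 27),
    "ASC 326 (Credit Losses)": date(2022, 3, 31),
}

MODEL_TRAINING_CUTOFF = date(2021, 9, 1)  # assumption about the underlying model

def needs_fresh_lookup(topic: str) -> bool:
    """Return True when the cited guidance changed after the model was trained,
    meaning the agent should retrieve current guidance instead of trusting memory."""
    amended = GUIDANCE_LAST_AMENDED.get(topic)
    return amended is not None and amended > MODEL_TRAINING_CUTOFF

if __name__ == "__main__":
    for topic in GUIDANCE_LAST_AMENDED:
        status = "retrieve current guidance" if needs_fresh_lookup(topic) else "model knowledge may suffice"
        print(f"{topic}: {status}")
```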

Plug In & Hang On

Using specialized AI agents to route prompts to the appropriate resource would solve one of the greatest limitations of LLMs by not asking them to perform tasks for which they were not designed. But with the pending release of ChatGPT plug-ins or direct links to other applications, OpenAI may well have solved 80% of the problem on its own.

While LLMs still suffer from memory limitations (i.e. a limited context window), these plug-ins will enable ChatGPT to reach beyond its isolated bubble from within the application itself.

In simplest terms, think of these plug-ins as "apps" for ChatGPT. With the pending public release of ChatGPT integrations with popular web applications like Expedia, Instacart, OpenTable, Wolfram, and Zapier, ChatGPT will be able to directly access these tools and perform functions on our behalf.

Further, OpenAI is poised to release ChatGPT's Code Interpreter (CI), which will incorporate the power of computer coding within the model's output. One powerful use case for the CI recently demoed by Greg Brockman in a TED talk involved uploading a CSV file directly into the prompt window of ChatGPT and asking the application to provide information about the file. Behind the scenes, ChatGPT writes Python code at lightning speed and returns exactly the type of information financial analysts would want to see as they analyze company financial statements and other numerical data.
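The kind of code Code Interpreter writes behind the scenes in that demonstration is not exotic; a financial analyst could sketch it in a few lines of pandas. The file name and column names in this hypothetical CSV are assumptions for illustration.

```python
import pandas as pd

# Hypothetical file with columns: period, revenue, cogs, current_assets, current_liabilities
df = pd.read_csv("quarterly_financials.csv")

# A few of the ratios an analyst would typically ask for first.
df["gross_margin"] = (df["revenue"] - df["cogs"]) / df["revenue"]
df["current_ratio"] = df["current_assets"] / df["current_liabilities"]
df["revenue_growth"] = df["revenue"].pct_change()

summary = df[["period", "gross_margin", "current_ratio", "revenue_growth"]]
print(summary.to_string(index=False))
```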

As impressive as this demonstration is (and it is very impressive), the limitation is that whatever knowledge ChatGPT and the user glean from the exercise is confined to a single chat session. It is valuable for analysis, but it lacks the sticking power one would expect from a truly valuable finance and accounting team member.

For the output to be truly valuable, we would need to reroute the output to a data repository outside of the ChatGPT application.
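Rerouting that output could be as simple as appending each run's results to a small database that lives outside the chat session. Here is a minimal sketch using SQLite; the database file, table name, and columns are assumptions that continue the ratio example above.

```python
import sqlite3
from datetime import datetime, timezone

def persist_metrics(rows: list[tuple[str, float, float]]) -> None:
    """Append computed metrics to a local SQLite database so results
    survive beyond a single chat session."""
    conn = sqlite3.connect("finance_assistant.db")
    conn.execute(
        """CREATE TABLE IF NOT EXISTS metrics (
               run_at TEXT, period TEXT, gross_margin REAL, current_ratio REAL)"""
    )
    run_at = datetime.now(timezone.utc).isoformat()
    conn.executemany(
        "INSERT INTO metrics VALUES (?, ?, ?, ?)",
        [(run_at, period, gm, cr) for period, gm, cr in rows],
    )
    conn.commit()
    conn.close()

if __name__ == "__main__":
    persist_metrics([("2023-Q1", 0.412, 1.8), ("2023-Q2", 0.397, 1.7)])
```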

This leads to the final component of the AI-Powered Finance & Accounting Assistant:

Interacting with External Databases

Management dashboards have been around for years. They provide valuable, at-a-glance financial and performance data to management teams, enabling them to make better-informed, data-driven decisions.

But dashboards are limited in that the information they provide is generally pre-defined and not set up for deep-dive, one-off custom reporting.

By incorporating the power of an LLM into a dashboard and giving it access to full historical information, the power of these applications would be greatly extended.

Here we see the potential of incorporating LLMs' greatest strength into our everyday workflows and decision-making. The good news is that even before LLMs' context windows expand to a meaningful size, allowing them to interact with systems that possess long-term memory lets us finally begin to capitalize, in a meaningful way, on these models' ability to interact with humans in a language we understand.
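One hedged sketch of what that interaction could look like: the LLM translates a plain-English question into SQL, and the application executes it read-only against the dashboard's historical database. The ask_llm function is a placeholder that returns a canned query so the sketch runs end to end, and the schema assumes the metrics table from the earlier persistence example already exists.

```python
import sqlite3

def ask_llm(prompt: str) -> str:
    """Placeholder for an LLM call. A real system would generate SQL from the prompt."""
    return "SELECT period, gross_margin FROM metrics ORDER BY period DESC LIMIT 4"

def answer_dashboard_question(question: str, db_path: str = "finance_assistant.db"):
    schema_hint = "Table metrics(run_at TEXT, period TEXT, gross_margin REAL, current_ratio REAL)"
    sql = ask_llm(f"{schema_hint}\nWrite a read-only SQL query answering: {question}")
    # Opening the database in read-only mode is a simple guardrail against
    # the model generating anything destructive.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()

if __name__ == "__main__":
    print(answer_dashboard_question("How has gross margin trended over the last four quarters?"))
```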

In order to understand how LLMs interact with the world, we have to understand the language they speak.

How AI Models Understand Text

At their core, computers process information as numbers. To enable seamless communication between humans and AI models, text must be converted into numerical representations.
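A toy illustration of that conversion: represent two sentences as word-count vectors over a small vocabulary and compare them numerically. Real models use learned embeddings with thousands of dimensions, but the principle (text in, vectors out, similarity as a number) is the same.

```python
import math
from collections import Counter

def vectorize(text: str, vocabulary: list[str]) -> list[int]:
    """Turn text into a simple word-count vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

def cosine_similarity(a: list[int], b: list[int]) -> float:
    """Measure how similar two vectors are, from 0 (unrelated) to 1 (identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

vocab = ["revenue", "increased", "decreased", "quarter", "lease", "expense"]
a = vectorize("Revenue increased this quarter", vocab)
b = vectorize("Revenue decreased this quarter", vocab)
print(a, b, round(cosine_similarity(a, b), 3))
```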

Recently developed tools like LangChain, which can string together a series of prompts to achieve a desired outcome, allow LLMs to communicate with documents outside of the models themselves.

This is important because practical applications of LLMs in unique business settings will require these tools to interact with targeted, specific data. Further, LangChain (and similar frameworks) is the backbone that enables not only the creation of AI agents but also the ability of LLMs to interact with other documents and to maintain memory and context.
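A minimal retrieval sketch in the LangChain style is shown below. It assumes an OpenAI API key is configured and that a local file of accounting guidance exists, and the exact import paths differ between LangChain versions, so treat it as directional rather than copy-paste.

```python
# Directional sketch only: import paths vary across LangChain releases,
# and an OPENAI_API_KEY environment variable is assumed.
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

# Hypothetical local file containing accounting guidance text.
docs = TextLoader("asc_842_guidance.txt").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)

# Embed the chunks so questions can be matched to relevant passages numerically.
index = FAISS.from_documents(chunks, OpenAIEmbeddings())

# Chain: retrieve the most relevant passages, then let the model draft an answer from them.
qa = RetrievalQA.from_chain_type(llm=ChatOpenAI(temperature=0), retriever=index.as_retriever())
print(qa.run("How should an operating lease be recognized on the balance sheet?"))
```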

Putting it All Together

In order to build a trustworthy, reliable, and helpful AI-Powered Finance & Accounting Assistant, we have to think of LLMs as foundational models upon which we can build highly specialized applications, not only for finance and accounting, but for other domain-specific, enterprise-ready applications. To build a truly powerful domain expert, we need to take the following steps (a rough skeleton of how they fit together appears after the list):

  1. Fine-tune a model on domain-specific information.
  2. Use AI agents to outsource tasks that LLMs were not designed to handle.
  3. Capitalize on the power of built-in applications or plug-ins to supercharge the output of user-generated queries or prompts.
  4. Enable the fine-tuned LLM to interact with text-based and numerical data stored on external databases.
  5. Route output to a system that possesses better long-term memory.
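Stitched together, those five steps might look like the skeleton below. Every function here is a placeholder for one of the components sketched earlier in this piece; the point is the order of operations, not any specific implementation.

```python
def retrieve_guidance(question: str) -> str:
    """Steps 1-2: pull relevant passages from the fine-tuned/retrieval layer (placeholder)."""
    return "Relevant guidance passages would be returned here."

def run_numeric_analysis(question: str) -> dict:
    """Steps 2-3: outsource calculations to tools built for them (placeholder)."""
    return {"gross_margin": 0.412}

def compose_answer(question: str, guidance: str, numbers: dict) -> str:
    """Step 4: let the LLM draft the narrative answer from guidance and numbers (placeholder)."""
    return f"Answer to {question!r} using {numbers} and cited guidance."

def persist(answer: str) -> None:
    """Step 5: write the result to long-term storage outside the chat session (placeholder)."""
    print(f"Saved: {answer}")

def assistant(question: str) -> str:
    guidance = retrieve_guidance(question)
    numbers = run_numeric_analysis(question)
    answer = compose_answer(question, guidance, numbers)
    persist(answer)
    return answer

if __name__ == "__main__":
    assistant("What was gross margin last quarter, and how should the new lease be booked?")
```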

So while today's ChatGPT may have come up short on an accounting exam, this should not be seen as a verdict on the future of AI in finance and accounting. Instead, it highlights the need for further specialization and adaptation of these models to create domain-specific, enterprise-ready applications. By fine-tuning LLMs with domain expertise, leveraging AI agents to outsource tasks, capitalizing on the power of built-in applications, and enabling seamless interaction with external databases and long-term memory systems, we can transform the role of AI in finance and accounting.

The potential of AI-powered finance and accounting assistants is immense, and with the right approach, these innovative chatbots can become indispensable tools for professionals in the field. As we continue to push the boundaries of AI technology, we must keep an open mind about its capabilities and applications, avoiding premature celebrations or dismissals. The AI revolution in finance and accounting is far from over; in fact, it has only just begun. So, rather than fearing the unknown, let’s embrace the possibilities and work together to shape a future where AI and human expertise coexist, complementing each other to create a more efficient, accurate, and prosperous world of finance.

Author

  • Glenn Hopper

    Glenn Hopper is a director at Eventus Advisory Group and the author of Deep Finance: Corporate Finance in the Information Age. He has spent the past two decades helping startups transition to going concerns, operate at scale, and prepare for funding and/or acquisition. He is passionate about transforming the role of chief financial officer from historical reporter to forward-looking strategist.
