Office of the Superintendent of Financial Institutions
Artificial Intelligence (AI) is dominating tech business headlines, and much of the buzz seems to centre around who has the biggest, baddest, smartest, most game-changing model.
The problem with fixating on which model is the most complex and technically impressive, however, is that it overshadows a far more important factor: data quality.
Is the data these models are using accurate? Is it reliable? Where does it come from? What form does it take?
You can have the nicest Ferrari on the road, but if you fill it with the wrong fuel, it won’t get you anywhere.
Worse, it could wind up costing you a lot of money to fix.
Data use in AI was one of the topics an illustrious group of experts from the financial services industry, government bodies, and academia discussed at the Financial Industry Forum on Artificial Intelligence (FIFAI) workshops.
The conversations, and the FIFAI report that followed, touched on four main principles guiding the use and regulation of AI in the financial industry.
In this series of articles – which began with Explainability and continues here with Data – we are examining each of the themes in detail to see what we can learn and how we can apply this knowledge to regulatory research and activities.
Please note that the content of this article and the AI report reflects views and insights from FIFAI speakers and participants. It should not be taken to represent the views of the organizations to which participants and speakers belong, including FIFAI organizers, the Office of the Superintendent of Financial Institutions (OSFI) and the Global Risk Institute (GRI).
In addition, the content of the article and report should not be interpreted as guidance from OSFI or any other regulatory authorities, currently or in the future.
The old computer programming adage of “Garbage In, Garbage Out” applies here: the quality of the output depends on the quality of the input.
“Everyone talks about models, models, models,” Ima Okonny, Chief Data Officer at Employment and Social Development Canada, is quoted as saying in the FIFAI report. “We need to focus on proper data analysis first … shift [the] mindset to [data] stewardship.”
Forum participants focused on four key questions pertaining to data factors in AI: Is the data accurate? Is it reliable? Where does it come from? What form does it take?
Data used for AI training and development exhibits several characteristics – such as volume, variety, and agility – that, when leveraged by AI, open up a wide range of possibilities. Those same characteristics can also make it more challenging for financial institutions to integrate and standardize data and to manage data risk.
Data quality is of particular importance.
Maintaining high data quality is becoming increasingly difficult, with many hurdles to overcome.
The FIFAI report notes five, though this is not an exhaustive list:
So, how can all this data risk be managed? Research in this area is still evolving; however, data scientists and engineers have leveraged many approaches, including the continual exploration, cleaning, validation, and integration of data assets.
Still, even with these tactics it can be difficult to assure sound data quality.
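To make the cleaning and validation tactics above a little more concrete, here is a minimal sketch of the kind of record-level quality checks – completeness, validity, uniqueness – a data pipeline might run before records reach a model. The field names, ranges, and sample records are illustrative assumptions, not drawn from the FIFAI report:

```python
# A minimal, illustrative data-validation sketch: field names, ranges, and
# sample records are hypothetical, not taken from the FIFAI report.

def validate_records(records, required_fields, numeric_ranges):
    """Return (clean, issues): records passing basic quality checks,
    plus human-readable issues for the records that do not."""
    clean, issues, seen = [], [], set()
    for i, rec in enumerate(records):
        # Completeness: every required field must be present and non-empty.
        missing = [f for f in required_fields if rec.get(f) in (None, "")]
        if missing:
            issues.append(f"record {i}: missing {missing}")
            continue
        # Validity: numeric fields must fall within expected ranges.
        out_of_range = [f for f, (lo, hi) in numeric_ranges.items()
                        if not lo <= rec[f] <= hi]
        if out_of_range:
            issues.append(f"record {i}: out of range {out_of_range}")
            continue
        # Uniqueness: drop exact duplicate records.
        key = tuple(sorted(rec.items()))
        if key in seen:
            issues.append(f"record {i}: duplicate")
            continue
        seen.add(key)
        clean.append(rec)
    return clean, issues

records = [
    {"id": 1, "income": 55000},
    {"id": 2, "income": -100},   # fails the range check
    {"id": 1, "income": 55000},  # exact duplicate of the first record
    {"id": 3},                   # missing a required field
]
clean, issues = validate_records(records, ["id", "income"],
                                 {"income": (0, 10**7)})
print(len(clean), len(issues))   # one record survives; three are flagged
```

In a real financial institution these checks would be far richer – schema contracts, referential integrity, drift monitoring – but the shape is the same: data is tested before it is trusted.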
Another effective way to enhance the quality of data used in AI: strong governance.
As the FIFAI report explains: “Good data governance can help ensure that data is accurate, consistent, safe, and complete, which is crucial for the effective functioning of AI systems. Data governance is critical for financial institutions considering the sensitive and confidential nature of financial and customer data.”
What does that mean in practice?
Forum participants discussed a couple of ways businesses can improve their data and, by extension, their AI models.
The first is to take a data-centric approach to model building.
“One way to improve the performance of both AI-applications and traditional models is to continuously improve the data used to train those models, also known as a data-centric approach,” the report explains.
“Rather than solely focusing on algorithm iteration and retraining to improve performance, incorporating a data-centric approach maximizes the performance potential of the model. Sound data governance is necessary to adopt a data-centric approach to AI model development.”
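As a toy illustration of the data-centric idea – entirely hypothetical, not taken from the report – consider a one-dimensional threshold classifier. The model family stays fixed; the only thing that changes is that a few mislabeled training points get corrected, and held-out accuracy improves as a result:

```python
# Toy, hypothetical illustration of a data-centric approach: the model
# family stays fixed (a single threshold on x), and performance improves
# only when the training labels themselves are cleaned.

def fit_threshold(xs, ys):
    """Fit the threshold t minimizing training error for 'predict 1 if x > t'."""
    pts = sorted(set(xs))
    candidates = [pts[0] - 1] + [(a + b) / 2 for a, b in zip(pts, pts[1:])]
    return min(candidates,
               key=lambda t: sum((x > t) != bool(y) for x, y in zip(xs, ys)))

def accuracy(t, xs, ys):
    return sum((x > t) == bool(y) for x, y in zip(xs, ys)) / len(xs)

train_x = [1, 2, 3, 4, 6, 7, 8, 9]
clean_y = [0, 0, 0, 0, 1, 1, 1, 1]   # true rule: label = (x > 5)
noisy_y = [0, 0, 0, 0, 0, 0, 1, 1]   # x = 6 and x = 7 mislabeled

test_x, test_y = [5.5, 6.5], [1, 1]

t_noisy = fit_threshold(train_x, noisy_y)  # mislabels push the threshold up
t_clean = fit_threshold(train_x, clean_y)  # cleaned labels recover x > 5

print(accuracy(t_noisy, test_x, test_y))   # poor on the held-out points
print(accuracy(t_clean, test_x, test_y))   # perfect on this toy test set
```

No amount of refitting on the noisy labels would help here; fixing the two bad labels does. That, in miniature, is the data-centric argument.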
Another is to build a strong data literacy culture.
“Organization-wide awareness of the various risks that stem from inadequate use of data is essential with widespread adoption of AI, thus, organizations should consider ongoing training activities for their employees on a broad range of aspects related to data,” the report says.
To explore even more AI data factors or any of the other themes forum participants discussed, read the full FIFAI report (PDF, 5.42 MB).