Office of the Superintendent of Financial Institutions
Artificial Intelligence (AI) is dominating tech business headlines, and much of the buzz seems to centre around who has the biggest, baddest, smartest, most game-changing model.
The problem with fixating on which model is the most complex and technically impressive, however, is that it overshadows a far more important factor: data quality.
Is the data these models are using accurate? Is it reliable? Where does it come from? What form does it take?
You can have the nicest Ferrari on the road, but if you fill it with the wrong fuel, it won’t get you anywhere.
Worse, it could wind up costing you a lot of money to fix.
Data use in AI was one of the topics an illustrious group of experts from the financial services industry, government bodies, and academia discussed at the Financial Industry Forum on Artificial Intelligence (FIFAI) workshops.
The conversations, and the FIFAI report that followed, touched on four main principles guiding the use and regulation of AI in the financial industry.
In this series of articles – which began with Explainability and continues here with Data – we are examining each of the themes in detail to see what we can learn and how we can apply this knowledge to regulatory research and activities.
Please note that the content of this article and the AI report reflects views and insights from FIFAI speakers and participants. It should not be taken to represent the views of the organizations to which participants and speakers belong, including FIFAI organizers, the Office of the Superintendent of Financial Institutions (OSFI) and the Global Risk Institute (GRI).
In addition, the content of the article and report should not be interpreted as guidance from OSFI or any other regulatory authorities, currently or in the future.
The old computer programming adage of “Garbage In, Garbage Out” applies here: the quality of the output depends on the quality of the input.
“Everyone talks about models, models, models,” Ima Okonny, Chief Data Officer at Employment and Social Development Canada, is quoted as saying in the FIFAI report. “We need to focus on proper data analysis first … shift [the] mindset to [data] stewardship.”
Forum participants focused on four key questions pertaining to data factors in AI: Is the data accurate? Is it reliable? Where does it come from? What form does it take?
Data used for AI training and development exhibits several characteristics – such as volume, variety, and agility – that, when leveraged by AI, open up a wide range of possibilities. Those same characteristics can also make it more challenging for financial institutions to integrate and standardize data and to manage data risk.
Data quality is of particular importance.
Maintaining high data quality is becoming increasingly difficult, with many hurdles to overcome.
The FIFAI report notes five, though this is not an exhaustive list:
So, how can all this data risk be managed? Research in this area is still evolving; however, data scientists and engineers have leveraged many approaches, including the continual exploration, cleaning, validation, and integration of data assets.
Still, even with these tactics it can be difficult to assure sound data quality.
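To make the cleaning and validation tactics above a little more concrete, here is a minimal sketch of the kind of record-level quality checks – completeness, validity, uniqueness – a data pipeline might run before records reach a model. The field names, ranges, and sample records are illustrative assumptions, not drawn from the FIFAI report:

```python
# A minimal, illustrative data-validation sketch: field names, ranges, and
# sample records are hypothetical, not taken from the FIFAI report.

def validate_records(records, required_fields, numeric_ranges):
    """Return (clean, issues): records passing basic quality checks,
    plus human-readable issues for the records that do not."""
    clean, issues, seen = [], [], set()
    for i, rec in enumerate(records):
        # Completeness: every required field must be present and non-empty.
        missing = [f for f in required_fields if rec.get(f) in (None, "")]
        if missing:
            issues.append(f"record {i}: missing {missing}")
            continue
        # Validity: numeric fields must fall within expected ranges.
        out_of_range = [f for f, (lo, hi) in numeric_ranges.items()
                        if not lo <= rec[f] <= hi]
        if out_of_range:
            issues.append(f"record {i}: out of range {out_of_range}")
            continue
        # Uniqueness: drop exact duplicate records.
        key = tuple(sorted(rec.items()))
        if key in seen:
            issues.append(f"record {i}: duplicate")
            continue
        seen.add(key)
        clean.append(rec)
    return clean, issues

records = [
    {"id": 1, "income": 55000},
    {"id": 2, "income": -100},   # fails the range check
    {"id": 1, "income": 55000},  # exact duplicate of the first record
    {"id": 3},                   # missing a required field
]
clean, issues = validate_records(records, ["id", "income"],
                                 {"income": (0, 10**7)})
print(len(clean), len(issues))   # one record survives; three are flagged
```

In a real financial institution these checks would be far richer – schema contracts, referential integrity, drift monitoring – but the shape is the same: data is tested before it is trusted.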
Another effective way to enhance the quality of data used in AI: strong governance.
As the FIFAI report explains: “Good data governance can help ensure that data is accurate, consistent, safe, and complete, which is crucial for the effective functioning of AI systems. Data governance is critical for financial institutions considering the sensitive and confidential nature of financial and customer data.”
What does that mean in practice?
Forum participants discussed a couple of ways businesses can improve their data and, by extension, their AI models.
The first is to take a data-centric approach to model building.
“One way to improve the performance of both AI-applications and traditional models is to continuously improve the data used to train those models, also known as a data-centric approach,” the report explains.
“Rather than solely focusing on algorithm iteration and retraining to improve performance, incorporating a data-centric approach maximizes the performance potential of the model. Sound data governance is necessary to adopt a data-centric approach to AI model development.”
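As a toy illustration of the data-centric idea – entirely hypothetical, not taken from the report – consider a one-dimensional threshold classifier. The model family stays fixed; the only thing that changes is that a few mislabeled training points get corrected, and held-out accuracy improves as a result:

```python
# Toy, hypothetical illustration of a data-centric approach: the model
# family stays fixed (a single threshold on x), and performance improves
# only when the training labels themselves are cleaned.

def fit_threshold(xs, ys):
    """Fit the threshold t minimizing training error for 'predict 1 if x > t'."""
    pts = sorted(set(xs))
    candidates = [pts[0] - 1] + [(a + b) / 2 for a, b in zip(pts, pts[1:])]
    return min(candidates,
               key=lambda t: sum((x > t) != bool(y) for x, y in zip(xs, ys)))

def accuracy(t, xs, ys):
    return sum((x > t) == bool(y) for x, y in zip(xs, ys)) / len(xs)

train_x = [1, 2, 3, 4, 6, 7, 8, 9]
clean_y = [0, 0, 0, 0, 1, 1, 1, 1]   # true rule: label = (x > 5)
noisy_y = [0, 0, 0, 0, 0, 0, 1, 1]   # x = 6 and x = 7 mislabeled

test_x, test_y = [5.5, 6.5], [1, 1]

t_noisy = fit_threshold(train_x, noisy_y)  # mislabels push the threshold up
t_clean = fit_threshold(train_x, clean_y)  # cleaned labels recover x > 5

print(accuracy(t_noisy, test_x, test_y))   # poor on the held-out points
print(accuracy(t_clean, test_x, test_y))   # perfect on this toy test set
```

No amount of refitting on the noisy labels would help here; fixing the two bad labels does. That, in miniature, is the data-centric argument.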
Another is to build a strong data literacy culture.
“Organization-wide awareness of the various risks that stem from inadequate use of data is essential with widespread adoption of AI, thus, organizations should consider ongoing training activities for their employees on a broad range of aspects related to data,” the report says.
To explore even more AI data factors or any of the other themes forum participants discussed, read the full FIFAI report (PDF, 5.42 MB).