As we predicted, 2019 has become the year of data for the financial services industry. As technology and trends such as artificial intelligence, the Internet of Things, and open banking accelerate the pace of change globally, data complexity increases accordingly. Now is the time for banks to get their data houses in order. Data, after all, is the foundation that will determine which banks excel on service, relationships, efficiency, and effectiveness, and which banks fail.
At Gradient Ascent, we deal with many clients in the financial services industry and we can’t help but notice that they face some common challenges and tough questions as they begin to launch their data-driven projects.
Three of the key challenges we see are centred on:
- Data integrity: Companies in financial services either do not have schemas or the schemas they do have are simply out of date. The database schema can be thought of as the blueprint of your database and the rules it follows, and it is often presented as a visual representation. Yet too often the schema and the data it describes no longer match, and so the data lacks integrity.
- Data completeness: Although schemas can help define what data should be found within each feature of a data-driven project, we often see companies fail to properly collect data in the first place. This leads to missing data or data gaps, and that’s a problem. If you’re trying to develop and implement a machine learning algorithm, a feature that is missing 20% of its values can render the exercise useless.
- Data consistency: This is a big one. Datasets that span multiple years, systems, products, and lines of business, as they often do, are very susceptible to schema changes. As a result, a new feature introduced in year three of data collection, for example, will not be available in years one and two.
This also relates to data descriptions. These can change from year to year, and to establish data consistency, they need to be standardized and reviewed across all years. Further, we see consistency issues when data is aggregated from multiple sources. Different banking institutions capture data categorizations and descriptions in completely different ways, and the collected data needs to be organized and reconciled as a result.
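To make the completeness and consistency checks concrete, here is a minimal sketch of how one might measure the share of missing values per feature, per year. The record layout, the feature names (`amount`, `description`, `channel`) and the 20% threshold are illustrative assumptions, not a description of any particular client dataset. A feature introduced partway through the collection period shows up as fully missing in the earlier years.

```python
from collections import defaultdict


def missing_rate_by_year(records, features, year_key="year"):
    """Return {year: {feature: fraction of records where the feature is missing}}."""
    totals = defaultdict(int)
    missing = defaultdict(lambda: defaultdict(int))
    for rec in records:
        year = rec[year_key]
        totals[year] += 1
        for feat in features:
            # Treat absent keys, None, and empty strings as missing.
            if rec.get(feat) in (None, ""):
                missing[year][feat] += 1
    return {
        year: {feat: missing[year][feat] / totals[year] for feat in features}
        for year in sorted(totals)
    }


def flag_features(rates, threshold=0.2):
    """List (year, feature, rate) wherever the missing rate reaches the threshold."""
    return [
        (year, feat, rate)
        for year, per_feature in rates.items()
        for feat, rate in per_feature.items()
        if rate >= threshold
    ]


# Hypothetical records: "channel" was only introduced in 2018.
records = [
    {"year": 2016, "amount": 120.0, "description": "GROCERY"},
    {"year": 2016, "amount": 75.5, "description": ""},
    {"year": 2017, "amount": 60.0, "description": "PAYROLL"},
    {"year": 2018, "amount": 42.0, "description": "RENT", "channel": "mobile"},
]

rates = missing_rate_by_year(records, ["amount", "description", "channel"])
flagged = flag_features(rates)
```

Here `flagged` would list `description` for 2016 (half of its values are empty) and `channel` for both 2016 and 2017, which is the kind of gap that a mid-stream schema change leaves behind.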
Also, as data grows in terms of the geographic scale it covers, being able to categorize, report, and make predictions based on that data becomes increasingly complicated. To address this, we have to look at a number of factors, including:
- How we process transactions.
- How we identify payroll transactions as opposed to deposits from a lender.
- How we address geographical scope and scale. For example, think about the grocery stores within your city. You can probably name many of them. But as you expand outwards, can you name all the grocery stores within your state or province? How about the states and provinces that neighbour yours? How about across the country? And what about within all the countries you operate in? As the geographic scope of the data increases, its complexity increases accordingly.
- How we deal with multiple languages. We’ve worked on projects where data is aggregated from multiple institutions across varied geographic regions, and this led to multiple languages being captured in the same database.
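As one illustration of the transaction and language questions above, the sketch below normalizes free-text descriptions (lowercasing and stripping accents) and then applies simple keyword rules to separate payroll deposits from lender deposits. The keyword lists, the category labels and the sample descriptions are all hypothetical; in practice the rules would come from reviewing real descriptions, and a rule-based pass like this is likely only a starting point.

```python
import re
import unicodedata

# Hypothetical keyword lists, mixing English and French terms for illustration.
PAYROLL_KEYWORDS = {"payroll", "salary", "paie", "salaire"}
LENDER_KEYWORDS = {"loan", "lending", "pret"}


def normalize(description):
    """Lowercase and strip accents so that 'Prêt' and 'PRET' compare equal."""
    decomposed = unicodedata.normalize("NFKD", description)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_accents.casefold()


def classify_deposit(description):
    """Label a deposit as 'payroll', 'lender_deposit', or 'unknown'."""
    # Match whole words rather than substrings to avoid accidental hits
    # (for example, 'pret' inside 'interpret').
    tokens = set(re.findall(r"[a-z0-9]+", normalize(description)))
    if tokens & PAYROLL_KEYWORDS:
        return "payroll"
    if tokens & LENDER_KEYWORDS:
        return "lender_deposit"
    return "unknown"


labels = [
    classify_deposit("ACME CORP PAYROLL 0423"),
    classify_deposit("Dépôt PRÊT auto"),
    classify_deposit("Interac e-Transfer"),
]
```

Normalizing before matching means the same rule covers descriptions captured with or without accents, which matters once data from several institutions and regions lands in one database.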
It can be daunting, yes, but as you approach your data-driven AI projects, here are some initial questions to ask:
- Do you know what data you currently have? This may require a data audit, a task that can be broken down into smaller parts to make it more manageable.
- Do you have a database schema that explains the features you are collecting?
- Do you know what values to expect within each feature? If not, how do you identify good data versus bad data?
- Have you evaluated your data based on the defined schema?
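The last two questions can be turned into a simple automated check. The sketch below assumes a small, hand-written schema that records the expected type, whether a feature is required, and the allowed values or minimum for each feature; the feature names and rules are hypothetical. It reports records that break the schema, as well as features that appear in the data but not in the schema, which is often a sign that the schema is out of date.

```python
# Hypothetical schema: expected type, required flag, and optional constraints.
SCHEMA = {
    "account_id": {"type": str, "required": True},
    "amount": {"type": float, "required": True, "min": 0.0},
    "currency": {"type": str, "required": True, "allowed": {"CAD", "USD"}},
    "channel": {
        "type": str,
        "required": False,
        "allowed": {"branch", "online", "mobile"},
    },
}


def validate_record(record, schema=SCHEMA):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for name, rule in schema.items():
        value = record.get(name)
        if value is None:
            if rule["required"]:
                problems.append(f"{name}: missing required value")
            continue
        if not isinstance(value, rule["type"]):
            expected = rule["type"].__name__
            actual = type(value).__name__
            problems.append(f"{name}: expected {expected}, got {actual}")
            continue
        if "allowed" in rule and value not in rule["allowed"]:
            problems.append(f"{name}: unexpected value {value!r}")
        if "min" in rule and value < rule["min"]:
            problems.append(f"{name}: {value} is below minimum {rule['min']}")
    # Features present in the data but absent from the schema suggest
    # that the schema has fallen behind the data.
    for name in record:
        if name not in schema:
            problems.append(f"{name}: not defined in schema")
    return problems


good = {"account_id": "A1", "amount": 10.0, "currency": "CAD"}
bad = {"account_id": "A2", "amount": -5.0, "currency": "EUR", "branch_code": "001"}

good_problems = validate_record(good)
bad_problems = validate_record(bad)
```

Running a check like this over a sample of records from each source system gives a quick first answer to "good data versus bad data" before any modelling work begins.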
As the importance and complexity of data continue to grow, getting started on the right foot will improve data readiness and overall project success.
We will provide more on how to successfully approach the data challenges in the coming weeks. Contact me directly to learn more about our data services for the financial services industry.