MIT-IBM Watson AI Lab researchers are developing graph learning methods to detect financial fraud at massive scale.
Globally, money laundering and credit card fraud cost banks hundreds of billions of dollars in losses, and the downstream impacts can include regulatory problems, destabilized markets, distorted economic development, harm to consumers, and eroded trust in financial institutions. It’s therefore critical to identify and understand the origins, bad actors, and mechanisms of fraud in order to devise strategies to prevent these activities.
In the U.S., financial institutions have an obligation to report suspicious activity, and so they have technologies and policies in place to pinpoint it and disclose it to the proper authorities. Companies and organizations regularly employ risk management techniques for this, but some are going further. Using machine-learning methods, banks are working with researchers to better pick out troublesome individuals, illicit networks, and suspicious transaction patterns in order to significantly curtail the problem.
Julian Shun — an associate professor in the MIT Department of Electrical Engineering and Computer Science (EECS), the Computer Science and Artificial Intelligence Laboratory (CSAIL), and a researcher with the MIT-IBM Watson AI Lab — and Yada Zhu, a principal research scientist with IBM Research and a manager with the Lab, are developing graph learning methods to predict and catch these crimes in real time.
Q: What makes tracking and tackling money laundering, credit card fraud, and risk management difficult for banks, and where do current identification methods, particularly graph learning techniques, fall short?
Zhu: There are huge data and regulation issues, because for reputational and other reasons, banks cannot release the data they have to outside [entities], so research from academia is challenging. Also, it’s not only an algorithm problem; there are huge computing challenges, because many of the decisions the bank needs to make about credit card fraud must happen in real time to detect it.
On the other hand, you don’t want to raise a lot of false alarms, because that causes a problem on the customer side. It also adds to the banks’ own burden of investigating them. So this is a long-standing, high-impact, fundamental challenge for the financial services sector.
Further, money laundering, credit card fraud, and risk management are separate tasks with different characteristics and concerns. Money laundering involves processing illicitly gained funds to make them appear legitimate, whereas credit card fraud entails unauthorized use of credit card information for financial gain. Risk management, on the other hand, encompasses a broad range of activities aimed at identifying, assessing, and mitigating financial risks, including market, credit, and operational risks.
From a financial institution’s perspective, money laundering requires identifying and stopping complex, often international schemes that disguise illegal money flows, while credit card fraud focuses on detecting and preventing unauthorized transactions and account takeovers. These specific and focused efforts contrast with risk management, which involves strategic planning, compliance with regulations, and use of various tools to manage the institution’s overall risk exposure.
Basically, banks are mainly using rule-based methods and traditional machine learning, such as logistic regression and tree-based methods. But in the real world, the data is very noisy and the fraud rate is less than 0.3%, a very rare event, so those methods are not robust and have high generalization errors. Also, the traditional methods look at a static graph, but transactions happen every minute and second, so the temporal feature is another challenge we’re trying to address.
We are trying to tackle these problems and address their needs. From their perspective, the solution needs to be scalable; that’s first. Second, banking is a highly regulated industry, so model risk management is super important. For all the results from the models, the banks need to understand every step; they need to be able to trace from the data going in to the result coming out, so the model’s explainability matters. That’s a challenge of working with financial services that’s different from other industries.
Shun: The financial transaction networks that we’re dealing with are heterogeneous: the nodes and the edges have different types. The nodes can represent people, or they can represent accounts, and the edges can represent different types of transactions, like domestic versus international transfers. Existing graph learning techniques don’t work too well in the setting where the graph data is heterogeneous, and the solutions that have been proposed have only been tested on small graphs. The graphs that we encounter in real life are much larger, containing up to billions of edges, so it’s important to design high-performance, scalable solutions that let us analyze these graphs quickly. Scalable solutions can enable banks to detect fraudulent transactions effectively in real time.
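To make the setup concrete, here is a minimal Python sketch of how such a heterogeneous, temporal transaction graph might be represented. The node types, edge types, and fields shown are illustrative assumptions for this example, not the Lab’s actual data model.

```python
# A minimal sketch (not the Lab's actual data model) of a heterogeneous
# transaction graph: nodes carry a type such as "person" or "account",
# and edges carry a type such as a domestic or international transfer.
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    src: str          # source node id
    dst: str          # destination node id
    edge_type: str    # e.g., "domestic_transfer", "international_transfer"
    amount: float
    timestamp: int    # transactions are timestamped, so the graph is temporal

class HeteroGraph:
    def __init__(self):
        self.node_type = {}                  # node id -> "person" | "account"
        self.out_edges = defaultdict(list)   # node id -> list of outgoing Edges

    def add_node(self, node_id, node_type):
        self.node_type[node_id] = node_type

    def add_edge(self, edge: Edge):
        self.out_edges[edge.src].append(edge)

# Example usage with made-up accounts and amounts
g = HeteroGraph()
g.add_node("acct_1", "account")
g.add_node("acct_2", "account")
g.add_edge(Edge("acct_1", "acct_2", "international_transfer", 9500.0, 1700000000))
```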
Q: How are you designing your method to handle these challenges, and why are you concentrating on subgraphs to pick out patterns of money laundering, credit card fraud, etc.?
Shun: The reason why we’re interested in subgraphs is that a lot of these money laundering and fraudulent transaction cases involve several bad actors, and it’s usually isolated to a small part of the network. One simple example of a subgraph that we care about is a cycle: There’s a person who sends money out to somebody else (to some other account), and then that person sends it to somebody else, and then eventually the money makes it back to the original person. We aim to be able to process cycles of different sizes. Other patterns, such as dense subgraphs (subgraphs with a lot of edges relative to nodes), can also be used in anomaly detection.
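As an illustration of the cycle pattern Shun describes, the following sketch finds short directed cycles in a toy transaction graph using a bounded-depth search. The adjacency-dictionary representation and the `max_len` cutoff are assumptions for the example, not the group’s detection algorithm.

```python
# A minimal illustration (not the Lab's detector) of finding short directed
# cycles in a transaction graph: money that eventually flows back to the
# account it started from, within at most `max_len` hops.
def find_cycles(adj, max_len=4):
    """adj: dict mapping each account to the accounts it sent money to."""
    cycles = []

    def dfs(start, node, path):
        if len(path) > max_len:
            return
        for nxt in adj.get(node, []):
            # report each cycle only from its smallest node, to avoid
            # listing the same cycle once per rotation
            if nxt == start and len(path) >= 2 and start == min(path):
                cycles.append(path + [start])
            elif nxt not in path:
                dfs(start, nxt, path + [nxt])

    for start in adj:
        dfs(start, start, [start])
    return cycles

# A -> B -> C -> A is a 3-cycle: funds return to the originating account.
adj = {"A": ["B"], "B": ["C"], "C": ["A", "D"], "D": []}
print(find_cycles(adj))   # [['A', 'B', 'C', 'A']]
```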
We’re building a framework that allows you to mix and match different components in our solution for different types of heterogeneous and heterophilic graphs. In our work, we propose using a type of graph neural network called a transformer that can capture long-range dependencies, so that nodes can get information from other nodes even when they’re farther away in the graph. An important consideration for this framework is that it should be scalable to the largest graphs out there, since prior graph transformer solutions only scale to tens of thousands of nodes, while real-world networks have billions of edges or more. We’re also working on reducing the computational complexity of the solutions and taking advantage of parallel machines, like GPUs, to speed up the training.
One of the ideas we’re using to scale to these large graphs is graph sampling. Instead of training on the whole graph, you take batches of your nodes and a sample of their neighbors, and then you pass this subgraph to a parallel machine like a GPU, where you can do training really fast. If you train with enough of these different samples, you’re likely to get good accuracy in your predictions in the end. There are various ways to sample, and our framework provides a few different options for doing that.
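The following is a simplified, GraphSAGE-style sketch of the sampling idea: draw a batch of seed nodes, keep a fixed number of neighbors per hop, and hand the resulting small subgraph to the GPU. The `fanouts` parameter and adjacency-dictionary input are assumptions for illustration, not the framework’s exact sampler.

```python
import random

# A minimal sketch of neighbor sampling for minibatch GNN training.
# For each seed node we keep at most `fanout` neighbors per hop, so the
# sampled subgraph stays small enough for a GPU regardless of graph size.
def sample_subgraph(adj, seed_nodes, fanouts=(10, 10)):
    """adj: dict node -> list of neighbors; fanouts: neighbors kept per hop."""
    nodes = set(seed_nodes)
    edges = []
    frontier = list(seed_nodes)
    for fanout in fanouts:                      # one entry per hop
        next_frontier = []
        for u in frontier:
            neighbors = adj.get(u, [])
            sampled = random.sample(neighbors, min(fanout, len(neighbors)))
            for v in sampled:
                edges.append((u, v))
                if v not in nodes:
                    nodes.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return nodes, edges                          # the subgraph passed to the GPU

# Each training step would draw a fresh batch of seeds and a fresh sample,
# so over many steps the model sees most of the graph.
```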
Another component of the framework is the kind of embedding you use to represent the nodes before you do the training, i.e., a way to embed the nodes into a vector based on the structural information of the graph, such as whether your graph data is homogeneous or heterogeneous. There are embeddings based on random walks, or based on shortest-path distances to other nodes in the graph. Our framework tries to provide a lot of different options, because we don’t know which is going to work best for a particular workload, and we want the framework to be flexible enough that users can choose the method that suits their workflow best.
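To illustrate the two embedding families mentioned, here are simplified versions of a random-walk-based signature and a shortest-path-distance feature vector. Both are toy stand-ins for the framework’s actual embedding modules, and the `anchors`, `walk_len`, and `num_walks` parameters are assumptions for the example.

```python
import random
from collections import deque

# Two illustrative structural embeddings of the kinds mentioned above
# (simplified; the framework's actual implementations will differ).

def random_walk_embedding(adj, node, walk_len=20, num_walks=50):
    """Counts how often short random walks from `node` land on each other node."""
    counts = {}
    for _ in range(num_walks):
        cur = node
        for _ in range(walk_len):
            nbrs = adj.get(cur, [])
            if not nbrs:
                break
            cur = random.choice(nbrs)
            counts[cur] = counts.get(cur, 0) + 1
    return counts    # sparse structural signature of the node's neighborhood

def shortest_path_embedding(adj, node, anchors):
    """Vector of BFS distances from `node` to a fixed set of anchor nodes."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return [dist.get(a, -1) for a in anchors]   # -1 if an anchor is unreachable
```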
Another component is the attention calculation; this is used to capture the long-range dependencies in the graph. There are different options based on how far out (i.e., how many hops away) you want to go to gather information from your neighbors. We are integrating existing techniques for doing this.
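A toy PyTorch sketch of the idea: build a k-hop reachability mask from the adjacency matrix and apply it inside standard scaled dot-product attention, so each node only attends to nodes within k hops. This illustrates hop-limited attention in general, not the framework’s attention module.

```python
import torch

# Illustration only: nodes attend only to nodes reachable within `k` hops.
def k_hop_mask(adj_matrix, k):
    """adj_matrix: [N, N] 0/1 tensor; returns a boolean k-hop reachability mask."""
    n = adj_matrix.size(0)
    reach = torch.eye(n, dtype=torch.bool)      # every node can attend to itself
    power = torch.eye(n)
    for _ in range(k):
        power = power @ adj_matrix.float()      # paths of length 1..k
        reach |= power > 0
    return reach

def hop_limited_attention(q, k_mat, v, mask):
    """Standard scaled dot-product attention with disallowed pairs masked out."""
    scores = q @ k_mat.transpose(-2, -1) / q.size(-1) ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v
```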
Recently, we incorporated a neural architecture search inside our framework to enable the framework to automatically find a good configuration; a user can also directly choose a configuration.
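As a rough picture of what searching over the framework’s configuration space could look like, here is a hypothetical random-search sketch. The `SEARCH_SPACE` options and the `train_and_eval` callback are placeholders, and a real neural architecture search would explore the space far more intelligently than random sampling.

```python
import random

# Hypothetical configuration search: score candidate configurations of the
# framework's components and keep the best one found.
SEARCH_SPACE = {
    "sampler":    ["neighbor", "random_walk"],
    "embedding":  ["random_walk", "shortest_path", "none"],
    "attn_hops":  [1, 2, 3],
    "hidden_dim": [64, 128, 256],
}

def random_search(train_and_eval, trials=20):
    """train_and_eval(config) -> validation score; returns the best config found."""
    best_cfg, best_score = None, float("-inf")
    for _ in range(trials):
        cfg = {name: random.choice(opts) for name, opts in SEARCH_SPACE.items()}
        score = train_and_eval(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```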
Our framework is the first to put all of these pieces together and make it easy for people to combine different options, making it suitable for different downstream tasks that banks want to solve.
Q: How are you ensuring your technique works reliably and at scale?
Shun: We’re testing on both homogeneous and heterogeneous datasets. So far, we’ve been focusing on research citation networks, where the nodes are authors who are connected to each other based on their collaborations. Here, we are comparing our method to state-of-the-art graph transformer models, and we’re able to improve classification accuracy for the citation network. We found that accounting for data heterogeneity is usually pretty important for improving accuracy.
Zhu: We have validated our method in terms of robustness and fraud detection accuracy. We’re also using large volumes of synthetic financial fraud data to test and scale the method. The synthetic data include networks of different sizes, as large as 5 million nodes and 200 million edges, as well as different fractions of fraudulent cases within those data, ranging from 0.05% to 0.12%. For a medium-size transaction network, we find that our method is three times faster in terms of transactions per second and per-batch latency. The work with Wells Fargo is on a much smaller dataset (on the order of tens of thousands of addresses), but working with the large synthetic dataset is helping us see whether we can scale our method to a larger network, because banks need a millisecond response to predictions in some cases. Wells Fargo helped provide feedback on the results and our method. We are also working to help explain the algorithm to the financial services side; explainability is very important for Wells Fargo’s model risk management, and that eventually impacts their deployment of the method.
Shun: One of the directions we’re looking at is further improving the scalability of our solution, so that as transactions come in, they can be processed in real time. The attention calculation part of the framework does a lot of linear algebra computations, which are usually quite expensive. We’re trying to speed up these linear algebra computations by doing sampling inside the computation.
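To convey the idea of sampling inside the attention computation, here is a toy sketch in which each query attends to a random subset of keys rather than all of them, reducing the size of the matrix products. This is a generic approximation shown for illustration, not the group’s actual method.

```python
import torch

# Toy illustration of sampling inside attention: each query attends to a
# random subset of keys/values instead of all N, shrinking the matrix products.
def sampled_attention(q, k, v, num_samples):
    n = k.size(0)
    idx = torch.randperm(n)[:num_samples]        # sample a subset of keys/values
    k_s, v_s = k[idx], v[idx]
    scores = q @ k_s.transpose(-2, -1) / q.size(-1) ** 0.5
    return torch.softmax(scores, dim=-1) @ v_s   # approximate attention output
```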
The other direction is to apply our framework to do a more comprehensive analysis of the financial transaction networks and see if our current solution is sufficient for the fraud detection task, or if we need to add additional components to our framework to deal with these specific types of networks.
Ideally, our model can be updated in real time as the transactions come in, to not just serve queries in real time, but also be updated to improve future queries.