This article is the first of two in a blog series that describes the journey of data science interns Dieudonné Niyitanga and Samuel Mutinda during their placement at a large bank in Rwanda. It describes two of the data solutions they worked on: an analysis of churn among bank customers and a sentiment analysis using external social media data.

The churn challenge

For any bank, customer satisfaction and retention are important. Signing up a customer represents a significant cost for a bank, and when customers churn (meaning they stop using their bank products within a specific period), the bank doesn’t earn sufficient income to cover those upfront costs. Understanding patterns in churn behaviour can help the bank to identify areas of product and service dissatisfaction and to better forecast its operations and profits. This understanding is relevant to multiple departments, ranging from marketing and customer service to the various product departments.

The bank therefore identified the need to use its existing data for a churn analysis, to determine the common behaviours of customers who stop using their banking products. This allows various departments in the bank to pre-emptively target customers who exhibit these behaviours, so as to better serve and retain them. Ultimately, this should result in increased business value for the bank and better-value financial services for its customers.

Customer data

For this data use case, we used internal data assets already collected and owned by the bank between 2014 and 2018. This consisted firstly of static customer demographics data and secondly of dynamic transaction history data. The variables in the customer demographics data – stored in the bank’s customer relationship management (CRM) database – included information such as sex, age, type of customer (such as individual versus corporate), number and type of products, and date of first becoming a bank customer. 
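As an illustration of how the transaction history can be turned into a churn label, here is a minimal sketch. It assumes that "churned" means no transaction within a fixed inactivity window before a reference date. The 180-day window, the customer IDs and the sample records are hypothetical and were not taken from the bank's data.

```python
from datetime import date, timedelta

# Hypothetical transaction history: (customer_id, transaction_date).
transactions = [
    ("C001", date(2018, 11, 20)),
    ("C001", date(2018, 12, 15)),
    ("C002", date(2018, 3, 2)),
    ("C003", date(2017, 9, 30)),
]


def label_churn(transactions, reference_date, inactivity_days=180):
    """Map each customer to True if their last transaction is older than the window."""
    last_seen = {}
    for customer_id, tx_date in transactions:
        # Keep only the most recent transaction date per customer.
        if customer_id not in last_seen or tx_date > last_seen[customer_id]:
            last_seen[customer_id] = tx_date
    cutoff = reference_date - timedelta(days=inactivity_days)
    return {cid: last < cutoff for cid, last in last_seen.items()}


# Assumed reference date: the end of the observation period.
churn_labels = label_churn(transactions, reference_date=date(2018, 12, 31))
```

The resulting label can then be joined to the demographic variables by customer ID. The right inactivity window depends on the product, so it would need to be agreed with the relevant departments.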

While a simple extraction of the different variables could provide a static table of the percentage of churned versus loyal customers per customer characteristic, the combination of these variables and data models allowed us to find deeper insights. For example, key predictors of customer churn were found in specific combinations of products, in the length of contracts and in customer age. On the other hand, variables that we expected to be predictors of customer churn turned out to be significantly less relevant.
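To make the difference between the two views concrete, the sketch below computes the churn rate per single characteristic and then per combination of products. It is only the descriptive starting point, not the models we used, and the age bands, product names and records are made up for illustration.

```python
from collections import defaultdict

# Hypothetical customer records with a precomputed churn flag.
customers = [
    {"age_band": "18-29", "products": ("current",), "churned": True},
    {"age_band": "18-29", "products": ("current", "savings"), "churned": False},
    {"age_band": "30-49", "products": ("current",), "churned": True},
    {"age_band": "30-49", "products": ("current", "savings"), "churned": False},
    {"age_band": "30-49", "products": ("current", "loan"), "churned": False},
    {"age_band": "50+", "products": ("current",), "churned": False},
]


def churn_rate_by(customers, key):
    """Return the share of churned customers for each group produced by `key`."""
    counts = defaultdict(lambda: [0, 0])  # group -> [churned, total]
    for customer in customers:
        group = key(customer)
        counts[group][0] += int(customer["churned"])
        counts[group][1] += 1
    return {group: churned / total for group, (churned, total) in counts.items()}


# Static table: churn rate per single characteristic.
by_age = churn_rate_by(customers, key=lambda c: c["age_band"])
# Combination view: churn rate per exact set of products held.
by_products = churn_rate_by(customers, key=lambda c: c["products"])
```

In this toy data, customers holding only a current account churn far more often than those holding a second product. A single-variable table cannot show that kind of pattern.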

These insights allowed specific bank departments to devise targeted strategies to ensure customer retention. Some departments could work with other product departments to cross-sell the right combination of products. Having identified a period after which customers are likely to leave, sales teams can reach out to them pre-emptively. Moreover, the data provided specific predictions of the number of customers likely to churn per customer variable, which is key in the bank’s forecasting abilities.

In addition to understanding and predicting churn behaviour, this type of analysis of internal customer and transaction data can be applied to other areas of business – most notably fraud detection and credit risk modelling.

Overcoming process challenges

This analysis took six months to complete, most of which was spent on accessing and cleaning the data. Significant technical challenges presented themselves in accessing the data (due to size limitations of CSV extracts) and in the quality of the data (with variables being missing or inaccurate). In addition to technical difficulties, we experienced challenges in asking the bank’s internal departments to open access to “their” data, essentially changing the way they were used to extracting and granting access to data. Over time, we devised two specific solutions to these challenges. Firstly, we worked on a proof of concept of the analysis with sample data, to build trust with the bank’s departments and to show the value of the analysis. Secondly, we transferred the CSV files into a common SQL server, which significantly reduced the time and effort involved in accessing the data for analysis.
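The move from CSV extracts to a shared SQL database can be sketched as follows. This is a simplified illustration rather than the bank's actual setup: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for the SQL server, and the table name, column names and rows are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical CSV extract; in practice this would be a file on disk.
csv_extract = io.StringIO(
    "customer_id,customer_type,join_date\n"
    "C001,individual,2015-03-01\n"
    "C002,corporate,2016-07-15\n"
    "C003,individual,2017-01-20\n"
)

# In-memory SQLite database as a stand-in for the shared SQL server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "customer_id TEXT PRIMARY KEY, customer_type TEXT, join_date TEXT)"
)

# Load the CSV rows with parameterised inserts.
reader = csv.DictReader(csv_extract)
rows = [(r["customer_id"], r["customer_type"], r["join_date"]) for r in reader]
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
conn.commit()

# Once loaded, analysts query the table instead of reopening large CSV files.
individual_count = conn.execute(
    "SELECT COUNT(*) FROM customers WHERE customer_type = ?", ("individual",)
).fetchone()[0]
```

Loading the data once and querying it many times avoids the size limits of individual CSV extracts. It also gives every analyst the same version of the data.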

The sentiment challenge

Although the bank has internal data going back years, tracking the activities of its customers and the various products that perform better or worse, this only represents the “what” of the bank’s business performance, not the “why”. For a deeper understanding of why their customers decide to join or leave the bank or why they appreciate certain products and disregard others, the bank has to rely on additional market research. This can consist of engagements by its customer services department, customer surveys or focus group discussions – all of which can be relatively expensive and time-consuming. Moreover, the bank does not have easy access to information about the “why” of its performance compared to its competitors. 

The bank therefore identified the need to leverage publicly available social media data to perform a sentiment analysis. The purpose of this project was threefold:

1)    To understand the sentiment of current customers towards the bank and its products
2)    To understand the sentiment of Rwandan customers towards competing banks
3)    To understand the characteristics that set the bank’s products apart from those of its competitors (either positively or negatively)

Twitter data on market sentiment

To start, we analysed the popularity of keywords related to financial services using Google Trends data. Working through a long list of keywords (e.g. ATM, credit card, mortgage, prepaid and internet banking), we compiled the most popular keywords related to financial services in Rwanda and focused our subsequent analysis on these keywords only. We then extracted from Twitter all tweets that mentioned the prioritised keywords, filtered to include only tweets from Rwanda in the last three years, and saved them in a static CSV table for analysis. The variables extracted in this way included all text, interactions (such as tags, likes, retweets and responses), user demographics, time and date, and location of the tweet. We were able to identify which bank was being spoken about by analysing the tagged Twitter accounts and the text of the tweets. Using a Lexicon Cloud service, we segmented the tweets per keyword and per bank according to positive, neutral or negative sentiment.
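The general idea of lexicon-based sentiment segmentation can be illustrated with a minimal sketch. It does not reproduce the cloud service we used. The tiny word lists, bank names and tweets are hypothetical stand-ins, and a real lexicon would be far larger and would handle negation and context.

```python
import re
from collections import Counter

# Hypothetical miniature lexicon; a real service uses a much larger one.
POSITIVE = {"great", "fast", "helpful", "love", "easy"}
NEGATIVE = {"slow", "broken", "bad", "fees", "failed"}


def classify(text):
    """Label text by counting positive words minus negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical tweets, already tagged with the bank and keyword they mention.
tweets = [
    {"bank": "BankA", "keyword": "ATM", "text": "The ATM was broken again, so slow"},
    {"bank": "BankA", "keyword": "internet banking", "text": "Love how easy internet banking is"},
    {"bank": "BankB", "keyword": "ATM", "text": "Fast and helpful ATM service"},
    {"bank": "BankB", "keyword": "mortgage", "text": "Asked about a mortgage today"},
]

# Count tweets per (bank, keyword, sentiment) segment.
sentiment_counts = Counter(
    (t["bank"], t["keyword"], classify(t["text"])) for t in tweets
)
```

Counting by bank, keyword and sentiment gives the table from which the balance of sentiment per product and per bank can be read.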

Through analysing the balance of sentiments across keywords (product types) and banks, we were able to rapidly answer the following questions:

-    How do Rwandan customers feel about the different products offered by our bank?
-    How do Rwandan customers feel about the different products offered by other banks in Rwanda? 
-    For our products that customers feel negatively about, which competitor banks offer the same product that customers feel most positively about? 
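The third question can be answered mechanically once sentiment counts per bank and keyword are available. The sketch below assumes a simple net sentiment score, (positive minus negative) divided by the total. This score is one possible choice and is not necessarily the measure used in the project. All bank names and counts are invented.

```python
# Hypothetical sentiment counts per (bank, keyword).
counts = {
    ("OurBank", "ATM"): {"positive": 2, "neutral": 1, "negative": 7},
    ("OurBank", "mortgage"): {"positive": 6, "neutral": 2, "negative": 2},
    ("BankB", "ATM"): {"positive": 8, "neutral": 1, "negative": 1},
    ("BankC", "ATM"): {"positive": 4, "neutral": 2, "negative": 4},
    ("BankB", "mortgage"): {"positive": 3, "neutral": 3, "negative": 4},
}


def net_sentiment(c):
    """Return (positive - negative) / total, or 0.0 when there are no tweets."""
    total = sum(c.values())
    return (c["positive"] - c["negative"]) / total if total else 0.0


def best_competitors(counts, our_bank):
    """For each keyword where our bank scores negatively, find the top-scoring rival."""
    result = {}
    for (bank, keyword), c in counts.items():
        if bank != our_bank or net_sentiment(c) >= 0:
            continue  # only our bank's negatively perceived products
        rivals = {
            b: net_sentiment(rc)
            for (b, k), rc in counts.items()
            if k == keyword and b != our_bank
        }
        if rivals:
            result[keyword] = max(rivals, key=rivals.get)
    return result


benchmarks = best_competitors(counts, "OurBank")
```

With these invented counts, only the ATM keyword is negative for our bank, and BankB is the competitor with the most positive sentiment for it. In practice, keywords with very few tweets should be treated with caution before drawing such comparisons.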

Having these insights readily available allowed us to prioritise certain keywords (product types) and investigate the content of the related tweets. By summarising the key themes, it was possible to provide the different product departments with suggestions for where to improve (where customers indicated negative experiences) or what to advertise (where customers indicated positive experiences). Drilling deeper into the data could even provide branch managers with insights into which locations were more commonly associated with bad experiences – either for our bank branches (indicating a required improvement) or for competitor branches (indicating an opportunity to attract dissatisfied customers from elsewhere). Collaborating directly with different departments (such as the marketing department) also allowed us to answer specific strategic questions they had, based on the retrieved Twitter data.

The insights from this exercise were rich and more accessible and affordable than insights from other market research approaches. Working directly with certain departments allowed us to drill down into questions they had, and accessing tweets from the whole of Rwanda gave the bank staff a wider view than they previously had. 

Process recommendations

Some technical challenges limited the outcomes from this project, however. The Lexicon Cloud service used for the sentiment analysis only had English functionality, which meant that tweets in French and Kinyarwanda were disregarded. We would also recommend that other organisations interested in similar projects investigate applying machine learning to the text of the tweets to allow for more rapid summarising of themes. Finally, organisations interested in using such a project to analyse their position in the market would need to ensure that their social media engagement is sufficiently high – to be certain that they can retrieve enough tweets that tag their bank. Even if the bank itself has no social media presence, it would still be possible to analyse the sentiment of Rwandans towards financial products and banks in general.


Both projects successfully demonstrated the value of using internal data assets, as well as freely accessible public data assets, to uncover significant business value for the bank. We look forward to generating much more value for the bank in future, as we have been permanently appointed as the bank’s first data scientists as of January 2020. 
If you’re interested in what we’ve learned about navigating the bank’s internal data and team structures, have a look at our second blog. If you’re interested in exploring similar projects within your own institution, please reach out to Dumisani Dube at

Over a period of 18 months (July 2018 to December 2019), insight2impact supported a technical assistance (TA) Lite programme within a large bank in Rwanda. The initiative supported the placement of two data science interns and support by Ixio Analytics, an external data science consultancy, within the bank for a predetermined period. The data science interns were recruited from the African Institute for Mathematical Sciences (AIMS), were directly contracted to the bank and worked under the supervision of bank staff, an expert from Ixio Analytics, and insight2impact. At the end of the programme, in recognition of the value they contributed, the bank created new positions and offered the interns permanent contracts as data scientists.