Who’s Calling? Characterizing Robocalls through Audio and Metadata Analysis

Hello there! This is Bob, and I am calling from "XYZ" auto insurance. Our records indicate that your car warranty is about to expire! Please press 1 to talk to our customer care specialist or press 9 to be added to our do-not-call list.

Have you ever received a similar phone call from an unknown number trying to sell car insurance or health insurance, or threatening to terminate your Social Security number? Do you wonder who generates these calls and how they operate? We have answers!

What is a Robocall?

Robocalls are automated or semi-automated calls that play a recorded message. We present the first long-term analysis of the robocalling landscape by studying phone calls made to more than 66,000 phone numbers over 11 months.

In this blog post, we answer 4 important questions about robocalls:

1. Is the robocalling problem getting worse?

2. By answering robocalls, will you receive more robocalls?

3. Who is calling you?

4. What strategies do robocalls use to entice their victims?

Data Collection: To answer these questions, we collected data about robocalls. Using over 66,000 phone numbers, we operated a large-scale telephony honeypot for over 11 months. We collected call signaling data, call detail record (CDR) data (calling number, called number, timestamp, and call duration for answered calls), and recorded call audio for the calls we answered in our honeypot.

1. Is the robocalling problem getting worse?

Over 11 months, the weekly volume of unsolicited phone calls neither increased nor decreased. Instead, we observed a stationary trend after accounting for outliers and our honeypot’s winter downtime. The plot of normalized call volume in our honeypot makes this stationary trend evident.
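To make "normalized weekly call volume" concrete, here is a minimal sketch of one way to compute it from call timestamps. The function name and the choice to scale by the busiest week are assumptions for illustration; the post does not say how the plot was normalized.

```python
from collections import Counter
from datetime import datetime


def weekly_normalized_volume(timestamps):
    """Count calls per ISO week and scale so the busiest week equals 1.0.

    Peak scaling is an assumed normalization, not necessarily the one
    used for the plot described in the post.
    """
    counts = Counter()
    for ts in timestamps:
        iso = ts.isocalendar()
        counts[(iso[0], iso[1])] += 1  # key: (ISO year, ISO week number)
    if not counts:
        return {}
    peak = max(counts.values())
    return {week: n / peak for week, n in sorted(counts.items())}


if __name__ == "__main__":
    sample = [
        datetime(2019, 4, 1, 9, 0),
        datetime(2019, 4, 2, 14, 30),
        datetime(2019, 4, 8, 11, 15),
    ]
    for week, value in weekly_normalized_volume(sample).items():
        print(week, value)
```

A roughly flat series of values over many weeks would correspond to the stationary trend described above.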

“Storms” - High call volume events: In April of 2019, we observed an interesting event. After a relatively constant call volume for the first few weeks, the number of unsolicited calls suddenly increased. Surprisingly, a handful of phone numbers received thousands of calls from different sources within a few hours. We call this phenomenon a storm. The largest storm observed in April consisted of over 1,400 calls made from more than 750 unique callers to a single phone number within 24 hours. We encountered about 650 such storms throughout the 11 months, spread across over 220 phone numbers.
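As a rough illustration, storm-like events can be flagged from CDR data with a sliding 24-hour window per called number. This is a sketch under stated assumptions, not the detection method from the paper: the function name, the record layout, and both default thresholds are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta


def find_storms(calls, window=timedelta(hours=24), min_calls=1000, min_callers=500):
    """Return called numbers that saw a burst of calls from many callers.

    calls: iterable of (called_number, calling_number, timestamp) tuples.
    The thresholds are hypothetical placeholders, not values from the paper.
    """
    by_called = defaultdict(list)
    for called, calling, ts in calls:
        by_called[called].append((ts, calling))

    storms = set()
    for called, events in by_called.items():
        events.sort()  # chronological order
        start = 0
        callers_in_window = Counter()
        for end, (ts, calling) in enumerate(events):
            callers_in_window[calling] += 1
            # Drop calls that fell out of the window ending at ts.
            while ts - events[start][0] > window:
                old = events[start][1]
                callers_in_window[old] -= 1
                if callers_in_window[old] == 0:
                    del callers_in_window[old]
                start += 1
            enough_calls = end - start + 1 >= min_calls
            enough_callers = len(callers_in_window) >= min_callers
            if enough_calls and enough_callers:
                storms.add(called)
                break
    return storms
```

Requiring both a call count and a unique-caller count separates a storm (many different sources) from a single persistent caller redialing the same number.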

A possible explanation for storms is that a robocalling operation spoofs a number we own to generate a large number of phone calls. The victims who receive those robocalls try to call back the perpetrator and, in turn, overwhelm the actual owner of the phone number. Interestingly, a colleague in our lab was a victim of a storm event. He was overwhelmed with calls from hundreds of strangers complaining that they had received a call from him! Needless to say, he was unable to use his phone for a few days until the calls died down.

2. By answering robocalls, will you receive more robocalls?

News reports and regulatory agencies recommend that phone users avoid answering calls from unknown numbers to reduce the number of robocalls they receive. We conducted an experiment comparing the number of calls received while we answered phone calls to the number received while we did not answer. Surprisingly, we found that answering phone calls does not necessarily increase the number of robocalls you receive. You should still be cautious when you get a call from an unknown number. However, occasionally answering an unsolicited phone call does not mean you will receive more robocalls.

3. Who is calling you?

Over 11 months, we collected about 150,000 call recordings. To understand the robocalling operations responsible for these calls, we developed an audio clustering pipeline. Our key insight is that a robocalling operation reuses similar audio recordings while generating a large number of unsolicited phone calls. The five-stage audio clustering pipeline groups similar call recordings into broader robocalling campaigns.

Fraudulent and illegal robocalling operations: By processing close to 150,000 call recordings with our clustering pipeline, we uncovered more than 2,500 robocalling campaigns. Some of the largest campaigns seen in our honeypot were fraudulent and targeted vulnerable populations. The 10th largest campaign was a long-running Social Security fraud campaign that impersonated federal agencies to defraud its victims.

4. What strategies do robocalls use to entice their victims?

Rampant caller ID spoofing: News reports and consumer forums report that robocalls frequently spoof caller IDs. We found evidence that supports these reports. Robocallers regularly change their caller ID by spoofing their calling number or by rotating through a large pool of phone numbers. The Social Security fraud campaign discussed earlier used a large pool of toll-free numbers to spoof its caller ID. The largest campaign seen in our honeypot used a different calling number for almost every call it generated!
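One simple way to quantify this rotation is the ratio of distinct calling numbers to total calls within a campaign. The helper below is an illustrative sketch; the function name and the metric are assumptions, not necessarily the measure used in the paper.

```python
def caller_id_diversity(calling_numbers):
    """Fraction of calls in a campaign that used a distinct calling number.

    A value near 1.0 means almost every call used a different number;
    a value near 0.0 means the campaign reused a few numbers.
    """
    numbers = list(calling_numbers)
    if not numbers:
        return 0.0
    return len(set(numbers)) / len(numbers)


if __name__ == "__main__":
    rotating = ["8005550101", "8005550102", "8005550103", "8005550104"]
    reused = ["8005550101", "8005550101", "8005550101", "8005550102"]
    print(caller_id_diversity(rotating))
    print(caller_id_diversity(reused))
```

Under this metric, a campaign that uses a different calling number for almost every call would score close to 1.0.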

Neighbor Spoofing: We observed that some robocalling operations use more sophisticated techniques to spoof caller IDs. They match the first 6 digits of the calling number to those of the called number, and so appear to share your area code when they call you. This gives the impression that the call originated from your local neighborhood, and it can influence the victim into answering the robocall.
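The 6-digit match described above can be checked directly on a calling/called number pair. The sketch below assumes 10-digit North American numbers with an optional leading country code 1; both function names are hypothetical. A matching prefix is only a signal, since a legitimate local caller would match as well.

```python
def normalize_nanp(number):
    """Reduce a phone number string to 10 digits, or return None if it does not fit.

    Assumes North American numbering: 10 digits, optionally prefixed with 1.
    """
    digits = "".join(ch for ch in number if ch.isdigit())
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits if len(digits) == 10 else None


def matches_neighbor_prefix(calling, called):
    """True if two different numbers share their first 6 digits."""
    a = normalize_nanp(calling)
    b = normalize_nanp(called)
    if a is None or b is None:
        return False
    return a != b and a[:6] == b[:6]


if __name__ == "__main__":
    print(matches_neighbor_prefix("+1 919-555-0123", "919-555-0199"))
    print(matches_neighbor_prefix("919-556-0123", "919-555-0199"))
```

Counting how often this check fires across a campaign's calls, compared with what chance would predict, gives a rough indication of whether the campaign deliberately uses neighbor spoofing.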

But wait, there’s more!

We developed techniques to identify and measure other forms of fraud, such as voicemail injection, wangiri spam, and CNAM abuse. In our paper, we discuss the recent STIR/SHAKEN initiative and how our work can help in the fight against robocalls. For more details and other interesting findings, check out our paper! Our work was recently published at USENIX Security 2020, a top-tier peer-reviewed security conference.

Click here to access a PDF version of the paper.

Contact Brad Reaves or Sathvik Prasad for questions!

BibTeX

@inproceedings {whoscalling_usenix_2020,
author = {Sathvik Prasad and Elijah Bouma-Sims and Athishay Kiran Mylappan and Bradley Reaves},
title = {Who{\textquoteright}s Calling? Characterizing Robocalls through Audio and Metadata Analysis},
booktitle = {29th {USENIX} Security Symposium ({USENIX} Security 20)},
year = {2020},
isbn = {978-1-939133-17-5},
pages = {397--414},
url = {https://www.usenix.org/conference/usenixsecurity20/presentation/prasad},
publisher = {{USENIX} Association},
month = aug,}