LEAKYLINKS: Measuring the Security and Privacy Risks of URL Scanning Services

Ali Mustafa, Jannis Rautenstrauch, Florian Hantke, Shubham Agarwal, Stefano Calzavara and Ben Stock
In 47th IEEE Symposium on Security and Privacy, May 2026

Abstract

URL scanning services are widely used in security workflows to detect malicious websites and protect users from online threats. However, their common practice of publicly indexing scanned URLs may unintentionally expose sensitive user information through URL-embedded access credentials. Although isolated accounts of such privacy incidents exist, a systematic assessment of their prevalence is still lacking.

We present LEAKYLINKS, an automated analysis pipeline that combines URL filtering with LLM-driven semantic classification to identify URLs exposing Sensitive Personal Information (SPI). Using LEAKYLINKS, we analyze URLs collected from the public feeds of six prominent URL scanning services over a period of three weeks. In total, we visited 332k URLs and identified over 4k URLs that leak SPI, with a precision of 97%.
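
The abstract does not spell out the filtering rules. As a rough illustration only, a first-pass filter for URL-embedded access credentials might flag query parameters whose names look credential-like, or path segments and values that look like opaque, high-entropy tokens, and pass only those URLs on to the more expensive LLM classification step. The parameter list, the length and entropy thresholds, and the function names below are hypothetical assumptions, not the LEAKYLINKS implementation.

```python
import math
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qsl

# Hypothetical parameter names that often carry access credentials (assumption).
SUSPICIOUS_PARAMS = {"token", "access_token", "auth", "key", "api_key",
                     "sig", "signature", "code", "session", "sid"}
# Hypothetical thresholds for an opaque, token-like value (assumption).
MIN_TOKEN_LEN = 16
MIN_ENTROPY_BITS_PER_CHAR = 3.0
TOKEN_CHARS = re.compile(r"^[A-Za-z0-9_\-.~%+=]+$")


def shannon_entropy(s: str) -> float:
    """Shannon entropy of the string's character distribution, in bits per character."""
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def looks_like_token(value: str) -> bool:
    """Heuristic: long, URL-safe, mixes letters and digits, and has high character entropy."""
    return (len(value) >= MIN_TOKEN_LEN
            and TOKEN_CHARS.match(value) is not None
            and any(ch.isdigit() for ch in value)
            and any(ch.isalpha() for ch in value)
            and shannon_entropy(value) >= MIN_ENTROPY_BITS_PER_CHAR)


def credential_candidates(url: str) -> list[str]:
    """Return query parameter names and path segments that may embed an access credential."""
    parts = urlsplit(url)
    hits = []
    for name, value in parse_qsl(parts.query, keep_blank_values=True):
        if value and (name.lower() in SUSPICIOUS_PARAMS or looks_like_token(value)):
            hits.append(name)
    for segment in parts.path.split("/"):
        if looks_like_token(segment):
            hits.append(segment)
    return hits
```

For example, `credential_candidates("https://example.com/reset?token=abc")` returns `["token"]`, while a plain documentation URL returns an empty list. A filter like this is deliberately loose: it will also flag some harmless slugs and IDs, which is why a semantic classification stage is still needed to decide whether a candidate actually exposes SPI.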

To further assess the extent to which published URLs are actively accessed by third parties, we deploy honeypages and submit their links to the selected URL scanning services. Our measurements confirm that external entities access URLs submitted to these scanners, often from potentially suspicious IPs exhibiting behavior commonly associated with reconnaissance or opportunistic probing.
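
One way to attribute visits to a specific scanner, sketched here under stated assumptions rather than taken from the paper, is to submit a distinct, unguessable honeypage URL to each service. Because each token is disclosed to exactly one scanner, any later request carrying it must have learned the URL through that scanner, whether from its own crawler or its public feed. The domain `honeypage.example`, the `/p/<token>` path layout, and the `HoneypageRegistry` class are hypothetical.

```python
import secrets
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical domain under the experimenters' control (assumption).
HONEYPAGE_BASE = "https://honeypage.example"
PATH_PREFIX = "/p/"


@dataclass
class HoneypageRegistry:
    """Issues one unguessable honeypage URL per scanner and maps later hits back to it."""
    by_token: dict[str, str] = field(default_factory=dict)

    def issue(self, scanner: str) -> str:
        # 16 random bytes (128 bits): infeasible to guess, so a hit implies disclosure.
        token = secrets.token_urlsafe(16)
        self.by_token[token] = scanner
        return f"{HONEYPAGE_BASE}{PATH_PREFIX}{token}"

    def attribute(self, request_path: str) -> Optional[str]:
        """Return the scanner a requested path was submitted to, or None if unknown."""
        if not request_path.startswith(PATH_PREFIX):
            return None
        # Drop any query string and trailing slash before looking up the token.
        token = request_path[len(PATH_PREFIX):].split("?", 1)[0].strip("/")
        return self.by_token.get(token)
```

Usage: call `issue("scanner-name")` once per service, submit the returned URL, and run each path from the web server's access log through `attribute`. Separating a scanner's own fetch from genuine third-party visits would need further signals, such as request timing or known scanner IP ranges, which this sketch does not model.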

Taken together, these findings indicate that URL scanning services represent a valuable target for web adversaries and may already be subject to active exploitation in the wild.

BibTeX


@inproceedings{mustafaLeakyLinks2026,
  title = {{LEAKYLINKS: Measuring the Security and Privacy Risks of URL Scanning Services}},
  shorttitle = {{LEAKYLINKS}},
  booktitle = {{{IEEE Symposium}} on {{Security}} and {{Privacy}}},
  author = {Mustafa, Ali and Rautenstrauch, Jannis and Hantke, Florian and Agarwal, Shubham and Calzavara, Stefano and Stock, Ben},
  date = {2026},
  publisher = {IEEE Computer Society},
  doi = {TBA},
  abstract = {URL scanning services are widely used in security workflows to detect malicious websites and protect users from online threats. However, their common practice of publicly indexing scanned URLs may unintentionally expose sensitive user information through URL-embedded access credentials. Although isolated accounts of such privacy incidents exist, a systematic assessment of their prevalence is still lacking. We present LEAKYLINKS, an automated analysis pipeline that combines URL filtering with LLM-driven semantic classification to identify URLs exposing Sensitive Personal Information (SPI). Using LEAKYLINKS, we analyze URLs collected from the public feeds of six prominent URL scanning services over a period of three weeks. In total, we visited 332k URLs and identified over 4k URLs that leak SPI, with a precision of 97\%. To further assess the extent to which published URLs are actively accessed by third parties, we deploy honeypages and submit their links to the selected URL scanning services. Our measurements confirm that external entities access URLs submitted to these scanners, often from potentially suspicious IPs exhibiting behavior commonly associated with reconnaissance or opportunistic probing. Taken together, these findings indicate that URL scanning services represent a valuable target for web adversaries and may already be subject to active exploitation in the wild.},
  isbn = {TBA},
  langid = {english}
}