
tmodel-ccs2018

Privacy-Preserving Dynamic Learning of Tor Network Traffic

Overview

This is the landing page for the following research publication:

Privacy-Preserving Dynamic Learning of Tor Network Traffic
Proceedings of the 25th ACM Conference on Computer and Communications Security (CCS 2018)
by Rob Jansen, Matthew Traudt, and Nicholas Hopper [Full paper available here]

If you reference this paper or use any of the data or models provided on this page, please cite the paper. Here is a BibTeX entry for LaTeX users (the note field uses \url, so load the url or hyperref package):

@inproceedings{tmodel-ccs2018,
author = {Rob Jansen and Matthew Traudt and Nicholas Hopper},
title = {Privacy-Preserving Dynamic Learning of {Tor} Network Traffic},
booktitle = {25th ACM Conference on Computer and Communications Security (CCS)},
year = {2018},
note = {See also \url{https://tmodel-ccs2018.github.io}},
}

The research included privacy-preserving measurement and Tor network simulation components.

Measurement

We measured Tor using PrivCount, a tool for privacy-preserving aggregation of Tor statistics, along with a modified version of Tor. We modified both tools so that they can dynamically learn and model Tor traffic:

PrivCount Code

Traffic learning and modeling changes have been merged upstream!

git repo: git@github.com:privcount/privcount.git
at branch master
(since commit c707af2a3f3e4ae7aa16b672f4d83ae1806f597d)
git web: https://github.com/privcount/privcount

The version we used for our experiments:

git repo: git@github.com:robgjansen/privcount.git
using branch research/tmodel/train-v3
git web: https://github.com/robgjansen/privcount/tree/research/tmodel/train-v3

Tor Code

Traffic learning and modeling changes have been merged upstream!

git repo: git@github.com:privcount/tor.git
at branch privcount-master
(since commit 38d6e2dafbc0669b38d2564426b21e67d83fea3f)
git web: https://github.com/privcount/tor

The version we used for our experiments:

git repo: git@github.com:robgjansen/tor.git
using branch research/tmodel/train-v3-03210
git web: https://github.com/robgjansen/tor/tree/research/tmodel/train-v3-03210
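As a convenience, the experiment branches listed above can be fetched with commands like the ones this minimal sketch prints. The branch names come from this page; the HTTPS clone URLs are an assumption (the "git web" URLs with the branch suffix dropped and `.git` appended), and `clone_branch_argv` is a hypothetical helper, not part of PrivCount or Tor.

```python
import shlex

# Branch names are taken from this page. The HTTPS clone URLs are assumed:
# the "git web" URLs above without the /tree/<branch> suffix, plus ".git".
EXPERIMENT_BRANCHES = {
    "privcount": ("https://github.com/robgjansen/privcount.git",
                  "research/tmodel/train-v3"),
    "tor": ("https://github.com/robgjansen/tor.git",
            "research/tmodel/train-v3-03210"),
}


def clone_branch_argv(url, branch, dest):
    """Return the git argv that clones only `branch` of `url` into `dest`."""
    return ["git", "clone", "--branch", branch, "--single-branch", url, dest]


if __name__ == "__main__":
    # Print the commands instead of running them, so nothing touches the network.
    for name, (url, branch) in EXPERIMENT_BRANCHES.items():
        print(shlex.join(clone_branch_argv(url, branch, name)))
```

The script only prints the commands; paste them into a shell to perform the clones.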

Data

See data/privcount in the repo. Each measurement number corresponds to the measurement number listed in Table 2 in the paper. Measurements 1-7 are ground truth measurements, measurement 8 includes 14 iterations for learning the packet model, and measurement 9 includes 14 iterations for learning the stream model.

The best packet model was from measurement 8-9, and the best stream model was from measurement 9-9.

Simulation

Simulation was done using Shadow, a full network simulation tool that directly executes Tor.

Shadow Code

Changes have been merged upstream!

Shadow:

git repo: git@github.com:shadow/shadow.git
git web: https://github.com/shadow/shadow
our experiments were run at commit: 443fdc234b879080529390d2306209742f5b3434

Shadow-Plugin-Tor:

git repo: git@github.com:shadow/shadow-plugin-tor.git
git web: https://github.com/shadow/shadow-plugin-tor
our experiments were run at commit: b5024bb366800198e09359dcbb768638d37c2aa7
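To reproduce the experiments against the exact commits listed above, each repository can be cloned and then checked out in detached-HEAD state. The following is a sketch under stated assumptions: the commit hashes are copied from this page, the HTTPS clone URLs are assumed from the "git web" links, and `pinned_checkout_argvs` and `run` are hypothetical helpers.

```python
import shlex
import subprocess

# Commit hashes are copied from this page; the HTTPS clone URLs are assumed
# from the "git web" links with ".git" appended.
PINNED = {
    "shadow": ("https://github.com/shadow/shadow.git",
               "443fdc234b879080529390d2306209742f5b3434"),
    "shadow-plugin-tor": ("https://github.com/shadow/shadow-plugin-tor.git",
                          "b5024bb366800198e09359dcbb768638d37c2aa7"),
}


def pinned_checkout_argvs(url, commit, dest):
    """Return the two git commands that clone `url` and pin it to `commit`."""
    return [
        ["git", "clone", url, dest],
        ["git", "-C", dest, "checkout", "--detach", commit],
    ]


def run(argvs, dry_run=True):
    """Print each command; also execute it when dry_run is False."""
    for argv in argvs:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
```

Calling `run(pinned_checkout_argvs(*PINNED["shadow"], "shadow"))` prints the two commands without executing them; pass `dry_run=False` to actually clone and check out.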

You can run Shadow with your own version of Tor to help with your own research.

If you want to export PrivCount events in order to count the number of streams, circuits, bytes, etc. as we did in the paper, you’ll need to use the Tor research/tmodel/train-v3-03210 branch listed above. Additionally, due to a bug in PrivCount, you should also apply our workaround patch to TGen to make sure all stream events get recorded correctly: data/shadow/workaround_for_privcount_stream_bug.patch

Shadow Network Configuration

Section 6.1.1 in the paper describes our approach to creating an Internet model for use as Shadow’s network configuration. That methodology yielded a network graph file in GraphML format that we used in our Shadow simulations. We also back-ported the network graph to a previous stable version of Shadow. These files should be decompressed and copied to ~/.shadow/share.

Network graph file for use with Shadow v1.13.0 (the version we used in our experiments)
Network graph file for use with Shadow v1.12.1 (older stable version of Shadow)
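The decompress-and-copy step can be scripted. The sketch below assumes the downloaded graph file is xz-compressed; this page does not state the compression format, so adjust accordingly. `install_graph` is a hypothetical helper and is not part of Shadow.

```python
import lzma
import shutil
from pathlib import Path


def install_graph(archive, share_dir=Path.home() / ".shadow" / "share"):
    """Decompress an xz-compressed graph file into share_dir; return the new path.

    Assumption: the archive is a single xz-compressed file whose name ends in
    ".xz" (not a tarball). The decompressed file keeps the name minus ".xz".
    """
    archive = Path(archive)
    if archive.suffix != ".xz":
        raise ValueError(f"expected an .xz archive, got {archive.name}")
    share_dir = Path(share_dir)
    share_dir.mkdir(parents=True, exist_ok=True)
    target = share_dir / archive.stem  # strips only the trailing ".xz"
    # Stream the data so a large graph is never held fully in memory.
    with lzma.open(archive, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

For example, `install_graph("downloaded-graph.xml.xz")` (a placeholder file name) would write `downloaded-graph.xml` into `~/.shadow/share`.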

Later research has used a version of our Internet model that does not contain packet loss on the links between the core routers in the topology. We host these versions of our model for posterity.

Network graph file for use with Shadow v1.13.0 (lossless links between core routers)
Network graph file for use with Shadow v2.x.x (more recent versions of Shadow)

We’ve compiled the scripts necessary to gather the input data and compile it into a topology file here. See here for links to the input data we used.

Shadow Host Configuration

Our Shadow experiments used the client behavior models that we discuss in the paper. You can incorporate these models into your own Shadow experiments.

Protocol TGen models for BitTorrent clients and HTTP clients
PrivCount TGen HMM models for stream and packet generation

If you want to repeat our experiments, Section 6.1.2 in the paper describes our host configuration for each of the three TGen models we tested. Here are the Shadow configurations needed to run each experiment. Run times are estimates and assume a 32-core server with 30 Shadow worker threads.

Client Model	# Relays	# Clients	RAM	Run time
Single file	2,000	60,000	~1.25 TiB	~1 week
Protocol	2,000	13,730	~300 GiB	~1 week
PrivCount	2,000	129,419	~2.75 TiB	~1 month
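When sizing a machine for a scaled-down experiment, a rough per-host memory figure can be derived from the table above. The sketch below is illustrative arithmetic only: it assumes memory scales linearly with the number of simulated hosts and treats relays and clients as equally expensive, which is a simplification. The helper names are hypothetical.

```python
# (client model, relays, clients, RAM in GiB), copied from the table above.
# 1 TiB = 1024 GiB; all RAM figures are the page's own approximations.
EXPERIMENTS = [
    ("Single file", 2000, 60000, 1.25 * 1024),
    ("Protocol", 2000, 13730, 300.0),
    ("PrivCount", 2000, 129419, 2.75 * 1024),
]


def mib_per_host(relays, clients, ram_gib):
    """Average memory per simulated host (relay or client), in MiB."""
    return ram_gib * 1024 / (relays + clients)


def estimate_ram_gib(relays, clients, mib_each):
    """Linear extrapolation of total RAM, in GiB, for a different host count."""
    return (relays + clients) * mib_each / 1024


if __name__ == "__main__":
    for model, relays, clients, ram_gib in EXPERIMENTS:
        print(f"{model}: {mib_per_host(relays, clients, ram_gib):.1f} MiB per host")
```

All three configurations work out to roughly 20 to 22 MiB per simulated host, so `estimate_ram_gib` with a value in that range gives a first-order guess for other network sizes. Treat the result as a starting point, not a guarantee.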