Overview
This is the landing page for the following research publication:
Privacy-Preserving Dynamic Learning of Tor Network Traffic
Proceedings of the 25th ACM Conference on Computer and Communications Security (CCS 2018)
by Rob Jansen, Matthew Traudt, and Nicholas Hopper
[Full paper available here]
If you reference this paper or use any of the data or models provided on this page, please cite the paper. Here is a BibTeX entry for LaTeX users:
@inproceedings{tmodel-ccs2018,
author = {Rob Jansen and Matthew Traudt and Nicholas Hopper},
title = {Privacy-Preserving Dynamic Learning of {Tor} Network Traffic},
booktitle = {25th ACM Conference on Computer and Communications Security (CCS)},
year = {2018},
note = {See also \url{https://tmodel-ccs2018.github.io}},
}
The research included privacy-preserving measurement and Tor network simulation components.
Measurement
Measurement of Tor was done using PrivCount, a tool for privacy-preserving Tor statistics aggregation, along with a modified version of Tor. We modified each of these tools for the purposes of dynamically learning and modeling Tor traffic:
PrivCount Code
Traffic learning and modeling changes have been merged upstream!
- git repo: git@github.com:privcount/privcount.git
- at branch master (since commit c707af2a3f3e4ae7aa16b672f4d83ae1806f597d)
- git web: https://github.com/privcount/privcount
The version we used for our experiments:
- git repo: git@github.com:robgjansen/privcount.git
- using branch research/tmodel/train-v3
- git web: https://github.com/robgjansen/privcount/tree/research/tmodel/train-v3
Tor Code
Traffic learning and modeling changes have been merged upstream!
- git repo: git@github.com:privcount/tor.git
- at branch privcount-master (since commit 38d6e2dafbc0669b38d2564426b21e67d83fea3f)
- git web: https://github.com/privcount/tor
The version we used for our experiments:
- git repo: git@github.com:robgjansen/tor.git
- using branch research/tmodel/train-v3-03210
- git web: https://github.com/robgjansen/tor/tree/research/tmodel/train-v3-03210
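To fetch the exact branches used in the experiments, something like the following sketch may help. The branch names come from the lists above; the HTTPS clone URLs are assumed from the "git web" links, and the local directory names are arbitrary choices. The script only prints the commands so that you can review them before running anything.

```python
import shlex

# Branch names are taken from this page. The HTTPS clone URLs are an
# assumption derived from the "git web" links; the SSH URLs listed above
# should work equally well if you have a GitHub key configured.
EXPERIMENT_BRANCHES = {
    "privcount": (
        "https://github.com/robgjansen/privcount.git",
        "research/tmodel/train-v3",
    ),
    "tor": (
        "https://github.com/robgjansen/tor.git",
        "research/tmodel/train-v3-03210",
    ),
}


def clone_command(name):
    """Return the argv list that clones one experiment branch into ./<name>."""
    url, branch = EXPERIMENT_BRANCHES[name]
    return ["git", "clone", "--branch", branch, "--single-branch", url, name]


if __name__ == "__main__":
    for name in EXPERIMENT_BRANCHES:
        # Print shell-quoted commands instead of executing them.
        print(shlex.join(clone_command(name)))
```

Each printed line can be pasted into a shell, or the argv list can be passed to `subprocess.run` directly.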
Data
See data/privcount in the repo. Each measurement number corresponds to the measurement number listed in Table 2 in the paper. Measurements 1-7 are ground truth measurements, measurement 8 includes 14 iterations for learning the packet model, and measurement 9 includes 14 iterations for learning the stream model.
The best packet model was from measurement 8-9, and the best stream model was from measurement 9-9.
Simulation
Simulation was done using Shadow, a full network simulation tool that directly executes Tor.
Shadow Code
Changes have been merged upstream!
Shadow:
- git repo: git@github.com:shadow/shadow.git
- git web: https://github.com/shadow/shadow
- our experiments were run at commit: 443fdc234b879080529390d2306209742f5b3434
Shadow-Plugin-Tor:
- git repo: git@github.com:shadow/shadow-plugin-tor.git
- git web: https://github.com/shadow/shadow-plugin-tor
- our experiments were run at commit: b5024bb366800198e09359dcbb768638d37c2aa7
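Because the upstream repositories have moved on since the experiments, reproducing them likely means checking out the pinned commits rather than the latest code. A minimal sketch, assuming HTTPS clone URLs derived from the "git web" links and arbitrary local directory names, that prints the commands for each repository:

```python
import shlex

# Commit hashes are copied from this page. The HTTPS URLs and the local
# directory names are assumptions made for illustration.
PINNED_COMMITS = {
    "shadow": (
        "https://github.com/shadow/shadow.git",
        "443fdc234b879080529390d2306209742f5b3434",
    ),
    "shadow-plugin-tor": (
        "https://github.com/shadow/shadow-plugin-tor.git",
        "b5024bb366800198e09359dcbb768638d37c2aa7",
    ),
}


def checkout_commands(name):
    """Return argv lists that clone a repo and detach HEAD at its pinned commit."""
    url, commit = PINNED_COMMITS[name]
    return [
        ["git", "clone", url, name],
        ["git", "-C", name, "checkout", "--detach", commit],
    ]


if __name__ == "__main__":
    for name in PINNED_COMMITS:
        for argv in checkout_commands(name):
            print(shlex.join(argv))
```

A full clone is used here rather than a shallow one so that the pinned commit is guaranteed to be present locally.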
You can run Shadow with your own version of Tor to help with your own research.
If you want to export PrivCount events in order to count the number of streams, circuits, bytes, etc. as we did in the paper, you’ll need to use the Tor research/tmodel/train-v3-03210 branch listed above. Additionally, due to a bug in PrivCount, you should also apply our workaround patch to TGen to make sure all stream events get recorded correctly:
data/shadow/workaround_for_privcount_stream_bug.patch
Shadow Network Configuration
Section 6.1.1 in the paper describes our approach to creating an Internet model to use as Shadow’s network configuration. That methodology yielded a network graph graphml file that we used in our Shadow simulations. We also back-ported the network graph for a previous stable version of Shadow. These files should be decompressed and copied to ~/.shadow/share.
- Network graph file for use with Shadow v1.13.0 (the version we used in our experiments)
- Network graph file for use with Shadow v1.12.1 (older stable version of Shadow)
Later research has used a version of our Internet model that does not contain packet loss on the links between the core routers in the topology. We host these versions of our model for posterity.
- Network graph file for use with Shadow v1.13.0 (lossless links between core routers)
- Network graph file for use with Shadow v2.x.x (more recent versions of Shadow)
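The decompress-and-copy step for any of the network graph files above could be scripted roughly as follows. This is a sketch under assumptions: the compression format of the downloads is not stated on this page, so the helper guesses it from the file suffix (.xz, .gz, or .bz2), and the function name `install_graph` is hypothetical.

```python
import bz2
import gzip
import lzma
import shutil
from pathlib import Path

# Map of file suffix to decompressor. Which format the hosted files
# actually use is an assumption; extend this table if yours differs.
OPENERS = {".xz": lzma.open, ".gz": gzip.open, ".bz2": bz2.open}


def install_graph(archive, share_dir=Path.home() / ".shadow" / "share"):
    """Decompress a single-file archive into Shadow's share directory."""
    archive = Path(archive)
    opener = OPENERS.get(archive.suffix)
    if opener is None:
        raise ValueError(f"unsupported compression suffix: {archive.suffix!r}")

    share_dir = Path(share_dir)
    share_dir.mkdir(parents=True, exist_ok=True)

    # Dropping the compression suffix gives the output file name.
    target = share_dir / archive.stem
    with opener(archive, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

Calling `install_graph("path/to/downloaded-graph-file.xz")` with the real download path would place the decompressed file under ~/.shadow/share; pass a different `share_dir` to test it without touching your Shadow installation.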
We’ve compiled the scripts necessary to gather the input data and compile it into a topology file here. See here for links to the input data we used.
Shadow Host Configuration
Our Shadow experiments used the client behavior models that we discuss in the paper. You can incorporate these models into your own Shadow experiments.
- Protocol TGen models for BitTorrent clients and HTTP clients
- PrivCount TGen HMM models for stream and packet generation
If you want to repeat our experiments, Section 6.1.2 in the paper describes our host configuration for each of the 3 TGen models we tested. Here are the Shadow configurations needed to run each experiment. Run time is estimated and assumes a 32-core server with 30 Shadow worker threads.
Client Model | # Relays | # Clients | RAM | Run time |
---|---|---|---|---|
Single file | 2,000 | 60,000 | ~1.25 TiB | ~1 week |
Protocol | 2,000 | 13,730 | ~300 GiB | ~1 week |
PrivCount | 2,000 | 129,419 | ~2.75 TiB | ~1 month |
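Before starting a run, it may be worth checking which configurations fit on your machine. The sketch below encodes the approximate RAM figures from the table (converted to GiB) and filters them by available memory. The function name `feasible_models` is hypothetical, and because the table values are estimates, you should leave some headroom.

```python
GIB_PER_TIB = 1024

# Approximate values copied from the table above; RAM is converted to GiB.
EXPERIMENTS = {
    "Single file": {"relays": 2000, "clients": 60000, "ram_gib": 1.25 * GIB_PER_TIB},
    "Protocol": {"relays": 2000, "clients": 13730, "ram_gib": 300},
    "PrivCount": {"relays": 2000, "clients": 129419, "ram_gib": 2.75 * GIB_PER_TIB},
}


def feasible_models(available_ram_gib):
    """Return the client models whose estimated RAM fits in the given budget."""
    return [
        name
        for name, config in EXPERIMENTS.items()
        if config["ram_gib"] <= available_ram_gib
    ]


if __name__ == "__main__":
    # Example: a server with 2 TiB of RAM.
    print(feasible_models(2 * GIB_PER_TIB))
```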