
tmodel-ccs2018

Privacy-Preserving Dynamic Learning of Tor Network Traffic

Overview

This is the landing page for the following research publication:

Privacy-Preserving Dynamic Learning of Tor Network Traffic
Proceedings of the 25th ACM Conference on Computer and Communications Security (CCS 2018)
by Rob Jansen, Matthew Traudt, and Nicholas Hopper [Full paper available here]

If you reference this paper or use any of the data or models provided on this page, please cite the paper. Here is a BibTeX entry for LaTeX users:

@inproceedings{tmodel-ccs2018,
author = {Rob Jansen and Matthew Traudt and Nicholas Hopper},
title = {Privacy-Preserving Dynamic Learning of {Tor} Network Traffic},
booktitle = {25th ACM Conference on Computer and Communications Security (CCS)},
year = {2018},
note = {See also \url{https://tmodel-ccs2018.github.io}},
}

The research included privacy-preserving measurement and Tor network simulation components.

Measurement

We measured Tor using PrivCount, a tool for privacy-preserving Tor statistics aggregation, along with a modified version of Tor. We modified both tools so that they could dynamically learn and model Tor traffic:

PrivCount Code

Traffic learning and modeling changes have been merged upstream!

The version we used for our experiments:

Tor Code

Traffic learning and modeling changes have been merged upstream!

The version we used for our experiments:

Data

See data/privcount in the repo. Each measurement number corresponds to the measurement number listed in Table 2 in the paper. Measurements 1-7 are ground truth measurements, measurement 8 includes 14 iterations for learning the packet model, and measurement 9 includes 14 iterations for learning the stream model.

The best packet model was from measurement 8-9, and the best stream model was from measurement 9-9.

Simulation

Simulation was done using Shadow, a full network simulation tool that directly executes Tor.

Shadow Code

Changes have been merged upstream!

Shadow:

Shadow-Plugin-Tor:

You can run Shadow with your own version of Tor to help with your own research.

To export PrivCount events and count the number of streams, circuits, bytes, etc. as we did in the paper, you'll need to use the Tor research/tmodel/train-v3-03210 branch listed above. Because of a bug in PrivCount, you should also apply our workaround patch to TGen so that all stream events are recorded correctly: data/shadow/workaround_for_privcount_stream_bug.patch

Shadow Network Configuration

Section 6.1.1 in the paper describes our approach to creating an Internet model to use as Shadow's network configuration. That methodology yielded a network graph (a GraphML file) that we used in our Shadow simulations. We also back-ported the network graph to a previous stable version of Shadow. These files should be decompressed and copied to ~/.shadow/share.
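The decompress-and-copy step can be scripted. The following is a minimal Python sketch, not part of our tooling: the function name `install_topology` is hypothetical, and because this page does not state the compression format, the sketch assumes the archive is a single file compressed with xz or gzip and picks the decompressor from the file suffix.

```python
import gzip
import lzma
import shutil
from pathlib import Path

# Assumption: the network graph ships as a single .xz or .gz file.
# Map each supported suffix to the stdlib opener that can read it.
OPENERS = {".xz": lzma.open, ".gz": gzip.open}


def install_topology(archive, share_dir=None):
    """Decompress `archive` into the Shadow share directory; return the new path."""
    archive = Path(archive)
    if share_dir is None:
        # Default destination named on this page: ~/.shadow/share
        share_dir = Path.home() / ".shadow" / "share"
    share_dir = Path(share_dir)

    opener = OPENERS.get(archive.suffix)
    if opener is None:
        raise ValueError(f"unsupported compression suffix: {archive.suffix!r}")

    share_dir.mkdir(parents=True, exist_ok=True)
    # Dropping the last suffix gives the decompressed name,
    # e.g. graph.graphml.xz -> graph.graphml
    target = share_dir / archive.stem
    with opener(archive, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

Calling `install_topology("graph.graphml.xz")` (a placeholder file name) would write `graph.graphml` into `~/.shadow/share`. Pass `share_dir` explicitly to test against a scratch directory first.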

Later research has used a version of our Internet model that does not contain packet loss on the links between the core routers in the topology. We host these versions of our model for posterity.

We’ve compiled the scripts necessary to gather the input data and compile it into a topology file here. See here for links to the input data we used.

Shadow Host Configuration

Our Shadow experiments used the client behavior models that we discuss in the paper. You can incorporate these models into your own Shadow experiments.

If you want to repeat our experiments, Section 6.1.2 in the paper describes our host configuration for each of the 3 TGen models we tested. Here are the Shadow configurations needed to run each experiment. Run times are estimates and assume a 32-core server with 30 Shadow worker threads.

Client Model    # Relays    # Clients    RAM          Run time
Single file     2,000       60,000       ~1.25 TiB    ~1 week
Protocol        2,000       13,730       ~300 GiB     ~1 week
PrivCount       2,000       129,419      ~2.75 TiB    ~1 month
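For sizing your own experiments, the table above can be reduced to a rough memory cost per simulated client. The sketch below is a back-of-the-envelope calculation only: it charges all RAM to the clients, ignoring the relays and any other hosts, so it overstates the true per-client cost, and the RAM figures are the approximate values from the table. The names `EXPERIMENTS` and `ram_per_client_mib` are illustrative.

```python
GIB_PER_TIB = 1024
MIB_PER_GIB = 1024

# (client model, number of clients, approximate RAM in GiB), copied from the table.
EXPERIMENTS = [
    ("Single file", 60_000, 1.25 * GIB_PER_TIB),
    ("Protocol", 13_730, 300),
    ("PrivCount", 129_419, 2.75 * GIB_PER_TIB),
]


def ram_per_client_mib(clients, ram_gib):
    """Total RAM divided by client count, in MiB (an upper bound per client)."""
    return ram_gib * MIB_PER_GIB / clients


for model, clients, ram_gib in EXPERIMENTS:
    print(f"{model:<12} {ram_per_client_mib(clients, ram_gib):5.1f} MiB per client")
```

All three configurations come out at roughly 22 MiB per client, which suggests that memory use scales approximately linearly with the number of clients in these experiments.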