Metrics-Lib

A Java library to fetch and parse Tor descriptors.



Table of contents

  • DescripTor
  • Verifying releases
  • Source code example
  • Changelog
  • Contributor's guide
  • JavaDocs
  • Tutorials
  • Links

DescripTor - A Tor Descriptor API for Java #

DescripTor is a Java API that fetches Tor descriptors from a variety of sources like cached descriptors and directory authorities/mirrors. The DescripTor API is useful to support statistical analysis of the Tor network data and for building services and applications.

The descriptor types supported by DescripTor include relay and bridge descriptors which are part of Tor's directory protocol as well as Torperf data files and TorDNSEL's exit lists. Access to these descriptors is unified to facilitate access to publicly available data about the Tor network.

This API is designed for Java programs that process Tor descriptors in batches. A Java program using this API first sets up a descriptor source by defining where to find descriptors and which descriptors it considers relevant. The descriptor source then makes the descriptors available in a descriptor store. The program can then query the descriptor store for the contained descriptors. Changes to the descriptor sources after descriptors are made available in the descriptor store will not be noticed. This simple programming model was designed for periodically running, batch-processing applications and not for continuously running applications that rely on learning about changes to an underlying descriptor source.
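
The batch-oriented model described above can be illustrated with a small stand-alone sketch. It deliberately does not use the DescripTor API: `SnapshotStore` and its methods are hypothetical names chosen for illustration, and plain strings stand in for descriptors. The only point is the behavior, namely that the store copies what the source holds at setup time, so later changes to the source go unnoticed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a descriptor store; not part of DescripTor. */
class SnapshotStore {

  private final List<String> descriptors;

  /** Copies the relevant entries from the source once, at setup time. */
  SnapshotStore(List<String> source, String relevantPrefix) {
    List<String> snapshot = new ArrayList<>();
    for (String entry : source) {
      if (entry.startsWith(relevantPrefix)) {
        snapshot.add(entry);
      }
    }
    this.descriptors = Collections.unmodifiableList(snapshot);
  }

  /** Returns the descriptors that were available at setup time. */
  List<String> getDescriptors() {
    return this.descriptors;
  }

  public static void main(String[] args) {
    List<String> source = new ArrayList<>();
    source.add("consensus-2017-02-17");
    source.add("vote-2017-02-17");
    SnapshotStore store = new SnapshotStore(source, "consensus-");
    // A change to the source after setup is not reflected in the store.
    source.add("consensus-2017-02-18");
    System.out.println(store.getDescriptors());
  }
}
```

A periodically running application would build a fresh store on each run, which is how it picks up descriptors that were added in the meantime.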

The executable jar, source jar, and javadoc jar can be found here [LINK]. Before using them please verify the release (see below for instructions).


Verifying releases #

Releases can be cryptographically verified to get some more confidence that they were put together by a Tor developer. The following steps explain the verification process by example.

Download the release tarball and the separate signature file:

wget https://dist.torproject.org/descriptor/1.0.0/descriptor-1.0.0.tar.gz
wget https://dist.torproject.org/descriptor/1.0.0/descriptor-1.0.0.tar.gz.asc

Attempt to verify the signature on the tarball:

gpg --verify descriptor-1.0.0.tar.gz.asc

If the signature cannot be verified due to the public key of the signer not being locally available, download that public key from one of the key servers and retry:

gpg --keyserver pgp.mit.edu --recv-key 0x4EFD4FDC3F46D41E
gpg --verify descriptor-1.0.0.tar.gz.asc

If the signature still cannot be verified, something is wrong!

But note that even if it can be verified, you now only know that the signature was made by the person claiming to own this key, which could be anyone. You'll need a trust path to the owner of this key in order to trust this signature, but that's clearly out of scope here. In short, your best chance is to meet a Tor developer in real life and enter the web of trust.

If you want to go one step further in the verification game, you can verify the signature on the .jar files.

Print and then import the provided X.509 certificate:

keytool -printcert -file CERT
keytool -importcert -alias karsten -file CERT

Verify the signatures on the contained .jar files using Java's jarsigner tool:

jarsigner -verify descriptor-1.0.0.jar
jarsigner -verify descriptor-1.0.0-sources.jar

Source Code Example #

import org.torproject.descriptor.*;

import java.io.File;

public class DownloadConsensuses {
  public static void main(String[] args) {

    // Download consensuses published in the last 72 hours, which will
    // take up to five minutes and require several hundred MB on the
    // local disk.
    DescriptorCollector descriptorCollector =
        DescriptorSourceFactory.createDescriptorCollector();
    descriptorCollector.collectDescriptors(
        // Download from Tor's main CollecTor instance,
        "https://collector.torproject.org",
        // include only network status consensuses
        new String[] { "/recent/relay-descriptors/consensuses/" },
        // regardless of last-modified time,
        0L,
        // write to the local directory called descriptors/,
        new File("descriptors"),
        // and don't delete extraneous files that do not exist remotely anymore.
        false);
  }
}


Changelog #

# Changes in version 1.6.0 - 2017-02-17

 * Major changes
   - Deprecate DescriptorDownloader in favor of the much more widely
     used DescriptorCollector.

 * Medium changes
   - Add two methods for loading and saving a parse history file in
     the descriptor reader to avoid situations where applications fail
     after all descriptors are read but before they are all processed.
   - Unify the build process by adding git-submodule metrics-base in
     src/build and removing all centralized parts of the build
     process.
   - Avoid deleting extraneous local descriptor files when collecting
     descriptors from CollecTor.
   - Turn the descriptor reader thread into a daemon thread, so that
     the application can decide at any time to stop consuming
     descriptors without having to worry about the reader thread not
     being done.
   - Parse "proto" lines in server descriptors, "pr" lines in status
     entries, and "(recommended|required)-(client|relay)-protocols"
     lines in consensuses and votes.
   - Parse "shared-rand-.*" lines in consensuses and votes.
   - Deprecate DescriptorCollectorImpl now that
     DescriptorIndexCollector is the default.


# Changes in version 1.5.0 - 2016-10-19

 * Major changes
   - Make the DescriptorCollector implementation that uses CollecTor's
     index.json file to determine which descriptor files to fetch the
     new default.  Applications must provide gson-2.2.4.jar or higher
     as dependency.

 * Minor changes
   - Avoid running into an IOException and logging a warning for it.


# Changes in version 1.4.0 - 2016-08-31

 * Major changes
   - Add the Simple Logging Facade for Java (slf4j) for logging
     support rather than printing warnings to stderr.  Applications
     must provide slf4j-api-1.7.7.jar or higher as dependency and can
     optionally provide a compatible logging framework of their choice
     (java.util.logging, logback, log4j).

 * Medium changes
   - Add an alpha version of a DescriptorCollector implementation that
     is not enabled by default and that uses CollecTor's index.json
     file to determine which descriptor files to fetch.  Applications
     can enable this implementation by providing gson-2.2.4.jar or
     higher as dependency and setting property descriptor.collector to
     org.torproject.descriptor.index.DescriptorIndexCollector.

 * Minor changes
   - Include resource files in src/*/resources/ in the release
     tarball.
   - Move executable, source, and javadoc jar to generated/dist/.


# Changes in version 1.3.1 - 2016-08-01

 * Medium changes
   - Adapt to CollecTor's new date format to make DescriptorCollector
     work again.


# Changes in version 1.3.0 - 2016-07-06

 * Medium changes
   - Parse "package" lines in consensuses and votes.
   - Support more than one "directory-signature" line in a vote, which
     may become relevant when authorities start signing votes using
     more than one algorithm.
   - Provide directory signatures in consensuses and votes in a list
     rather than a map to support multiple signatures made using the
     same identity key digest but different algorithms.
   - Be more lenient about digest lengths in directory signatures
     which may be longer or shorter than 20 bytes.
   - Parse "tunnelled-dir-server" lines in server descriptors.

 * Minor changes
   - Stop reporting "-----END .*-----" lines in v2 network statuses as
     unrecognized.


# Changes in version 1.2.0 - 2016-05-31

 * Medium changes
   - Include the hostname in directory source entries of consensuses
     and votes.
   - Also accept \r\n as newline in Torperf results files.
   - Make unrecognized keys of Torperf results available together with
     the corresponding values, rather than just the whole line.
   - In Torperf results, recognize all percentiles of expected bytes
     read for 0 <= x <= 100 rather than just x = { 10, 20, ..., 90 }.
   - Rename properties for overriding default descriptor source
     implementation classes.
   - Actually return the signing key digest in network status votes.
   - Parse crypto parts in network status votes.
   - Document all public parts in org.torproject.descriptor and add
     an Ant target to generate Javadocs.

 * Minor changes
   - Include a Torperf results line with more than one unrecognized
     key only once in the unrecognized lines.
   - Make "consensus-methods" line optional in network status votes,
     which would mean that only method 1 is supported.
   - Stop reporting "-----END .*-----" lines in directory key
     certificates as unrecognized.
   - Add code used for benchmarking.


# Changes in version 1.1.0 - 2015-12-28

 * Medium changes
   - Parse flag thresholds in bridge network statuses, and parse the
     "ignoring-advertised-bws" flag threshold in relay network status
     votes.
   - Support parsing of .xz-compressed tarballs using Apache Commons
     Compress and XZ for Java.  Applications only need to add XZ for
     Java as dependency if they want to parse .xz-compressed tarballs.
   - Introduce a new ExitList.Entry type for exit list entries instead
     of the ExitListEntry type which is now deprecated.  The main
     difference between the two is that ExitList.Entry can hold more
     than one exit address and scan time which were previously parsed
     as multiple ExitListEntry instances.
   - Introduce four new types to distinguish between relay and bridge
     descriptors: RelayServerDescriptor, RelayExtraInfoDescriptor,
     BridgeServerDescriptor, and BridgeExtraInfoDescriptor.  The
     existing types, ServerDescriptor and ExtraInfoDescriptor, are
     still usable and will not be deprecated, because applications may
     not care whether a relay or a bridge published a descriptor.
   - Support Ed25519 certificates, Ed25519 master keys, SHA-256
     digests, and Ed25519 signatures thereof in server descriptors and
     extra-info descriptors, and support Ed25519 master keys in votes.
   - Include RSA-1024 signatures of SHA-1 digests of extra-info
     descriptors, which were parsed and discarded before.
   - Support hidden-service statistics in extra-info descriptors.
   - Support onion-key and ntor-onion-key cross certificates in server
     descriptors.

 * Minor changes
   - Start using Java 7 features like the diamond operator and switch
     on String, and use StringBuilder correctly in many places.


# Changes in version 1.0.0 - 2015-12-05

 * Major changes
   - This is the initial release after four years of development.
     Happy 4th birthday!


A contributor's guide to how we develop metrics-lib #

Dear contributor to metrics-lib, this text is an attempt to tell you how we do development for this fine library. We highly encourage you to read it when making contributions to metrics-lib to make it easier for us to accept them. But we also invite you to question these guidelines and make suggestions if you see room for improvement.

Purpose

Before we go into the details of writing code, let's briefly talk about the purpose of metrics-lib. Back in 2011, the reason for creating this library was to avoid rewriting, over and over, the same code for handling data gathered in the Tor network. metrics-lib is now used in the major Java-based tools in the Tor metrics space, and researchers use it for one-off analyses of Tor network data.

Design overview

metrics-lib is not that big, so it shouldn't be difficult to go through the interfaces and classes to see what they are doing. But to give you a general overview, here are some highlights:

  • We tried to separate interfaces from implementation classes as much as possible and put them into different packages. As a rule of thumb, applications using metrics-lib should never need to import one of the implementation classes.
  • There are two types of classes: descriptor classes and classes that provide descriptor instances. For the first type there's a class for pretty much each kind of descriptor that is available in the Tor network, and for the second type there's a class for each method to obtain descriptors.
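
The separation of interfaces from implementation classes can be sketched as follows. All names here (`ExampleCollector`, `ExampleSourceFactory`, the `example.collector` property) are made up for illustration. The changelog mentions that an implementation can be selected through a property such as `descriptor.collector`; this sketch only mimics that idea and is not the library's actual factory code.

```java
/** Hypothetical interface; the only type an application would import. */
interface ExampleCollector {
  String describe();
}

/** Hypothetical default implementation, hidden behind the factory. */
class DefaultExampleCollector implements ExampleCollector {
  @Override
  public String describe() {
    return "default";
  }
}

/** Hypothetical alternative, selectable via a system property. */
class IndexExampleCollector implements ExampleCollector {
  @Override
  public String describe() {
    return "index";
  }
}

/** Hypothetical factory that picks the implementation class by name. */
final class ExampleSourceFactory {

  /** Made-up property name used to override the default class. */
  static final String COLLECTOR_PROPERTY = "example.collector";

  private ExampleSourceFactory() {
  }

  /** Creates the configured implementation, or the default one. */
  static ExampleCollector createCollector() {
    String className = System.getProperty(COLLECTOR_PROPERTY,
        DefaultExampleCollector.class.getName());
    try {
      return (ExampleCollector) Class.forName(className)
          .getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException | ClassCastException e) {
      throw new IllegalStateException(
          "Cannot instantiate " + className, e);
    }
  }

  public static void main(String[] args) {
    System.out.println(createCollector().describe());
  }
}
```

Application code in this sketch refers only to the interface and the factory, so an implementation class can be swapped without touching the application.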

Dependencies

We tried to keep the number of dependencies as small as possible, and we tried to avoid adding any dependencies that wouldn't be available in common operating system distributions like Debian stable. That doesn't mean that we're opposed to adding any further dependencies, but we need to keep in mind that any user of our library will have to add those dependencies, too.

metrics-lib currently has the following dependencies to compile:

  • Apache Commons Compress 1.9 (go to the project page at https://commons.apache.org/proper/commons-compress/ and select Download, Archives, Binaries, and then the tarball or zip file for version 1.9.)
  • XZ for Java 1.5 (the project page at http://tukaani.org/xz/java.html contains the most recent version, but older versions need to be retrieved from http://mvnrepository.com/artifact/org.tukaani/xz.)
  • JUnit 4.11 and Hamcrest 1.3 (go to the JUnit project page at http://junit.org/ and select Download and install, junit.jar, version 4.11, and hamcrest-core.jar, version 1.3.)

Code style

We're using a code style that is not really formally defined but that roughly follows these rules:

  • We avoid tabs and favor 2 spaces where other people would use a tab.
  • We break lines after at most 74 characters and indent new lines with 4 spaces.
  • Every public interface or method should have a JavaDoc comment, which should be a full sentence. We failed to do this in large parts of the current code where we used comments instead of JavaDoc comments, but we should fix that at some point.

There's probably more to say about code style, but please take a look at the existing code and try to write new code that is as similar to it as possible.
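
The two mechanical rules above (no tabs, lines of at most 74 characters) are easy to check automatically. The following is a minimal sketch of such a check; `StyleCheck` is a hypothetical helper and not part of metrics-lib or its build process.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper; not part of metrics-lib or its build. */
class StyleCheck {

  /** Maximum line length taken from the code style rules. */
  static final int MAX_LINE_LENGTH = 74;

  /** Returns one message per tab or overlong line that is found. */
  static List<String> findViolations(List<String> lines) {
    List<String> violations = new ArrayList<>();
    for (int i = 0; i < lines.size(); i++) {
      String line = lines.get(i);
      int lineNumber = i + 1;
      if (line.indexOf('\t') >= 0) {
        violations.add("line " + lineNumber + ": contains a tab");
      }
      if (line.length() > MAX_LINE_LENGTH) {
        violations.add("line " + lineNumber + ": longer than "
            + MAX_LINE_LENGTH + " characters");
      }
    }
    return violations;
  }

  public static void main(String[] args) {
    List<String> lines = Arrays.asList(
        "  int x = 1;",
        "\tint y = 2;");
    for (String violation : findViolations(lines)) {
      System.out.println(violation);
    }
  }
}
```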

Tests

metrics-lib is still rather light on unit tests, but that shouldn't prevent us from writing tests for new code. Test classes go into a separate source directory and use the same package structure as the classes they're supposed to test.

Deprecating features

We have to assume that applications don't update their metrics-lib version very often. This is related to the lack of a release process until recently. If we want to remove a feature we'll have to deprecate it and basically keep it working for at least another year.
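
In Java, deprecating a feature while keeping it working could look like the sketch below. `ExampleParser` and its methods are hypothetical and only show the pattern: the old method stays in place, is marked as deprecated, and delegates to its replacement.

```java
/** Hypothetical API used only to illustrate deprecation. */
class ExampleParser {

  /**
   * Parses the given line.
   *
   * @deprecated Replaced by {@link #parseLine(String)}; kept working so
   *     that applications have time to migrate.
   */
  @Deprecated
  static String parse(String line) {
    return parseLine(line);
  }

  /** Parses the given line and returns it without surrounding spaces. */
  static String parseLine(String line) {
    return line.trim();
  }

  public static void main(String[] args) {
    System.out.println(parseLine("  router example  "));
  }
}
```

Applications that still call the old method keep compiling and running, but the compiler warns them about the deprecation.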

Change log

We have been keeping a change log since we started putting out releases. Here we describe what deserves a change log entry and whether those changes are major, medium, or minor:

  • Bug fixes obviously need a change log entry, but it depends on the bug whether it should be listed as a major or medium change.
  • Enhancements that extend the API are also worth noting in the change log, though their importance would most likely be medium.
  • All enhancements must be backwards-compatible, so whenever we want to switch to a different interface we'll have to deprecate the existing interface and at the same time provide a new one that applications should use instead. Deprecating a feature would be a medium change that should be mentioned in the change log.
  • Enhancements that make the implementation more efficient or that refactor some internal code might also be worth noting in the change log, but very likely as medium enhancements. An exception would be ground-breaking performance improvements that most application developers would care about, which would be major enhancements.
  • Whenever we add a new dependency, that's clearly a major change that needs to be written into the change log, because applications will have to add this dependency, too.
  • Removing an existing dependency is also worth mentioning in the change log, though that's rather a medium change that doesn't force applications to act that quickly.
  • Any simple code cleanups, new tests, changes to documentation like this file, etc. only require a summary change log entry and will lead to a minor version change.

Releases

As a rule of thumb, we should put out a new release of metrics-lib soon after making a major change as listed under "Change log" above. If we're planning to make more changes soon after, let's wait for them and make a release with everything. But we shouldn't let a major change sit in an unreleased metrics-lib for more than, say, two weeks. In contrast to that, medium changes can stay unreleased for longer, though they don't have to if we want to use them in an application sooner. Minor changes can be collected and usually will be released with changes on higher stages, but when necessary they can be released earlier.

Regarding version numbers, we started with 1.0.0 and bumped to 1.1.0, 1.2.0, etc., which were all backwards-compatible changes. Whenever we remove a previously deprecated feature, making a backwards-incompatible change, we'll bump to 2.0.0. For minor changes, we'd bump to x.x.1.
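
The version rules in the previous paragraph can be written down as a small function. This is a sketch with hypothetical names (`VersionBump`, `Change`). It also assumes that each release containing only minor changes increments the last part by one, which the text does not spell out.

```java
/** Hypothetical helper that mirrors the version rules stated above. */
class VersionBump {

  /** Kinds of releases distinguished in this sketch. */
  enum Change { BACKWARDS_INCOMPATIBLE, BACKWARDS_COMPATIBLE, MINOR_ONLY }

  /** Returns the version that would follow the given x.y.z version. */
  static String next(String current, Change change) {
    String[] parts = current.split("\\.");
    if (parts.length != 3) {
      throw new IllegalArgumentException(
          "Expected a version like 1.0.0: " + current);
    }
    int first = Integer.parseInt(parts[0]);
    int second = Integer.parseInt(parts[1]);
    int third = Integer.parseInt(parts[2]);
    switch (change) {
      case BACKWARDS_INCOMPATIBLE:
        // Removing a deprecated feature starts a new major version.
        return (first + 1) + ".0.0";
      case BACKWARDS_COMPATIBLE:
        return first + "." + (second + 1) + ".0";
      default:
        // Assumption: each minor-only release increments the last part.
        return first + "." + second + "." + (third + 1);
    }
  }

  public static void main(String[] args) {
    System.out.println(next("1.6.0", Change.BACKWARDS_COMPATIBLE));
  }
}
```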

Releases are cryptographically signed on multiple levels: the Git tag created for the release is signed using GnuPG, the produced .jar files are signed using Java's jarsigner tool, and the produced tarball is again signed using GnuPG. We'll assume that you're familiar with GnuPG and how to manage keys for it, but we're including some sample commands for managing keys for jarsigner using the less commonly known Java keytool below.

First, generate a new key pair and create a certificate that expires after 90 days using reasonable (yet compatible) cryptographic algorithms and parameters:

keytool -genkeypair -alias karsten -keyalg RSA -keysize 2048 \
-sigalg SHA256withRSA -validity 90 \
-dname "CN=Karsten Loesing, O=The Tor Project\, Inc, L=Seattle, ST=WA, C=US"

Extend the certificate for the existing key pair when it expires:

keytool -selfcert -alias karsten

Export the certificate to a local file called CERT:

keytool -exportcert -alias karsten -rfc -file CERT

Also, in order to sign releases using Ant, you'll have to create a file `build.properties` with content similar to the content below (note that without such a file, the Ant targets starting at `signjar` won't work):

jarsigner.alias=karsten
jarsigner.storepass=password
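
Ant property files follow the `java.util.Properties` format, in which each property goes on its own line. The sketch below uses a hypothetical helper class, `BuildPropertiesCheck`, to test whether a file's content defines both `jarsigner.alias` and `jarsigner.storepass`. It is only a convenience check and not part of the build.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

/** Hypothetical helper; not part of the metrics-lib build. */
class BuildPropertiesCheck {

  /** Returns true if both jarsigner settings are present and non-empty. */
  static boolean hasSignerSettings(String fileContent) {
    Properties properties = new Properties();
    try {
      properties.load(new StringReader(fileContent));
    } catch (IOException e) {
      // Not expected when reading from an in-memory string.
      throw new IllegalStateException(e);
    }
    return isSet(properties, "jarsigner.alias")
        && isSet(properties, "jarsigner.storepass");
  }

  private static boolean isSet(Properties properties, String key) {
    String value = properties.getProperty(key);
    return value != null && !value.trim().isEmpty();
  }

  public static void main(String[] args) {
    String content = "jarsigner.alias=karsten\n"
        + "jarsigner.storepass=password\n";
    System.out.println(hasSignerSettings(content));
  }
}
```

If both settings are written on a single line, the properties parser treats everything after the first `=` as the value of `jarsigner.alias`, so `jarsigner.storepass` ends up undefined.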

Putting out a new release requires a series of steps:

Edit `build.xml` and raise the `release.version` property to the desired new release.

Edit `CHANGELOG.md` and make sure it contains the correct date for the release.

Commit these changes.

Clean up the src/ and lib/ directories from any files that may have been used locally for developing and that shouldn't be included in the tarball.

Use Ant to first clean up and then create a tarball containing all sources and signed .jar files:

ant clean compile test jar signjar tar

Sign the produced tarball using GnuPG:

gpg --detach-sign --armor --local-user 0x4EFD4FDC3F46D41E \
descriptor-1.0.0.tar.gz

Verify the signed tarball, ideally on a different system, as described in `README.md`.

Create a signed Git tag for the new release:

git tag -s descriptor-1.0.0 -m "DescripTor 1.0.0"

Push the branch. Ideally, verify the tag signature by cloning it on another system and running the following command:

git verify-tag descriptor-1.0.0

Upload the tarball and signature file and announce the new version.

Edit `build.xml` again and raise `release.version` to the current release plus `-dev`, e.g., `1.0.0-dev`.

Development

If you want to start working on metrics-lib, you can clone the repo:

git clone --recursive https://git.torproject.org/metrics-lib.git

If you forgot to add `--recursive`, just run the bootstrap script for the submodule:

./src/main/resources/bootstrap-development.sh

or the contained git command:

git submodule update --init --remote

Packages

There are no metrics-lib packages yet, but we should aim for providing packages for at least Debian stable, either official or unofficial.

Closing words

Dear contributor, now that you made it to the end of this guide, please be reminded that these are just guidelines that shall make it easier for us to work on metrics-lib. But we're making these rules ourselves, and that "we" includes you. Please suggest any changes to this guide and help us make it better. Thanks!


JavaDocs #

Content
Content
Content


Tutorials #

Content
Content
Content


Links #

Link to GIT with description, description and description
link to CollecTor
link to releases or release table
link to sources
link to bug tracker, open bugs and direct link for adding new bug
link to contributor docs on team wiki page


© 2017 The Tor Project

Contact

Data on this site is freely available under a CC0 no copyright declaration: To the extent possible under law, the Tor Project has waived all copyright and related or neighboring rights in the data. "Tor" and the "Onion Logo" are registered trademarks of The Tor Project, Inc.