Stem Docs

CollecTor

Descriptor archives are available from CollecTor. If you need Tor's topology at a prior point in time this is the place to go!

With CollecTor you can either read descriptors directly...

import datetime
import stem.descriptor.collector

yesterday = datetime.datetime.utcnow() - datetime.timedelta(days = 1)

# provide yesterday's exits

exits = {}

for desc in stem.descriptor.collector.get_server_descriptors(start = yesterday):
  if desc.exit_policy.is_exiting_allowed():
    exits[desc.fingerprint] = desc

print('%i relays published an exiting policy in the last day...\n' % len(exits))

for fingerprint, desc in exits.items():
  print('  %s (%s)' % (desc.nickname, fingerprint))

... or download the descriptors to disk and read them later.

import datetime
import stem.descriptor
import stem.descriptor.collector

yesterday = datetime.datetime.utcnow() - datetime.timedelta(days = 1)
cache_dir = '~/descriptor_cache/server_desc_today'

collector = stem.descriptor.collector.CollecTor()

for f in collector.files('server-descriptor', start = yesterday):
  f.download(cache_dir)

# then later...

for f in collector.files('server-descriptor', start = yesterday):
  for desc in f.read(cache_dir):
    if desc.exit_policy.is_exiting_allowed():
      print('  %s (%s)' % (desc.nickname, desc.fingerprint))
get_instance - Provides a singleton CollecTor used for...
  |- get_server_descriptors - published server descriptors
  |- get_extrainfo_descriptors - published extrainfo descriptors
  |- get_microdescriptors - published microdescriptors
  |- get_consensus - published router status entries
  |
  |- get_key_certificates - authority key certificates
  |- get_bandwidth_files - bandwidth authority heuristics
  +- get_exit_lists - TorDNSEL exit list

File - Individual file residing within CollecTor
  |- read - provides descriptors from this file
  +- download - download this file to disk

CollecTor - Downloader for descriptors from CollecTor
  |- get_server_descriptors - published server descriptors
  |- get_extrainfo_descriptors - published extrainfo descriptors
  |- get_microdescriptors - published microdescriptors
  |- get_consensus - published router status entries
  |
  |- get_key_certificates - authority key certificates
  |- get_bandwidth_files - bandwidth authority heuristics
  |- get_exit_lists - TorDNSEL exit list
  |
  |- index - metadata for content available from CollecTor
  +- files - files available from CollecTor

New in version 1.8.0.

stem.descriptor.collector.get_instance()[source]

Provides the singleton CollecTor used for this module's shorthand functions.

Returns:

singleton CollecTor instance
stem.descriptor.collector.get_server_descriptors(start=None, end=None, cache_to=None, bridge=False, timeout=None, retries=3)[source]

Shorthand for get_server_descriptors() on our singleton instance.

stem.descriptor.collector.get_extrainfo_descriptors(start=None, end=None, cache_to=None, bridge=False, timeout=None, retries=3)[source]

Shorthand for get_extrainfo_descriptors() on our singleton instance.

stem.descriptor.collector.get_microdescriptors(start=None, end=None, cache_to=None, timeout=None, retries=3)[source]

Shorthand for get_microdescriptors() on our singleton instance.

stem.descriptor.collector.get_consensus(start=None, end=None, cache_to=None, document_handler='ENTRIES', version=3, microdescriptor=False, bridge=False, timeout=None, retries=3)[source]

Shorthand for get_consensus() on our singleton instance.

stem.descriptor.collector.get_key_certificates(start=None, end=None, cache_to=None, timeout=None, retries=3)[source]

Shorthand for get_key_certificates() on our singleton instance.

stem.descriptor.collector.get_bandwidth_files(start=None, end=None, cache_to=None, timeout=None, retries=3)[source]

Shorthand for get_bandwidth_files() on our singleton instance.

stem.descriptor.collector.get_exit_lists(start=None, end=None, cache_to=None, timeout=None, retries=3)[source]

Shorthand for get_exit_lists() on our singleton instance.

class stem.descriptor.collector.File(path, types, size, sha256, first_published, last_published, last_modified)[source]

Bases: object

File within CollecTor.

Variables:
  • path (str) -- file path within collector
  • types (tuple) -- descriptor types contained within this file
  • compression (stem.descriptor.Compression) -- file compression, None if this cannot be determined
  • size (int) -- size of the file
  • sha256 (str) -- file's sha256 checksum
  • start (datetime) -- first publication within the file, None if this cannot be determined
  • end (datetime) -- last publication within the file, None if this cannot be determined
  • last_modified (datetime) -- when the file was last modified
read(directory=None, descriptor_type=None, start=None, end=None, document_handler='ENTRIES', timeout=None, retries=3)[source]

Provides descriptors from this archive. Descriptors are downloaded or read from disk as follows...

  • If this file has already been downloaded through download() these descriptors are read from disk.
  • If a directory argument is provided and the file is already present these descriptors are read from disk.
  • If a directory argument is provided and the file is not present, the file is downloaded to this location then read.
  • If the file has not been downloaded and no directory argument is provided, the file is downloaded to a temporary directory that's deleted after it is read.
Parameters:
  • directory (str) -- destination to download into
  • descriptor_type (str) -- descriptor type, this is guessed if not provided
  • start (datetime.datetime) -- publication time to begin with
  • end (datetime.datetime) -- publication time to end with
  • document_handler (stem.descriptor.__init__.DocumentHandler) -- method in which to parse a NetworkStatusDocument
  • timeout (int) -- timeout when connection becomes idle, no timeout applied if None
  • retries (int) -- maximum attempts to impose
Returns:

iterator for Descriptor instances in the file

Raises:
  • ValueError if unable to determine the descriptor type
  • TypeError if we cannot parse this descriptor type
  • DownloadFailed if the download fails
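The caching behavior above can be sketched as follows. The helper name and cache path here are illustrative, not part of stem's API; stem is imported lazily inside the generator so the snippet loads even where stem isn't installed.

```python
import datetime


def read_cached_descriptors(cache_dir='~/descriptor_cache'):
  # Sketch: yields yesterday's server descriptors through File.read(),
  # downloading each archive into cache_dir only when it isn't already
  # present there. Lazy import: nothing runs until iteration begins.
  import stem.descriptor.collector

  yesterday = datetime.datetime.utcnow() - datetime.timedelta(days = 1)
  collector = stem.descriptor.collector.CollecTor()

  for f in collector.files('server-descriptor', start = yesterday):
    for desc in f.read(cache_dir):
      yield desc
```

Because this is a generator, no index fetch or download happens until it is iterated.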
download(directory, decompress=True, timeout=None, retries=3, overwrite=False)[source]

Downloads this file to the given location. If a file already exists this is a no-op.

Parameters:
  • directory (str) -- destination to download into
  • decompress (bool) -- decompress written file
  • timeout (int) -- timeout when connection becomes idle, no timeout applied if None
  • retries (int) -- maximum attempts to impose
  • overwrite (bool) -- if this file exists but mismatches CollecTor's checksum then overwrites if True, otherwise raises an exception
Returns:

str with the path we downloaded to

Raises:
  • DownloadFailed if the download fails
  • IOError if a mismatching file exists and overwrite is False
class stem.descriptor.collector.CollecTor(retries=2, timeout=None)[source]

Bases: object

Downloader for descriptors from CollecTor. The contents of CollecTor are provided in an index that's fetched as required.

Variables:
  • retries (int) -- number of times to attempt the request if downloading it fails
  • timeout (float) -- duration before we'll time out our request
get_server_descriptors(start=None, end=None, cache_to=None, bridge=False, timeout=None, retries=3)[source]

Provides server descriptors published during the given time range, sorted oldest to newest.

Parameters:
  • start (datetime.datetime) -- publication time to begin with
  • end (datetime.datetime) -- publication time to end with
  • cache_to (str) -- directory to cache archives into, if an archive is available here it is not downloaded
  • bridge (bool) -- standard descriptors if False, bridge if True
  • timeout (int) -- timeout for downloading each individual archive when the connection becomes idle, no timeout applied if None
  • retries (int) -- maximum attempts to impose on a per-archive basis
Returns:

iterator of ServerDescriptor for the given time range

Raises:

DownloadFailed if the download fails

get_extrainfo_descriptors(start=None, end=None, cache_to=None, bridge=False, timeout=None, retries=3)[source]

Provides extrainfo descriptors published during the given time range, sorted oldest to newest.

Parameters:
  • start (datetime.datetime) -- publication time to begin with
  • end (datetime.datetime) -- publication time to end with
  • cache_to (str) -- directory to cache archives into, if an archive is available here it is not downloaded
  • bridge (bool) -- standard descriptors if False, bridge if True
  • timeout (int) -- timeout for downloading each individual archive when the connection becomes idle, no timeout applied if None
  • retries (int) -- maximum attempts to impose on a per-archive basis
Returns:

iterator of RelayExtraInfoDescriptor for the given time range

Raises:

DownloadFailed if the download fails

get_microdescriptors(start=None, end=None, cache_to=None, timeout=None, retries=3)[source]

Provides microdescriptors estimated to be published during the given time range, sorted oldest to newest. Unlike server/extrainfo descriptors, microdescriptors change very infrequently...

"Microdescriptors are expected to be relatively static and only change
about once per week." -dir-spec section 3.3

CollecTor archives only contain microdescriptors that change, so hourly tarballs often contain very few. Microdescriptors also do not contain their publication timestamp, so this is estimated.

Parameters:
  • start (datetime.datetime) -- publication time to begin with
  • end (datetime.datetime) -- publication time to end with
  • cache_to (str) -- directory to cache archives into, if an archive is available here it is not downloaded
  • timeout (int) -- timeout for downloading each individual archive when the connection becomes idle, no timeout applied if None
  • retries (int) -- maximum attempts to impose on a per-archive basis
Returns:

iterator of Microdescriptor for the given time range

Raises:

DownloadFailed if the download fails
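Since microdescriptor publication times are estimated, a loose time window is usually enough. A minimal sketch (the helper name is illustrative; stem is imported lazily so the snippet loads without it installed):

```python
import datetime


def recent_microdescriptors(hours = 24):
  # Sketch: yields microdescriptors estimated to have been published
  # within the last `hours` hours. Expect few results per hourly
  # tarball, since only changed microdescriptors are archived.
  import stem.descriptor.collector

  start = datetime.datetime.utcnow() - datetime.timedelta(hours = hours)

  for desc in stem.descriptor.collector.get_microdescriptors(start = start):
    yield desc
```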

get_consensus(start=None, end=None, cache_to=None, document_handler='ENTRIES', version=3, microdescriptor=False, bridge=False, timeout=None, retries=3)[source]

Provides consensus router status entries published during the given time range, sorted oldest to newest.

Parameters:
  • start (datetime.datetime) -- publication time to begin with
  • end (datetime.datetime) -- publication time to end with
  • cache_to (str) -- directory to cache archives into, if an archive is available here it is not downloaded
  • document_handler (stem.descriptor.__init__.DocumentHandler) -- method in which to parse a NetworkStatusDocument
  • version (int) -- consensus variant to retrieve (versions 2 or 3)
  • microdescriptor (bool) -- provides the microdescriptor consensus if True, standard consensus otherwise
  • bridge (bool) -- standard descriptors if False, bridge if True
  • timeout (int) -- timeout for downloading each individual archive when the connection becomes idle, no timeout applied if None
  • retries (int) -- maximum attempts to impose on a per-archive basis
Returns:

iterator of RouterStatusEntry for the given time range

Raises:

DownloadFailed if the download fails
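With the default 'ENTRIES' document handler each consensus is flattened into individual router status entries, so filtering by relay flag is a simple membership check. A sketch under that assumption (the helper name is illustrative; stem is imported lazily so the snippet loads without it installed):

```python
import datetime


def exit_fingerprints_since(days = 1):
  # Sketch: yields fingerprints of Exit-flagged router status entries
  # from consensuses published in the last `days` days. 'Exit' is the
  # flag string stem exposes through entry.flags.
  import stem.descriptor.collector

  start = datetime.datetime.utcnow() - datetime.timedelta(days = days)

  for entry in stem.descriptor.collector.get_consensus(start = start):
    if 'Exit' in entry.flags:
      yield entry.fingerprint
```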

get_key_certificates(start=None, end=None, cache_to=None, timeout=None, retries=3)[source]

Directory authority key certificates for the given time range, sorted oldest to newest.

Parameters:
  • start (datetime.datetime) -- publication time to begin with
  • end (datetime.datetime) -- publication time to end with
  • cache_to (str) -- directory to cache archives into, if an archive is available here it is not downloaded
  • timeout (int) -- timeout for downloading each individual archive when the connection becomes idle, no timeout applied if None
  • retries (int) -- maximum attempts to impose on a per-archive basis
Returns:

iterator of KeyCertificate for the given time range

Raises:

DownloadFailed if the download fails

get_bandwidth_files(start=None, end=None, cache_to=None, timeout=None, retries=3)[source]

Bandwidth authority heuristics for the given time range, sorted oldest to newest.

Parameters:
  • start (datetime.datetime) -- publication time to begin with
  • end (datetime.datetime) -- publication time to end with
  • cache_to (str) -- directory to cache archives into, if an archive is available here it is not downloaded
  • timeout (int) -- timeout for downloading each individual archive when the connection becomes idle, no timeout applied if None
  • retries (int) -- maximum attempts to impose on a per-archive basis
Returns:

iterator of BandwidthFile for the given time range

Raises:

DownloadFailed if the download fails

get_exit_lists(start=None, end=None, cache_to=None, timeout=None, retries=3)[source]

TorDNSEL exit lists for the given time range, sorted oldest to newest.

Parameters:
  • start (datetime.datetime) -- publication time to begin with
  • end (datetime.datetime) -- publication time to end with
  • cache_to (str) -- directory to cache archives into, if an archive is available here it is not downloaded
  • timeout (int) -- timeout for downloading each individual archive when the connection becomes idle, no timeout applied if None
  • retries (int) -- maximum attempts to impose on a per-archive basis
Returns:

iterator of TorDNSEL for the given time range

Raises:

DownloadFailed if the download fails
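Each TorDNSEL descriptor carries its observed exit addresses as (address, timestamp) tuples in its exit_addresses attribute, so deduplicating across a time range is straightforward. A sketch (the helper name is illustrative; stem is imported lazily so the snippet loads without it installed):

```python
import datetime


def recent_exit_addresses(days = 1):
  # Sketch: yields each distinct exit address TorDNSEL observed over
  # the last `days` days, deduplicated across descriptors.
  import stem.descriptor.collector

  start = datetime.datetime.utcnow() - datetime.timedelta(days = days)
  seen = set()

  for desc in stem.descriptor.collector.get_exit_lists(start = start):
    for address, timestamp in desc.exit_addresses:
      if address not in seen:
        seen.add(address)
        yield address
```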

index(compression='best')[source]

Provides the archives available in CollecTor.

Parameters:

compression (descriptor.Compression) -- compression type to download with, if undefined we'll use the best decompression available

Returns:

dict with the archive contents

Raises:

If unable to retrieve the index this provides...

  • ValueError if json is malformed
  • IOError if unable to decompress
  • DownloadFailed if the download fails
files(descriptor_type=None, start=None, end=None)[source]

Provides files CollecTor presently has, sorted oldest to newest.

Parameters:
  • descriptor_type (str) -- descriptor type or prefix to retrieve
  • start (datetime.datetime) -- publication time to begin with
  • end (datetime.datetime) -- publication time to end with
Returns:

list of File

Raises:

If unable to retrieve the index this provides...

  • ValueError if json is malformed
  • IOError if unable to decompress
  • DownloadFailed if the download fails