Descriptor

Package for parsing and processing descriptor data.

Module Overview:

parse_file - Parses the descriptors in a file.
create_signing_key - Creates a signing key that can be used for creating descriptors.

Compression - method of descriptor decompression

Descriptor - Common parent for all descriptor file types.
  | |- content - creates the text of a new descriptor
  | |- create - creates a new descriptor
  | +- from_str - provides a parsed descriptor for the given string
  |
  |- type_annotation - provides our @type annotation
  |- get_path - location of the descriptor on disk if it came from a file
  |- get_archive_path - location of the descriptor within the archive it came from
  |- get_bytes - similar to str(), but provides our original bytes content
  |- get_unrecognized_lines - unparsed descriptor content
  +- __str__ - string that the descriptor was made from

stem.descriptor.__init__.DigestHash(enum)

New in version 1.8.0.

Hash function used by tor for descriptor digests.

DigestHash	Description
SHA1	SHA1 hash
SHA256	SHA256 hash

stem.descriptor.__init__.DigestEncoding(enum)

New in version 1.8.0.

Encoding of descriptor digests.

DigestEncoding	Description
RAW	hash object
HEX	uppercase hexadecimal encoding
BASE64	base64 encoding without trailing '=' padding
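
To make the three encodings concrete, here is a minimal sketch that uses only Python's standard library. It does not call stem: the helper name `encode_digest` and the sample bytes are illustrative assumptions, and which portion of a descriptor tor actually hashes depends on the descriptor type.

```python
import base64
import hashlib


def encode_digest(content, hash_name = 'sha1', encoding = 'HEX'):
  """Hashes content and renders it the way the DigestEncoding table describes."""

  digest = hashlib.new(hash_name, content)

  if encoding == 'RAW':
    return digest  # the hash object itself
  elif encoding == 'HEX':
    return digest.hexdigest().upper()  # uppercase hexadecimal
  elif encoding == 'BASE64':
    # standard base64 with the trailing '=' padding stripped
    return base64.b64encode(digest.digest()).decode('ascii').rstrip('=')
  else:
    raise ValueError('unknown encoding: %s' % encoding)


sample = b'router demo 127.0.0.1 9001 0 0\n'  # placeholder bytes, not a real descriptor

print(encode_digest(sample, 'sha1', 'HEX'))
print(encode_digest(sample, 'sha256', 'BASE64'))
```

A SHA1 digest is 20 bytes and a SHA256 digest is 32 bytes, so the unpadded base64 forms are 27 and 43 characters long respectively.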

stem.descriptor.__init__.DocumentHandler(enum)

Ways in which we can parse a NetworkStatusDocument.

Both ENTRIES and BARE_DOCUMENT have a 'thin' document, which doesn't have a populated routers attribute. This allows for lower memory usage and upfront runtime. However, if read time and memory aren't a concern then DOCUMENT can provide you with a fully populated document.

Handlers don't change the fact that most methods that provide descriptors return an iterator. In the case of DOCUMENT and BARE_DOCUMENT that iterator would have just a single item - the document itself.

A simple way to handle this is to call next() to get the iterator's one and only value...

import stem.descriptor.remote
from stem.descriptor import DocumentHandler

consensus = next(stem.descriptor.remote.get_consensus(
  document_handler = DocumentHandler.BARE_DOCUMENT,
))

DocumentHandler	Description
ENTRIES	Iterates over the contained `RouterStatusEntry`. Each has a reference to the bare document it came from (through its document attribute).
DOCUMENT	`NetworkStatusDocument` with the `RouterStatusEntry` it contains (through its routers attribute).
BARE_DOCUMENT	`NetworkStatusDocument` without a reference to its contents (the `RouterStatusEntry` are unread).

stem.descriptor.__init__.parse_file(descriptor_file, descriptor_type=None, validate=False, document_handler='ENTRIES', normalize_newlines=None, **kwargs)

Simple function to read the descriptor contents from a file, providing an iterator for its Descriptor contents.

If you don't provide a descriptor_type argument then this automatically tries to determine the descriptor type based on the following...

The @type annotation on the first line. These are generally only found in the CollecTor archives.
The filename if it matches something from tor's data directory. For instance, tor's 'cached-descriptors' contains server descriptors.
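
The two detection steps above could be sketched roughly as follows. This is not the library's implementation: `guess_type`, `KNOWN_FILENAMES`, and the `TypeAnnotation` tuple defined here are hypothetical stand-ins, and only the 'cached-descriptors' mapping is taken from the text.

```python
import collections
import os
import re

# Hypothetical stand-in for a parsed "@type" annotation.
TypeAnnotation = collections.namedtuple('TypeAnnotation', ['name', 'major_version', 'minor_version'])

# Illustrative subset of filenames from tor's data directory.
KNOWN_FILENAMES = {
  'cached-descriptors': TypeAnnotation('server-descriptor', 1, 0),
}

ANNOTATION = re.compile(r'^@type (\S+) (\d+)\.(\d+)$')


def guess_type(path, first_line):
  """Provides a TypeAnnotation, or None if the type can't be determined."""

  # First preference: an explicit "@type" annotation on the first line.
  match = ANNOTATION.match(first_line.strip())

  if match:
    name, major, minor = match.groups()
    return TypeAnnotation(name, int(major), int(minor))

  # Otherwise fall back to well known filenames.
  return KNOWN_FILENAMES.get(os.path.basename(path))


print(guess_type('/tmp/archive-entry', '@type extra-info 1.0\n'))
print(guess_type('/var/lib/tor/cached-descriptors', 'router demo 127.0.0.1 9001 0 0\n'))
```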

This is a handy function for simple usage, but if you're reading multiple descriptor files you might want to consider the DescriptorReader.

Descriptor types include the following, including further minor versions (i.e. if we support 1.1 then we also support everything from 1.0 and most things from 1.2, but not 2.0)...

Descriptor Type	Class
server-descriptor 1.0	`RelayDescriptor`
extra-info 1.0	`RelayExtraInfoDescriptor`
microdescriptor 1.0	`Microdescriptor`
directory 1.0	unsupported
network-status-2 1.0	`RouterStatusEntryV2` (with a `NetworkStatusDocumentV2`)
dir-key-certificate-3 1.0	`KeyCertificate`
network-status-consensus-3 1.0	`RouterStatusEntryV3` (with a `NetworkStatusDocumentV3`)
network-status-vote-3 1.0	`RouterStatusEntryV3` (with a `NetworkStatusDocumentV3`)
network-status-microdesc-consensus-3 1.0	`RouterStatusEntryMicroV3` (with a `NetworkStatusDocumentV3`)
bridge-network-status 1.0	`RouterStatusEntryV3` (with a `BridgeNetworkStatusDocument`)
bridge-server-descriptor 1.0	`BridgeDescriptor`
bridge-extra-info 1.1 or 1.2	`BridgeExtraInfoDescriptor`
torperf 1.0	unsupported
bridge-pool-assignment 1.0	unsupported
tordnsel 1.0	`TorDNSEL`
hidden-service-descriptor 1.0	`HiddenServiceDescriptorV2`
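
The minor version rule stated before the table amounts to comparing major versions. A minimal sketch, assuming versions are plain 'major.minor' strings; `is_supported` is a hypothetical helper rather than part of the library, and it treats a newer minor version as acceptable even though only most of its fields would be understood.

```python
def is_supported(supported_version, found_version):
  """Checks a 'major.minor' version against the version we support."""

  supported_major, _ = map(int, supported_version.split('.'))
  found_major, _ = map(int, found_version.split('.'))

  # Minor versions differ only in details, major versions are incompatible.
  return supported_major == found_major


for found in ('1.0', '1.2', '2.0'):
  print('%s: %s' % (found, is_supported('1.1', found)))
```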

If you're using Python 3, beware that the open() function defaults to text mode. Binary mode is strongly suggested because it's both faster (about 33x in my testing) and doesn't perform universal newline translation, which can make us misparse the document.

my_descriptor_file = open(descriptor_path, 'rb')
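
The newline translation is easy to observe with the standard library alone. The following sketch writes placeholder lines (not a real descriptor) with CRLF endings to a temporary file, then reads them back in both modes.

```python
import os
import tempfile

# Placeholder content with windows style (CRLF) line endings.
content = b'router demo 127.0.0.1 9001 0 0\r\nbandwidth 1 1 1\r\n'

with tempfile.NamedTemporaryFile(delete = False) as tmp:
  tmp.write(content)
  path = tmp.name

try:
  with open(path, 'rb') as binary_file:
    binary_read = binary_file.read()

  with open(path, 'r') as text_file:
    text_read = text_file.read()
finally:
  os.remove(path)

print(binary_read == content)  # binary mode returns the bytes untouched
print('\r' in text_read)       # text mode has rewritten the line endings
```

In binary mode the bytes on disk are exactly what the parser receives, whereas text mode silently replaces each CRLF with a bare newline.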

Parameters:
  descriptor_file (str,file,tarfile) -- path or opened file with the descriptor contents
  descriptor_type (str) -- descriptor type, this is guessed if not provided
  validate (bool) -- checks the validity of the descriptor's content if True, skips these checks otherwise
  document_handler (stem.descriptor.__init__.DocumentHandler) -- method in which to parse the `NetworkStatusDocument`
  normalize_newlines (bool) -- converts windows newlines (CRLF), this is the default when reading data directories on windows
  kwargs (dict) -- additional arguments for the descriptor constructor
Returns:	iterator for `Descriptor` instances in the file
Raises:
  ValueError if the contents are malformed and validate is True
  TypeError if we can't match the contents of the file to a descriptor type
  IOError if unable to read from the descriptor_file

class stem.descriptor.__init__.Descriptor(contents, lazy_load=False)

Bases: object

Common parent for all types of descriptors.

TYPE_ANNOTATION_NAME = None

classmethod from_str(content, **kwargs)

Provides a Descriptor for the given content.

To parse a descriptor we must know its type. There are three ways to convey this...

# use a descriptor_type argument
desc = Descriptor.from_str(content, descriptor_type = 'server-descriptor 1.0')

# prefixing the content with a "@type" annotation
desc = Descriptor.from_str('@type server-descriptor 1.0\n' + content)

# use this method from a subclass
desc = stem.descriptor.server_descriptor.RelayDescriptor.from_str(content)

New in version 1.8.0.

Parameters:
  content (str,bytes) -- string to construct the descriptor from
  multiple (bool) -- if provided with True this provides a list of descriptors rather than a single one
  kwargs (dict) -- additional arguments for `parse_file()`
Returns:	`Descriptor` subclass for the given content, or a list of descriptors if multiple = True is provided
Raises:
  ValueError if the contents are malformed and validate is True
  TypeError if we can't match the contents of the file to a descriptor type
  IOError if unable to read from the descriptor_file

classmethod content(attr=None, exclude=(), sign=False)

Creates descriptor content with the given attributes. Mandatory fields are filled with dummy information unless data is supplied. This doesn't yet create a valid signature.

New in version 1.6.0.

Parameters:
  attr (dict) -- keyword/value mappings to be included in the descriptor
  exclude (list) -- mandatory keywords to exclude from the descriptor, this results in an invalid descriptor
  sign (bool) -- includes cryptographic signatures and digests if True
Returns:	str with the content of a descriptor
Raises:
  ImportError if cryptography is unavailable and sign is True
  NotImplementedError if not implemented for this descriptor type

classmethod create(attr=None, exclude=(), validate=True, sign=False)

Creates a descriptor with the given attributes. Mandatory fields are filled with dummy information unless data is supplied. This doesn't yet create a valid signature.

New in version 1.6.0.

Parameters:
  attr (dict) -- keyword/value mappings to be included in the descriptor
  exclude (list) -- mandatory keywords to exclude from the descriptor, this results in an invalid descriptor
  validate (bool) -- checks the validity of the descriptor's content if True, skips these checks otherwise
  sign (bool) -- includes cryptographic signatures and digests if True
Returns:	`Descriptor` subclass
Raises:
  ValueError if the contents are malformed and validate is True
  ImportError if cryptography is unavailable and sign is True
  NotImplementedError if not implemented for this descriptor type

type_annotation()

Provides the Tor metrics annotation of this descriptor type. For example, "@type server-descriptor 1.0" for server descriptors.

Please note that the version number component is specific to CollecTor, and for the moment is hardcoded as 1.0. This may change in the future.

New in version 1.8.0.

Returns:	`TypeAnnotation` with our type information

get_path()

Provides the absolute path that we loaded this descriptor from.

Returns:	str with the absolute path of the descriptor source

get_archive_path()

If this descriptor came from an archive then provides its path within the archive. This is only set if the descriptor came from a DescriptorReader, and is None if this descriptor didn't come from an archive.

Returns:	str with the descriptor's path within the archive

get_bytes()

Provides the ASCII bytes of the descriptor. This only differs from str() if you're running python 3.x, in which case str() provides a unicode string.

Returns:	bytes for the descriptor's contents

get_unrecognized_lines()

Provides a list of lines that were either ignored or had data that we did not know how to process. This is most common due to new descriptor fields that this library does not yet know how to process. Patches welcome!

Returns:	list of lines of unrecognized content
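
As an illustration of where unrecognized lines come from, here is a toy keyword-based parser. It is only a sketch of the general idea and not how the library parses descriptors: `parse_lines`, the `handlers` mapping, and the 'future-field' keyword are all hypothetical.

```python
def parse_lines(content, handlers):
  """Splits descriptor-like text into parsed values and leftover lines."""

  parsed, unrecognized = {}, []

  for line in content.splitlines():
    if not line.strip():
      continue  # skip blank lines

    keyword, _, value = line.partition(' ')

    if keyword in handlers:
      parsed[keyword] = handlers[keyword](value)
    else:
      unrecognized.append(line)  # keyword we don't know how to process

  return parsed, unrecognized


# Hypothetical handlers for two keywords, anything else is left unparsed.
handlers = {
  'router': lambda value: value.split(' ')[0],
  'bandwidth': lambda value: [int(v) for v in value.split(' ')],
}

text = 'router demo 127.0.0.1 9001 0 0\nbandwidth 1 2 3\nfuture-field some value\n'
parsed, unrecognized = parse_lines(text, handlers)

print(parsed)
print(unrecognized)
```

Keeping the unknown lines rather than discarding them means content introduced by newer tor versions stays visible to the caller.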
