Descriptor

Package for parsing and processing descriptor data.

Module Overview:

parse_file - Parses the descriptors in a file.
create_signing_key - Creates a signing key that can be used for creating descriptors.

Compression - method of descriptor decompression

Descriptor - Common parent for all descriptor file types.
  | |- content - creates the text of a new descriptor
  | |- create - creates a new descriptor
  | +- from_str - provides a parsed descriptor for the given string
  |
  |- type_annotation - provides our @type annotation
  |- get_path - location of the descriptor on disk if it came from a file
  |- get_archive_path - location of the descriptor within the archive it came from
  |- get_bytes - similar to str(), but provides our original bytes content
  |- get_unrecognized_lines - unparsed descriptor content
  +- __str__ - string that the descriptor was made from

stem.descriptor.__init__.DigestHash(enum)

New in version 1.8.0.

Hash function used by tor for descriptor digests.

DigestHash  Description
SHA1        SHA1 hash
SHA256      SHA256 hash

stem.descriptor.__init__.DigestEncoding(enum)

New in version 1.8.0.

Encoding of descriptor digests.

DigestEncoding  Description
RAW             hash object
HEX             uppercase hexadecimal encoding
BASE64          base64 encoding without trailing '=' padding
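To make the three encodings concrete, here is a minimal sketch that uses only Python's standard library rather than Stem. The sample content and variable names are illustrative assumptions, and SHA1 is picked arbitrarily.

```python
import base64
import hashlib

# Made-up content for illustration; this is not a real descriptor.
content = b'router Unnamed 127.0.0.1 9001 0 0\n'

# RAW: the hash object itself
raw = hashlib.sha1(content)

# HEX: uppercase hexadecimal encoding
hex_digest = raw.hexdigest().upper()

# BASE64: base64 with the trailing '=' padding stripped
b64_digest = base64.b64encode(raw.digest()).decode('ascii').rstrip('=')

print(hex_digest)
print(b64_digest)
```

A SHA1 digest is 20 bytes, so the hex form is 40 characters and the unpadded base64 form is 27.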

stem.descriptor.__init__.DocumentHandler(enum)

Ways in which we can parse a NetworkStatusDocument.

Both ENTRIES and BARE_DOCUMENT have a 'thin' document, which doesn't have a populated routers attribute. This allows for lower memory usage and upfront runtime. However, if read time and memory aren't a concern then DOCUMENT can provide you with a fully populated document.

Handlers don't change the fact that most methods that provide descriptors return an iterator. In the case of DOCUMENT and BARE_DOCUMENT that iterator would have just a single item - the document itself.

A simple way to handle this is to call next() to get the iterator's one and only value...

import stem.descriptor.remote
from stem.descriptor import DocumentHandler

# iter() lets next() work whether the query is an iterator or merely iterable
consensus = next(iter(stem.descriptor.remote.get_consensus(
  document_handler = DocumentHandler.BARE_DOCUMENT,
)))

DocumentHandler  Description
ENTRIES          Iterates over the contained RouterStatusEntry. Each has a reference to the bare document it came from (through its document attribute).
DOCUMENT         NetworkStatusDocument with the RouterStatusEntry it contains (through its routers attribute).
BARE_DOCUMENT    NetworkStatusDocument without a reference to its contents (the RouterStatusEntry are unread).

stem.descriptor.__init__.parse_file(descriptor_file, descriptor_type=None, validate=False, document_handler='ENTRIES', normalize_newlines=None, **kwargs)

Simple function to read the descriptor contents from a file, providing an iterator for its Descriptor contents.

If you don't provide a descriptor_type argument then this automatically tries to determine the descriptor type based on the following...

  • The @type annotation on the first line. These are generally only found in the CollecTor archives.
  • The filename if it matches something from tor's data directory. For instance, tor's 'cached-descriptors' contains server descriptors.
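The lookup order above can be sketched in a few lines. This is an illustration only, not Stem's implementation: guess_descriptor_type and FILENAME_TYPES are hypothetical names, and only the 'cached-descriptors' entry comes from the text above. The sketch returns None when nothing matches, whereas parse_file() raises a TypeError.

```python
import os

# Hypothetical filename table. Only 'cached-descriptors' is taken from the
# documentation above; a real table would have more entries.
FILENAME_TYPES = {'cached-descriptors': 'server-descriptor 1.0'}


def guess_descriptor_type(first_line, path):
  """Returns a descriptor type string, or None if it can't be determined."""

  # 1. Prefer an explicit '@type <name> <version>' annotation.
  if first_line.startswith('@type '):
    return first_line[len('@type '):].strip()

  # 2. Otherwise fall back to the filename.
  return FILENAME_TYPES.get(os.path.basename(path))


print(guess_descriptor_type('@type extra-info 1.0\n', 'anything'))
print(guess_descriptor_type('router Unnamed 127.0.0.1 9001 0 0', '/var/lib/tor/cached-descriptors'))
```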

This is a handy function for simple usage, but if you're reading multiple descriptor files you might want to consider the DescriptorReader.

Descriptor types include the following, including further minor versions (i.e. if we support 1.1 then we also support everything from 1.0 and most things from 1.2, but not 2.0)...

Descriptor Type                           Class
server-descriptor 1.0                     RelayDescriptor
extra-info 1.0                            RelayExtraInfoDescriptor
microdescriptor 1.0                       Microdescriptor
directory 1.0                             unsupported
network-status-2 1.0                      RouterStatusEntryV2 (with a NetworkStatusDocumentV2)
dir-key-certificate-3 1.0                 KeyCertificate
network-status-consensus-3 1.0            RouterStatusEntryV3 (with a NetworkStatusDocumentV3)
network-status-vote-3 1.0                 RouterStatusEntryV3 (with a NetworkStatusDocumentV3)
network-status-microdesc-consensus-3 1.0  RouterStatusEntryMicroV3 (with a NetworkStatusDocumentV3)
bridge-network-status 1.0                 RouterStatusEntryV3 (with a BridgeNetworkStatusDocument)
bridge-server-descriptor 1.0              BridgeDescriptor
bridge-extra-info 1.1 or 1.2              BridgeExtraInfoDescriptor
torperf 1.0                               unsupported
bridge-pool-assignment 1.0                unsupported
tordnsel 1.0                              TorDNSEL
hidden-service-descriptor 1.0             HiddenServiceDescriptorV2
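The minor version rule stated before the table can be read as "same name and same major version". The sketch below encodes that reading; parse_version and may_parse are hypothetical helpers, not part of Stem, and a newer minor version may still carry fields that end up unrecognized.

```python
def parse_version(descriptor_type):
  """Splits e.g. 'bridge-extra-info 1.2' into ('bridge-extra-info', 1, 2)."""

  name, version = descriptor_type.rsplit(' ', 1)
  major, minor = version.split('.')
  return name, int(major), int(minor)


def may_parse(supported, found):
  """True if 'found' has the same name and major version as 'supported'."""

  s_name, s_major, _ = parse_version(supported)
  f_name, f_major, _ = parse_version(found)
  return s_name == f_name and s_major == f_major


print(may_parse('server-descriptor 1.1', 'server-descriptor 1.0'))
print(may_parse('server-descriptor 1.1', 'server-descriptor 1.2'))
print(may_parse('server-descriptor 1.1', 'server-descriptor 2.0'))
```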

If you're using Python 3 then beware that the open() function defaults to text mode. Binary mode is strongly suggested because it's both faster (about 33x in my testing) and doesn't perform universal newline translation, which can make us misparse the document.

my_descriptor_file = open(descriptor_path, 'rb')
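To see the newline translation in isolation, the following sketch writes a small file with CRLF line endings and reads it back in both modes. It uses only the standard library, the file content is made up, and it demonstrates the translation only, not the speed difference.

```python
import os
import tempfile

# Write a tiny file with Windows (CRLF) line endings. The content is made up.
with tempfile.NamedTemporaryFile(delete = False) as tmp:
  tmp.write(b'router example\r\nplatform Tor\r\n')
  descriptor_path = tmp.name

try:
  with open(descriptor_path, 'rb') as binary_file:
    binary_content = binary_file.read()  # bytes, exactly as stored on disk

  with open(descriptor_path, 'r') as text_file:
    text_content = text_file.read()  # str, with '\r\n' translated to '\n'
finally:
  os.remove(descriptor_path)

print(repr(binary_content))
print(repr(text_content))
```

Only the binary read preserves the original bytes, which is what a parser needs if it is to see the document exactly as it was written.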

Parameters:
  • descriptor_file (str,file,tarfile) -- path or opened file with the descriptor contents
  • descriptor_type (str) -- descriptor type; this is guessed if not provided
  • validate (bool) -- checks the validity of the descriptor's content if True, skips these checks otherwise
  • document_handler (stem.descriptor.__init__.DocumentHandler) -- method in which to parse the NetworkStatusDocument
  • normalize_newlines (bool) -- converts Windows newlines (CRLF); this is the default when reading data directories on Windows
  • kwargs (dict) -- additional arguments for the descriptor constructor
Returns:

iterator for Descriptor instances in the file

Raises:
  • ValueError if the contents are malformed and validate is True
  • TypeError if we can't match the contents of the file to a descriptor type
  • IOError if unable to read from the descriptor_file

class stem.descriptor.__init__.Descriptor(contents, lazy_load=False)

Bases: object

Common parent for all types of descriptors.

TYPE_ANNOTATION_NAME = None

classmethod from_str(content, **kwargs)

Provides a Descriptor for the given content.

To parse a descriptor we must know its type. There are three ways to convey this...

import stem.descriptor.server_descriptor
from stem.descriptor import Descriptor

# use a descriptor_type argument
desc = Descriptor.from_str(content, descriptor_type = 'server-descriptor 1.0')

# prefix the content with a "@type" annotation
desc = Descriptor.from_str('@type server-descriptor 1.0\n' + content)

# use this method from a subclass
desc = stem.descriptor.server_descriptor.RelayDescriptor.from_str(content)

New in version 1.8.0.

Parameters:
  • content (str,bytes) -- string to construct the descriptor from
  • multiple (bool) -- if provided with True this provides a list of descriptors rather than a single one
  • kwargs (dict) -- additional arguments for parse_file()
Returns:

Descriptor subclass for the given content, or a list of descriptors if multiple = True is provided

Raises:
  • ValueError if the contents are malformed and validate is True
  • TypeError if we can't match the contents of the file to a descriptor type
  • IOError if unable to read from the descriptor_file

classmethod content(attr=None, exclude=(), sign=False)

Creates descriptor content with the given attributes. Mandatory fields are filled with dummy information unless data is supplied. This doesn't yet create a valid signature.

New in version 1.6.0.

Parameters:
  • attr (dict) -- keyword/value mappings to be included in the descriptor
  • exclude (list) -- mandatory keywords to exclude from the descriptor; this results in an invalid descriptor
  • sign (bool) -- includes cryptographic signatures and digests if True
Returns:

str with the content of a descriptor

Raises:
  • ImportError if cryptography is unavailable and sign is True
  • NotImplementedError if not implemented for this descriptor type
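The attr and exclude parameters can be pictured with a toy stand-in. This is only a sketch of the idea described above (dummy values for mandatory fields unless data is supplied), not how Stem builds descriptors: toy_content, MANDATORY and the dummy values are all hypothetical.

```python
# Hypothetical mandatory fields and dummy values; not Stem's real defaults.
MANDATORY = (
  ('router', 'Unnamed 127.0.0.1 9001 0 0'),
  ('published', '1970-01-01 00:00:00'),
  ('bandwidth', '0 0 0'),
)


def toy_content(attr = None, exclude = ()):
  """Builds descriptor-like text, filling unset mandatory fields with dummies."""

  attr = dict(attr) if attr else {}
  lines = []

  for keyword, dummy in MANDATORY:
    if keyword in exclude:
      continue  # leaving out a mandatory field makes the result invalid

    lines.append('%s %s' % (keyword, attr.pop(keyword, dummy)))

  # Any remaining attributes are appended as extra lines.
  for keyword, value in attr.items():
    lines.append('%s %s' % (keyword, value))

  return '\n'.join(lines)


print(toy_content({'bandwidth': '10 20 30'}, exclude = ('published',)))
```

Here the supplied bandwidth replaces the dummy value, the router line keeps its dummy, and the published line is dropped entirely.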

classmethod create(attr=None, exclude=(), validate=True, sign=False)

Creates a descriptor with the given attributes. Mandatory fields are filled with dummy information unless data is supplied. This doesn't yet create a valid signature.

New in version 1.6.0.

Parameters:
  • attr (dict) -- keyword/value mappings to be included in the descriptor
  • exclude (list) -- mandatory keywords to exclude from the descriptor; this results in an invalid descriptor
  • validate (bool) -- checks the validity of the descriptor's content if True, skips these checks otherwise
  • sign (bool) -- includes cryptographic signatures and digests if True
Returns:

Descriptor subclass

Raises:
  • ValueError if the contents are malformed and validate is True
  • ImportError if cryptography is unavailable and sign is True
  • NotImplementedError if not implemented for this descriptor type

type_annotation()

Provides the Tor metrics annotation of this descriptor type. For example, "@type server-descriptor 1.0" for server descriptors.

Please note that the version number component is specific to CollecTor, and for the moment is hardcoded as 1.0. This may change in the future.

New in version 1.8.0.

Returns: TypeAnnotation with our type information

get_path()

Provides the absolute path that we loaded this descriptor from.

Returns: str with the absolute path of the descriptor source

get_archive_path()

If this descriptor came from an archive then provides its path within the archive. This is only set if the descriptor came from a DescriptorReader, and is None if this descriptor didn't come from an archive.

Returns: str with the descriptor's path within the archive

get_bytes()

Provides the ASCII bytes of the descriptor. This only differs from str() if you're running Python 3.x, in which case str() provides a unicode string.

Returns: bytes for the descriptor's contents

get_unrecognized_lines()

Provides a list of lines that were either ignored or had data that we did not know how to process. This is most common due to new descriptor fields that this library does not yet know how to process. Patches welcome!

Returns: list of lines of unrecognized content
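As a rough picture of what "unrecognized" means here, the toy parser below keeps any line whose keyword it does not know. It is a sketch under stated assumptions, not Stem's parser: split_recognized and KNOWN_KEYWORDS are hypothetical.

```python
# Hypothetical set of keywords this toy parser understands.
KNOWN_KEYWORDS = {'router', 'published', 'bandwidth'}


def split_recognized(content):
  """Returns (parsed, unrecognized): a dict of known fields and a list of other lines."""

  parsed, unrecognized = {}, []

  for line in content.splitlines():
    keyword, _, value = line.partition(' ')

    if keyword in KNOWN_KEYWORDS:
      parsed[keyword] = value
    elif line:
      unrecognized.append(line)  # kept verbatim so callers can inspect it

  return parsed, unrecognized


parsed, unrecognized = split_recognized(
  'router Unnamed 127.0.0.1 9001 0 0\nnew-field 42\nbandwidth 0 0 0'
)

print(unrecognized)
```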