Descriptor
Package for parsing and processing descriptor data.
Module Overview:
parse_file - Parses the descriptors in a file.
create_signing_key - Creates a signing key that can be used for creating descriptors.
Compression - method of descriptor decompression
Descriptor - Common parent for all descriptor file types.
| |- content - creates the text of a new descriptor
| |- create - creates a new descriptor
| +- from_str - provides a parsed descriptor for the given string
|
|- type_annotation - provides our @type annotation
|- get_path - location of the descriptor on disk if it came from a file
|- get_archive_path - location of the descriptor within the archive it came from
|- get_bytes - similar to str(), but provides our original bytes content
|- get_unrecognized_lines - unparsed descriptor content
+- __str__ - string that the descriptor was made from
- stem.descriptor.__init__.DigestHash(enum)
New in version 1.8.0.
Hash function used by tor for descriptor digests.
============ ===========
DigestHash   Description
============ ===========
SHA1         SHA1 hash
SHA256       SHA256 hash
============ ===========
- stem.descriptor.__init__.DigestEncoding(enum)
New in version 1.8.0.
Encoding of descriptor digests.
=============== ===========
DigestEncoding  Description
=============== ===========
RAW             hash object
HEX             uppercase hexadecimal encoding
BASE64          base64 encoding without trailing '=' padding
=============== ===========
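To make the two enumerations above concrete, here is a minimal sketch of what each encoding looks like for a SHA1 or SHA256 hash, using only Python's hashlib and base64 modules. The encode_digest helper and the sample content are hypothetical and not part of stem, and the sketch does not cover which bytes tor actually hashes for a given descriptor type.

```python
import base64
import hashlib


def encode_digest(content, hash_name='sha1', encoding='HEX'):
    """Hash bytes and encode the result in one of the three described ways."""
    # hash_name is 'sha1' or 'sha256', mirroring the DigestHash values.
    digest = hashlib.new(hash_name, content)

    if encoding == 'RAW':
        return digest  # the hash object itself
    elif encoding == 'HEX':
        return digest.hexdigest().upper()  # uppercase hexadecimal
    elif encoding == 'BASE64':
        # base64 with the trailing '=' padding stripped
        return base64.b64encode(digest.digest()).decode('ascii').rstrip('=')
    else:
        raise ValueError('unrecognized encoding: %s' % encoding)


sample = b'router demo 127.0.0.1 9001 0 0\n'  # made-up content for illustration
hex_digest = encode_digest(sample, 'sha1', 'HEX')
b64_digest = encode_digest(sample, 'sha256', 'BASE64')
```

A SHA1 digest in HEX form is always 40 characters, and a SHA256 digest in unpadded BASE64 form is always 43 characters.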
- stem.descriptor.__init__.DocumentHandler(enum)
Ways in which we can parse a NetworkStatusDocument.
Both ENTRIES and BARE_DOCUMENT have a 'thin' document, which doesn't have a populated routers attribute. This allows for lower memory usage and upfront runtime. However, if read time and memory aren't a concern then DOCUMENT can provide you with a fully populated document.
Handlers don't change the fact that most methods that provide descriptors return an iterator. In the case of DOCUMENT and BARE_DOCUMENT that iterator would have just a single item - the document itself.
A simple way to handle this is to call next() to get the iterator's one and only value...

  import stem.descriptor.remote

  from stem.descriptor import DocumentHandler

  # iter() is harmless if get_consensus() already provides an iterator
  consensus = next(iter(stem.descriptor.remote.get_consensus(
    document_handler = DocumentHandler.BARE_DOCUMENT,
  )))
=================== ===========
DocumentHandler     Description
=================== ===========
ENTRIES             Iterates over the contained RouterStatusEntry. Each has a reference to the bare document it came from (through its document attribute).
DOCUMENT            NetworkStatusDocument with the RouterStatusEntry it contains (through its routers attribute).
BARE_DOCUMENT       NetworkStatusDocument without a reference to its contents (the RouterStatusEntry are unread).
=================== ===========
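The difference between the three handlers can be pictured with a toy model. Everything in this sketch (Handler, ToyDocument, toy_parse, and the dict-based entries) is a hypothetical stand-in written for illustration. It does not use stem and only mirrors the shape of what each handler yields as described above.

```python
import enum


class Handler(enum.Enum):
    """Hypothetical stand-in for DocumentHandler."""
    ENTRIES = 'ENTRIES'
    DOCUMENT = 'DOCUMENT'
    BARE_DOCUMENT = 'BARE_DOCUMENT'


class ToyDocument:
    """Stand-in for a network status document."""

    def __init__(self, routers=None):
        # Left empty for 'thin' documents.
        self.routers = routers or {}


def toy_parse(entry_names, handler):
    """Always provides an iterator, whichever handler is used."""
    if handler is Handler.ENTRIES:
        # Yield each entry, all sharing one thin document.
        document = ToyDocument()
        for name in entry_names:
            yield {'nickname': name, 'document': document}
    elif handler is Handler.DOCUMENT:
        # Yield a single document with every entry loaded up front.
        yield ToyDocument({name: {'nickname': name} for name in entry_names})
    else:
        # Yield a single thin document, leaving the entries unread.
        yield ToyDocument()


names = ['relayA', 'relayB']
entries = list(toy_parse(names, Handler.ENTRIES))
full_document = next(toy_parse(names, Handler.DOCUMENT))
bare_document = next(toy_parse(names, Handler.BARE_DOCUMENT))
```

In this model only DOCUMENT pays the cost of holding every entry at once, which is the memory and runtime tradeoff the text describes.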
- stem.descriptor.__init__.parse_file(descriptor_file, descriptor_type=None, validate=False, document_handler='ENTRIES', normalize_newlines=None, **kwargs)
Simple function to read the descriptor contents from a file, providing an iterator for its Descriptor contents.
If you don't provide a descriptor_type argument then this automatically tries to determine the descriptor type based on the following...
- The @type annotation on the first line. These are generally only found in the CollecTor archives.
- The filename if it matches something from tor's data directory. For instance, tor's 'cached-descriptors' contains server descriptors.
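The two detection steps above can be sketched roughly as follows. This is an illustrative stand-in rather than stem's implementation: guess_descriptor_type, the regular expression, and the FILENAME_TYPES table (which only lists the 'cached-descriptors' case mentioned above) are all assumptions made for the example.

```python
import os
import re

# '@type <name> <major>.<minor>', for instance '@type server-descriptor 1.0'
TYPE_ANNOTATION = re.compile(r'^@type (\S+) (\d+)\.(\d+)\s*$')

# Hypothetical filename table, limited to the example given in the text.
FILENAME_TYPES = {
    'cached-descriptors': ('server-descriptor', 1, 0),
}


def guess_descriptor_type(first_line, path=None):
    """Return (name, major, minor), or None if the type can't be determined."""
    # 1. Prefer an @type annotation on the first line.
    match = TYPE_ANNOTATION.match(first_line)
    if match:
        name, major, minor = match.groups()
        return (name, int(major), int(minor))

    # 2. Otherwise fall back to a well-known filename from tor's data directory.
    if path is not None:
        return FILENAME_TYPES.get(os.path.basename(path))

    return None
```

Where this sketch returns None, parse_file instead raises a TypeError, as listed under its Raises section.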
This is a handy function for simple usage, but if you're reading multiple descriptor files you might want to consider the DescriptorReader.
Descriptor types include the following, including further minor versions (i.e. if we support 1.1 then we also support everything from 1.0 and most things from 1.2, but not 2.0)...
======================================== =========================================================
Descriptor Type                          Class
======================================== =========================================================
server-descriptor 1.0                    RelayDescriptor
extra-info 1.0                           RelayExtraInfoDescriptor
microdescriptor 1.0                      Microdescriptor
directory 1.0                            unsupported
network-status-2 1.0                     RouterStatusEntryV2 (with a NetworkStatusDocumentV2)
dir-key-certificate-3 1.0                KeyCertificate
network-status-consensus-3 1.0           RouterStatusEntryV3 (with a NetworkStatusDocumentV3)
network-status-vote-3 1.0                RouterStatusEntryV3 (with a NetworkStatusDocumentV3)
network-status-microdesc-consensus-3 1.0 RouterStatusEntryMicroV3 (with a NetworkStatusDocumentV3)
bridge-network-status 1.0                RouterStatusEntryV3 (with a BridgeNetworkStatusDocument)
bridge-server-descriptor 1.0             BridgeDescriptor
bridge-extra-info 1.1 or 1.2             BridgeExtraInfoDescriptor
torperf 1.0                              unsupported
bridge-pool-assignment 1.0               unsupported
tordnsel 1.0                             TorDNSEL
hidden-service-descriptor 1.0            HiddenServiceDescriptorV2
======================================== =========================================================

If you're using python 3 then beware that the open() function defaults to using text mode. Binary mode is strongly suggested because it's both faster (by about 33x in my testing) and doesn't do universal newline translation, which can make us misparse the document.
my_descriptor_file = open(descriptor_path, 'rb')
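The newline translation mentioned above is easy to observe with the standard library alone. The following sketch writes some made-up, descriptor-like bytes with CRLF line endings to a temporary file and reads them back in both modes. The content and variable names are only for illustration.

```python
import os
import tempfile

# Made-up descriptor-like content with Windows (CRLF) line endings.
raw_content = b'router demo 127.0.0.1 9001 0 0\r\nplatform Tor\r\n'

with tempfile.NamedTemporaryFile(delete=False) as temp_file:
    temp_file.write(raw_content)
    descriptor_path = temp_file.name

try:
    # Binary mode returns the bytes exactly as they are on disk.
    with open(descriptor_path, 'rb') as binary_file:
        binary_content = binary_file.read()

    # Text mode decodes to str and, by default, rewrites '\r\n' as '\n'.
    with open(descriptor_path, 'r') as text_file:
        text_content = text_file.read()
finally:
    os.remove(descriptor_path)
```

Here binary_content equals the bytes that were written, while text_content no longer contains any '\r' characters, so the text read in text mode differs from what is on disk.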
Parameters: - descriptor_file (str,file,tarfile) -- path or opened file with the descriptor contents
- descriptor_type (str) -- descriptor type, this is guessed if not provided
- validate (bool) -- checks the validity of the descriptor's content if True, skips these checks otherwise
- document_handler (stem.descriptor.__init__.DocumentHandler) -- method in which to parse the NetworkStatusDocument
- normalize_newlines (bool) -- converts windows newlines (CRLF), this is the default when reading data directories on windows
- kwargs (dict) -- additional arguments for the descriptor constructor
Returns: iterator for Descriptor instances in the file
Raises : - ValueError if the contents are malformed and validate is True
- TypeError if we can't match the contents of the file to a descriptor type
- IOError if unable to read from the descriptor_file
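Putting the pieces of parse_file together, a caller might look something like the sketch below. It relies only on the signature and exceptions documented above. The count_descriptors helper is hypothetical, and stem is imported inside the function purely so the sketch can be defined on systems where stem isn't installed.

```python
def count_descriptors(path, descriptor_type=None):
    """Count the descriptors in a file, or return None if it can't be parsed."""
    # Assumes stem is installed; imported lazily for the reason given above.
    from stem.descriptor import parse_file

    try:
        # Binary mode, as recommended for python 3.
        with open(path, 'rb') as descriptor_file:
            # parse_file provides an iterator, so consume it inside the try
            # block to catch errors raised partway through the file.
            descriptors = parse_file(descriptor_file, descriptor_type, validate=True)
            return sum(1 for _ in descriptors)
    except (ValueError, TypeError, IOError) as exc:
        # malformed content, unrecognized descriptor type, or unreadable file
        print('Unable to parse %s: %s' % (path, exc))
        return None
```

For example, count_descriptors(path, 'server-descriptor 1.0') would skip type detection entirely, whereas omitting the second argument leaves the type to be guessed from the @type annotation or filename.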
- class stem.descriptor.__init__.Descriptor(contents, lazy_load=False)
Bases: object
Common parent for all types of descriptors.
- TYPE_ANNOTATION_NAME = None
- classmethod from_str(content, **kwargs)
Provides a Descriptor for the given content.
To parse a descriptor we must know its type. There are three ways to convey this...
  import stem.descriptor.server_descriptor

  from stem.descriptor import Descriptor

  # use a descriptor_type argument
  desc = Descriptor.from_str(content, descriptor_type = 'server-descriptor 1.0')

  # prefix the content with a "@type" annotation
  desc = Descriptor.from_str('@type server-descriptor 1.0\n' + content)

  # use this method from a subclass
  desc = stem.descriptor.server_descriptor.RelayDescriptor.from_str(content)
New in version 1.8.0.
Parameters: - content (str,bytes) -- string to construct the descriptor from
- multiple (bool) -- if provided with True this provides a list of descriptors rather than a single one
- kwargs (dict) -- additional arguments for parse_file()
Returns: Descriptor subclass for the given content, or a list of descriptors if multiple = True is provided
Raises : - ValueError if the contents are malformed and validate is True
- TypeError if we can't match the contents of the file to a descriptor type
- IOError if unable to read from the descriptor_file
- classmethod content(attr=None, exclude=(), sign=False)
Creates descriptor content with the given attributes. Mandatory fields are filled with dummy information unless data is supplied. This doesn't yet create a valid signature.
New in version 1.6.0.
Parameters: - attr (dict) -- keyword/value mappings to be included in the descriptor
- exclude (list) -- mandatory keywords to exclude from the descriptor, this results in an invalid descriptor
- sign (bool) -- includes cryptographic signatures and digests if True
Returns: str with the content of a descriptor
Raises : - ImportError if cryptography is unavailable and sign is True
- NotImplementedError if not implemented for this descriptor type
- classmethod create(attr=None, exclude=(), validate=True, sign=False)
Creates a descriptor with the given attributes. Mandatory fields are filled with dummy information unless data is supplied. This doesn't yet create a valid signature.
New in version 1.6.0.
Parameters: - attr (dict) -- keyword/value mappings to be included in the descriptor
- exclude (list) -- mandatory keywords to exclude from the descriptor, this results in an invalid descriptor
- validate (bool) -- checks the validity of the descriptor's content if True, skips these checks otherwise
- sign (bool) -- includes cryptographic signatures and digests if True
Returns: Descriptor subclass
Raises : - ValueError if the contents are malformed and validate is True
- ImportError if cryptography is unavailable and sign is True
- NotImplementedError if not implemented for this descriptor type
- type_annotation()
Provides the Tor metrics annotation of this descriptor type. For example, "@type server-descriptor 1.0" for server descriptors.
Please note that the version number component is specific to CollecTor, and for the moment is hardcoded as 1.0. This may change in the future.
New in version 1.8.0.
Returns: TypeAnnotation with our type information
- get_path()
Provides the absolute path that we loaded this descriptor from.
Returns: str with the absolute path of the descriptor source
- get_archive_path()
If this descriptor came from an archive then provides its path within the archive. This is only set if the descriptor came from a DescriptorReader, and is None if this descriptor didn't come from an archive.
Returns: str with the descriptor's path within the archive