woohoo_pdns package¶
Subpackages¶
Submodules¶
woohoo_pdns.load module¶
-
class
woohoo_pdns.load.
DNSLogFileImporter
(source_name, **kwargs)[source]¶ Bases:
woohoo_pdns.load.FileImporter
Importer capable of reading a different source file format (JSON based).
-
__init__
(source_name, **kwargs)[source]¶ To correctly initialise a file source a config dict must be supplied (see ‘cfg’ argument documentation).
- Parameters
source_name (str) – Either the name of a file to be read or the name of a directory to scan for files to load.
cfg (dict) – A config dictionary that contains the following two keys:
* file_pattern (str): The glob pattern to use when reading files from a directory.
* rename (bool): Whether or not files should be renamed (by appending ‘.1’) after they are read.
-
__module__
= 'woohoo_pdns.load'¶
-
_parse_tokenised_record
(tokenised_rec)[source]¶ Convert Unix timestamps into aware datetime objects and convert string-type rrtype values into their integer-based counterparts.
- Parameters
tokenised_rec (record_data) – The record_data as tokenised by
_tokenise_record()
- Returns
A single element list of record_data named tuple.
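The conversion described above could look roughly like the following sketch. The `record_data` named tuple, the `RRTYPE_IDS` table and the function name `parse_tokenised` are stand-ins invented for this illustration (the package's own definitions are not shown here), and converting `hitcount` to an integer is an assumption.

```python
from collections import namedtuple
from datetime import datetime, timezone

# Hypothetical stand-in for the package's record_data named tuple.
record_data = namedtuple(
    "record_data",
    ["first_seen", "last_seen", "rrtype", "rrname", "hitcount", "rdata"],
)

# A few well-known DNS type numbers; not the package's own lookup table.
RRTYPE_IDS = {"A": 1, "NS": 2, "CNAME": 5, "PTR": 12, "TXT": 16, "AAAA": 28}


def parse_tokenised(rec):
    """Return a single-element list with timestamps and rrtype converted."""
    return [
        rec._replace(
            # Unix timestamps (seconds) become timezone-aware datetimes in UTC.
            first_seen=datetime.fromtimestamp(int(rec.first_seen), tz=timezone.utc),
            last_seen=datetime.fromtimestamp(int(rec.last_seen), tz=timezone.utc),
            # The textual record type becomes its integer id.
            rrtype=RRTYPE_IDS[rec.rrtype.upper()],
            hitcount=int(rec.hitcount),  # assumption: hitcount is stored as int
        )
    ]


tokens = record_data(
    "1562845812", "1562845812", "PTR",
    "24.227.156.213.in-addr.arpa.", "1", "mx2.mammut.ch.",
)
parsed = parse_tokenised(tokens)[0]
```

An unknown record type raises a `KeyError` in this sketch; how the real importer treats such records is not documented here.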
-
_tokenise_record
(rec)[source]¶ Split a line into tokens:
{"rrclass": "IN", "ttl": 3600, "timestamp": "1562845812", "rrtype": "PTR", "rrname": "24.227.156.213.in-addr.arpa.", "rdata": "mx2.mammut.ch.", "sensor": 37690}
becomes:
tokens[0] = "1562845812" # first_seen
tokens[1] = "1562845812" # last_seen
tokens[2] = "PTR" # DNS type
tokens[3] = "24.227.156.213.in-addr.arpa." # rrname
tokens[4] = "1" # hitcount
tokens[5] = "mx2.mammut.ch." # rdata
respectively:
entry.first_seen
entry.last_seen
entry.rrtype
entry.rrname
entry.hitcount
entry.rdata
- Parameters
rec (str) – A record returned from the source object.
- Returns
A single entry list of record_data named tuple.
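As a rough illustration of the tokenising step shown above, the sketch below maps one JSON line onto the six fields. The `record_data` named tuple and the function name `tokenise_json_line` are stand-ins invented for this example; using the single `timestamp` for both `first_seen` and `last_seen` and a hitcount of "1" follows the example output above.

```python
import json
from collections import namedtuple

# Hypothetical stand-in for the package's record_data named tuple.
record_data = namedtuple(
    "record_data",
    ["first_seen", "last_seen", "rrtype", "rrname", "hitcount", "rdata"],
)


def tokenise_json_line(line):
    """Return a single-entry list of record_data, or None if unparsable."""
    try:
        obj = json.loads(line)
        timestamp = str(obj["timestamp"])
        return [
            record_data(
                timestamp,       # first_seen
                timestamp,       # last_seen (same sighting)
                obj["rrtype"],
                obj["rrname"],
                "1",             # one sighting per log line
                obj["rdata"],
            )
        ]
    except (json.JSONDecodeError, KeyError):
        return None


line = (
    '{"rrclass": "IN", "ttl": 3600, "timestamp": "1562845812", '
    '"rrtype": "PTR", "rrname": "24.227.156.213.in-addr.arpa.", '
    '"rdata": "mx2.mammut.ch.", "sensor": 37690}'
)
entry = tokenise_json_line(line)[0]
```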
-
-
class
woohoo_pdns.load.
DNSTapFileImporter
(source_name, **kwargs)[source]¶ Bases:
woohoo_pdns.load.FileImporter
An importer capable of reading YAML based dnstap log files.
-
__init__
(source_name, **kwargs)[source]¶ To correctly initialise a file source a config dict must be supplied (see ‘cfg’ argument documentation).
- Parameters
source_name (str) – Either the name of a file to be read or the name of a directory to scan for files to load.
cfg (dict) – A config dictionary that contains the following two keys:
* file_pattern (str): The glob pattern to use when reading files from a directory.
* rename (bool): Whether or not files should be renamed (by appending ‘.1’) after they are read.
-
__module__
= 'woohoo_pdns.load'¶
-
_parse_tokenised_record
(tokenised_rec)[source]¶ Loop through all answers in the record and turn the datetimes into aware objects (using the default timezone).
- Parameters
tokenised_rec (record_data) – The record_data as tokenised by
_tokenise_record()
- Returns
A list of record_data named tuple.
-
_tokenise_record
(rec)[source]¶ Extract from YAML document:
type: MESSAGE
identity: dns.host.example.com
version: BIND 9.11.3-RedHat-9.11.3-6.el7.centos
message:
  type: RESOLVER_RESPONSE
  message_size: 89b
  socket_family: INET6
  socket_protocol: UDP
  query_address: 203.0.113.56
  response_address: 203.0.113.53
  query_port: 49824
  response_port: 53
  response_message_data:
    opcode: QUERY
    status: NOERROR
    id: 44174
    flags: qr aa
    QUESTION: 1
    ANSWER: 2
    AUTHORITY: 0
    ADDITIONAL: 0
    QUESTION_SECTION:
      - clients6.google.com. IN AAAA
    ANSWER_SECTION:
      - clients6.google.com. 300 IN CNAME clients.l.google.com.
      - clients.l.google.com. 300 IN AAAA 2a00:1450:4002:807::200e
  response_message: |
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 44174
    ;; flags: qr aa ; QUESTION: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0
    ;; QUESTION SECTION:
    ;clients6.google.com. IN AAAA
    ;; ANSWER SECTION:
    clients6.google.com. 300 IN CNAME clients.l.google.com.
    clients.l.google.com. 300 IN AAAA 2a00:1450:4002:807::200e
becomes:
tokens[0] = "2018-06-18T19:22:56Z" # first_seen
tokens[1] = "2018-06-18T19:22:56Z" # last_seen
tokens[2] = "CNAME" # DNS type
tokens[3] = "clients6.google.com." # rrname
tokens[4] = "1" # hitcount
tokens[5] = "clients.l.google.com." # rdata
respectively:
entry.first_seen
entry.last_seen
entry.rrtype
entry.rrname
entry.hitcount
entry.rdata
- Parameters
rec (str) – A record (YAML document as string) returned from the source object.
- Returns
A list of record_data named tuple.
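Because one dnstap document can carry several answers, one source record can yield several entries. The sketch below illustrates this for the lines of an ANSWER_SECTION; it deliberately skips the YAML parsing itself and starts from the already extracted answer lines. The `record_data` named tuple and the function name `tokenise_answers` are stand-ins invented for this example.

```python
from collections import namedtuple

# Hypothetical stand-in for the package's record_data named tuple.
record_data = namedtuple(
    "record_data",
    ["first_seen", "last_seen", "rrtype", "rrname", "hitcount", "rdata"],
)


def tokenise_answers(answer_lines, seen):
    """Turn 'name ttl class type rdata' lines into record_data entries."""
    entries = []
    for line in answer_lines:
        # Split into at most five parts so rdata may contain spaces.
        parts = line.split(None, 4)
        if len(parts) != 5:
            continue  # skip lines that do not have the expected layout
        rrname, _ttl, _rrclass, rrtype, rdata = parts
        entries.append(record_data(seen, seen, rrtype, rrname, "1", rdata))
    return entries


answers = [
    "clients6.google.com. 300 IN CNAME clients.l.google.com.",
    "clients.l.google.com. 300 IN AAAA 2a00:1450:4002:807::200e",
]
entries = tokenise_answers(answers, "2018-06-18T19:22:56Z")
```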
-
-
class
woohoo_pdns.load.
FileImporter
(source_name, cfg=None, **kwargs)[source]¶ Bases:
woohoo_pdns.load.Importer
An abstract class to handle loading data from files.
The ‘source_name’ can be a filename or a directory name on disk. If it is a file name, that file will be read. If it is a directory, all files matching the glob pattern in cfg[“file_pattern”] will be read. An exception will be thrown if the file (or directory) does not exist.
Note
Errors will be written to a file named like the source file, but with ‘_err’ in the name (if ‘source_name’ is a file). If ‘source_name’ is a directory, errors go to a file in the parent directory of that directory, also with ‘_err’ in the name.
- Throws:
FileNotFoundException if ‘source_name’ does not exist.
-
__init__
(source_name, cfg=None, **kwargs)[source]¶ To correctly initialise a file source a config dict must be supplied (see ‘cfg’ argument documentation).
- Parameters
source_name (str) – Either the name of a file to be read or the name of a directory to scan for files to load.
cfg (dict) – A config dictionary that contains the following two keys:
* file_pattern (str): The glob pattern to use when reading files from a directory.
* rename (bool): Whether or not files should be renamed (by appending ‘.1’) after they are read.
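The file-or-directory behaviour described for ‘source_name’ and cfg can be pictured with the following sketch. The helper `files_to_load` is hypothetical and only mirrors the documented behaviour with the standard library; it is not the package's implementation, and the sort order of the matches is an assumption.

```python
import glob
import os
import tempfile


def files_to_load(source_name, cfg):
    """Return the list of files a file-based importer would read."""
    if os.path.isfile(source_name):
        return [source_name]
    if os.path.isdir(source_name):
        pattern = os.path.join(source_name, cfg["file_pattern"])
        return sorted(glob.glob(pattern))  # assumption: alphabetical order
    raise FileNotFoundError(source_name)


# Example config with the two documented keys.
cfg = {"file_pattern": "*.txt", "rename": True}

# Demonstrate with a throwaway directory holding one matching file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.txt", "b.log"):
        open(os.path.join(tmp, name), "w").close()
    found = [os.path.basename(p) for p in files_to_load(tmp, cfg)]
```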
-
__module__
= 'woohoo_pdns.load'¶
-
_parse_tokenised_record
(tokenised_rec)[source]¶ After a record was tokenised, it is passed to this method for parsing (e.g. turn a unix timestamp into a datetime, or similar).
Note
This method must be implemented by the concrete subclasses of Importer.
- Parameters
tokenised_rec (record_data) – The record as tokenised by _tokenise_record().
- Returns
A record_data named tuple representing the final record to load. The importer also works if this method returns None (i.e. nothing is loaded into the database and the loading process continues), but the record is still considered ‘loaded’ by the statistics.
-
_tokenise_record
(rec)[source]¶ After a raw record is read from the source object, this method is called and passed the raw record to split it into the parts required for a pDNS record.
Note
This method must be implemented by the concrete subclasses of Importer.
- Parameters
rec (str) – The record as it was returned by the source object. This string must now be split into the (different) parts of a record_data named tuple.
- Returns
A record_data named tuple that represents the record to load or None if the record could not be parsed (or should be ignored).
-
class
woohoo_pdns.load.
FileSource
(config, **kwargs)[source]¶ Bases:
woohoo_pdns.load.Source
A source that reads data from files on disk.
This source can either read a single file or scan a directory for files that match a glob pattern and process all matching files from the given directory. If the filename passed in is a file, this file will be processed. If filename is a directory, the glob pattern in file_pattern will be used to find files to process in that directory.
If the optional configuration option rename is set to true (the default), RENAME_APPENDIX will be appended to the current file name after processing.
Note
The config dictionary (config) must contain the following keys:
filename
And the following keys are optional in the config dictionary:
file_pattern
rename
-
RENAME_APPENDIX
= '1'¶ If files should be renamed after processing, this is what is appended to the current filename.
-
__init__
(config, **kwargs)[source]¶ - Parameters
config (dict) – A dictionary that can hold data the source requires to configure itself.
kwargs (kwargs) – These are mainly to make this a “cooperative class” according to super() considered super.
-
__module__
= 'woohoo_pdns.load'¶
-
_open_next_file
()[source]¶ Try to open the next file to process.
First, the currently open file will be closed and renamed, if requested. After this, the next file in the list is opened (if any).
- Raises
IndexError –
- Returns
Nothing.
-
get_next_record
()[source]¶ This method is called by the importer whenever it is ready to load the next record. What is returned will be passed into Importer._tokenise_record().
Note
Subclasses must implement this method as it is not implemented here.
- Returns
A raw record from the source as string.
-
property
state
¶ Return a dictionary that describes the current state of the source.
The setter of this property expects a dictionary that was created by this getter and then restores the state of the source to what it was when state was retrieved.
- Returns
A dictionary containing the current file list file_list (list of all files pending processing, excluding the current file), the name of the file currently being processed file_name, and the offset (index) into that file (as retrieved by tell()).
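The save-and-restore idea behind this property can be sketched with `tell()` and `seek()` as follows. `TinyFileSource` is a hypothetical, heavily simplified class written for this illustration only; the key name "offset" is an assumption, while `file_list` and `file_name` are taken from the description above.

```python
import io


class TinyFileSource:
    """Hypothetical single-file source; not the package's FileSource."""

    def __init__(self, handle, file_name="<memory>"):
        self._handle = handle
        self._file_name = file_name
        self._pending = []  # files still waiting to be processed

    def get_next_record(self):
        return self._handle.readline()

    @property
    def state(self):
        # Everything needed to resume at the next unread record.
        return {
            "file_list": list(self._pending),
            "file_name": self._file_name,
            "offset": self._handle.tell(),
        }

    @state.setter
    def state(self, saved):
        self._pending = list(saved["file_list"])
        self._file_name = saved["file_name"]
        self._handle.seek(saved["offset"])


source = TinyFileSource(io.StringIO("rec1\nrec2\nrec3\n"))
first = source.get_next_record()
saved = source.state             # snapshot after the first record
second = source.get_next_record()
source.state = saved             # rewind to the snapshot
replayed = source.get_next_record()
```

After restoring the snapshot, the source hands out the second record again, which is the behaviour an importer relies on between batches.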
-
class
woohoo_pdns.load.
Importer
(source_name, data_timezone='UTC', strict=False, **kwargs)[source]¶ Bases:
object
Importers are used to import new data into the pDNS database.
This is the super class for all importers. Different importers can import data from different sources. If no importer for a specific source is available, woohoo pDNS tries to make it simple to write a new importer for that particular source (format).
The main method of an importer is load_batch(). This method reads up to ‘batch_size’ records from the source, processes them into a list of record_data named tuples, adds some statistics and returns it.
To access the source data it uses a Source object. This object’s job is to provide a single source record at a time to the importer. This can mean reading one or several lines from a file or a record from a Kafka topic or whatever produces a source record. The importer then processes this record (possibly into multiple entries, for example if the source record contained a single query that produced multiple answers).
This base class handles the fetching of records from the source (up to a maximum of batch_size), calling the respective hooks (_inspect_raw_record(), _inspect_tokenised_record(), _tokenise_record() and _parse_tokenised_record()) which implement the actual logic for the importer (i.e. these are the methods that must be overridden in the child classes), minimal cleansing of the data and handling errors (including writing an error logfile).
-
IGNORE_TYPES
= [0]¶ DNS types that we want to ignore completely (0 for example does not exist)
-
ILLEGAL_CHARS
= ['/', '\\', '&', ':']¶ If any of these characters is present in rrname, the record will not be loaded, as these characters are not expected in rrname (they can, however, be present in rdata, for example in TXT records).
-
__init__
(source_name, data_timezone='UTC', strict=False, **kwargs)[source]¶ Constructor for an importer.
- Parameters
source_name (str) – A name that is passed to the source; can be a file name or directory name for a FileSource or, for a hypothetical KafkaSource, it could be the name of the Kafka topic to use.
data_timezone (str) – The name of the timezone that should be used if the source data does not provide the timezone for the dates and times (first_seen, last_seen).
strict (bool) – If set to true, the importer will throw an exception if something ‘odd’ is encountered in the source data. If it is set to false, the importer will write an entry in the error log and continue loading data.
kwargs (kwargs) – These are mainly to make this a “cooperative class” according to super() considered super.
-
__module__
= 'woohoo_pdns.load'¶
-
__weakref__
¶ list of weak references to the object (if defined)
-
_inspect_raw_record
(raw_record)[source]¶ For the first record of every batch this method will be called and the raw record is passed to it. This can be used if ‘something’ must be determined from source data (e.g. the datetime format).
Note
This is a NOP in Importer and meant to be overridden by subclasses if required.
- Parameters
raw_record (str) – the record as it was returned from the source object.
- Returns
Nothing.
-
_inspect_tokenised_record
(tokenised_rec)[source]¶ For every record that was successfully tokenised (i.e. split into the required parts), this method will be called. It can be used to decide on further processing, for example.
Note
This is a NOP in Importer and meant to be overridden by subclasses if required.
- Parameters
tokenised_rec (record_data) – The record as it was tokenised by _tokenise_record().
- Returns
Nothing.
-
_is_valid
(entry)[source]¶ Check if the given entry is considered to be valid.
Entries with an empty rrname or rdata field are considered invalid, for example.
- Parameters
entry (record_data) – the entry to check for validity.
- Returns
True if the entry passed validation, False otherwise.
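A validity check along these lines might look like the sketch below. Only the empty rrname/rdata rule is stated above; combining it with IGNORE_TYPES and ILLEGAL_CHARS in one function is an assumption made for illustration, and `record_data` and `is_valid` are stand-ins rather than the package's code.

```python
from collections import namedtuple

# Hypothetical stand-in for the package's record_data named tuple.
record_data = namedtuple(
    "record_data",
    ["first_seen", "last_seen", "rrtype", "rrname", "hitcount", "rdata"],
)

# Values as documented for the Importer class attributes.
IGNORE_TYPES = [0]
ILLEGAL_CHARS = ["/", "\\", "&", ":"]


def is_valid(entry):
    """Return True if the entry passes all (assumed) checks."""
    if not entry.rrname or not entry.rdata:
        return False  # empty left or right hand side
    if entry.rrtype in IGNORE_TYPES:
        return False  # record type that should be ignored completely
    if any(ch in entry.rrname for ch in ILLEGAL_CHARS):
        return False  # unexpected character in rrname (rdata is not checked)
    return True


good = record_data(None, None, 28, "example.com", 1, "2001:db8::1")
empty_rdata = good._replace(rdata="")
bad_name = good._replace(rrname="exam&ple.com")
ignored_type = good._replace(rrtype=0)
```

Note that `good` passes even though its rdata contains ":", since the illegal characters are only checked in rrname.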
-
_parse_tokenised_record
(tokenised_rec)[source]¶ After a record was tokenised, it is passed to this method for parsing (e.g. turn a unix timestamp into a datetime, or similar).
Note
This method must be implemented by the concrete subclasses of Importer.
- Parameters
tokenised_rec (record_data) – The record as tokenised by _tokenise_record().
- Returns
A record_data named tuple representing the final record to load. The importer also works if this method returns None (i.e. nothing is loaded into the database and the loading process continues), but the record is still considered ‘loaded’ by the statistics.
-
_tokenise_record
(rec)[source]¶ After a raw record is read from the source object, this method is called and passed the raw record to split it into the parts required for a pDNS record.
Note
This method must be implemented by the concrete subclasses of Importer.
- Parameters
rec (str) – The record as it was returned by the source object. This string must now be split into the (different) parts of a record_data named tuple.
- Returns
A record_data named tuple that represents the record to load or None if the record could not be parsed (or should be ignored).
-
property
has_more_data
¶ Indicating if the importer is (potentially) able to produce more data. Mainly means that the source can fetch at least one more record; does not include any validity check(s) of that data though.
- Returns
True if there is more source data available, false otherwise.
-
load_batch
(batch_size, max_failed_inarow=0)[source]¶ The workhorse method of Importer.
The source object (self.source) will be initialised with its config (self.src_config) and for subsequent iterations the source’s state will be restored (to what was returned by Source.state in the last iteration). Then, records will be loaded until either no more data is available or ‘batch_size’ records are ready for loading into the database.
For the first record in every batch, _inspect_raw_record() and _inspect_tokenised_record() will be called. For every record, _tokenise_record() and _parse_tokenised_record() are called. _tokenise_record() is meant to be the place where filtering of source records can occur (return None).
- Parameters
batch_size (int) – The maximum number of records to process at once.
max_failed_inarow (int) – The maximum number of records that fail to import in a row before aborting the processing of this batch.
- Returns
A load_batch_result named tuple. This contains some statistics and a list of record_data named tuples.
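The control flow of such a batch loop can be sketched as follows. This is a simplified stand-alone function, not the method itself: the source is modelled as a plain iterator, the hooks are passed in as callables, and the exact meaning of the `converted`, `loaded` and `ignored` counters is an assumption (here: raw records read, entries produced, records filtered out by the tokeniser). Treating 0 for `max_failed_inarow` as "no limit" is also an assumption.

```python
from collections import namedtuple

# Field names as documented for load_batch_result.
load_batch_result = namedtuple(
    "load_batch_result", ["converted", "loaded", "ignored", "records"]
)


def load_batch(source, tokenise, parse, batch_size, max_failed_inarow=0):
    """Read from an iterator until batch_size entries are collected."""
    converted = ignored = failed_inarow = 0
    records = []
    while len(records) < batch_size:
        raw = next(source, None)
        if raw is None:
            break  # source exhausted
        converted += 1
        try:
            tokens = tokenise(raw)
            if tokens is None:
                ignored += 1  # tokeniser filtered this record out
                continue
            entries = parse(tokens) or []
        except ValueError:
            failed_inarow += 1
            if max_failed_inarow and failed_inarow >= max_failed_inarow:
                break  # too many consecutive failures, abort this batch
            continue
        failed_inarow = 0
        records.extend(entries)
    return load_batch_result(converted, len(records), ignored, records)


source = iter(["a|1", "skip", "b|2", "c|3"])
result = load_batch(
    source,
    tokenise=lambda raw: raw.split("|") if "|" in raw else None,
    parse=lambda tokens: [(tokens[0], int(tokens[1]))],
    batch_size=2,
)
```

The fourth record stays in the iterator for the next batch, which mirrors why the real importer needs the source's state between calls.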
-
class
load_batch_result
(converted, loaded, ignored, records)¶ Bases:
tuple
A named tuple that is used to pass back some statistics as well as a list of record_data named tuples.
-
__getnewargs__
()¶ Return self as a plain tuple. Used by copy and pickle.
-
__module__
= 'woohoo_pdns.load'¶
-
static
__new__
(_cls, converted, loaded, ignored, records)¶ Create new instance of load_batch_result(converted, loaded, ignored, records)
-
__repr__
()¶ Return a nicely formatted representation string
-
__slots__
= ()¶
-
_asdict
()¶ Return a new OrderedDict which maps field names to their values.
-
_fields
= ('converted', 'loaded', 'ignored', 'records')¶
-
_fields_defaults
= {}¶
-
classmethod
_make
(iterable)¶ Make a new load_batch_result object from a sequence or iterable
-
_replace
(**kwds)¶ Return a new load_batch_result object replacing specified fields with new values
-
property
converted
¶ Alias for field number 0
-
property
ignored
¶ Alias for field number 2
-
property
loaded
¶ Alias for field number 1
-
property
records
¶ Alias for field number 3
-
-
-
class
woohoo_pdns.load.
SilkFileImporter
(source_name, **kwargs)[source]¶ Bases:
woohoo_pdns.load.FileImporter
Importer to read files produced by the SiLK security suite.
Note
This is a subclass of FileImporter as it reads data from files on disk. There are many ways to get the files, for example with the ‘rwsender’ program included in the SiLK suite.
-
__init__
(source_name, **kwargs)[source]¶ To correctly initialise a file source a config dict must be supplied (see ‘cfg’ argument documentation).
- Parameters
source_name (str) – Either the name of a file to be read or the name of a directory to scan for files to load.
cfg (dict) – A config dictionary that contains the following two keys:
* file_pattern (str): The glob pattern to use when reading files from a directory.
* rename (bool): Whether or not files should be renamed (by appending ‘.1’) after they are read.
-
__module__
= 'woohoo_pdns.load'¶
-
_inspect_tokenised_record
(tokenised_rec)[source]¶ Sometimes the time in the input has millisecond resolution (for the whole source file). If so, adjust the parsing format to account for this.
- Parameters
tokenised_rec (record_data) – The record as tokenised by _tokenise_record().
- Returns
Nothing.
-
_parse_tokenised_record
(tokenised_rec)[source]¶ Mainly convert the date and time (strings) into aware datetime objects.
- Parameters
tokenised_rec (record_data) – The record_data as tokenised by
_tokenise_record()
- Returns
A single element list of record_data named tuple.
-
_tokenise_record
(rec)[source]¶ Split a line into tokens:
2019-05-13 18:12:44.374|2019-05-13 18:12:44.374|28|gateway.fe.apple-dns.net|1|2a01:b740:0a41:0603::0010
becomes:
tokens[0] = "2019-05-13 18:12:44.374" # first_seen
tokens[1] = "2019-05-13 18:12:44.374" # last_seen
tokens[2] = "28" # DNS type
tokens[3] = "gateway.fe.apple-dns.net" # rrname
tokens[4] = "1" # hitcount
tokens[5] = "2a01:b740:0a41:0603::0010" # rdata
respectively:
entry.first_seen
entry.last_seen
entry.rrtype
entry.rrname
entry.hitcount
entry.rdata
- Parameters
rec (str) – A record returned from the source object.
- Returns
A single entry list of record_data named tuple.
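A minimal sketch of splitting such a pipe-separated line, and of turning the date strings into aware datetimes with or without milliseconds, could look like this. `record_data`, `tokenise_silk_line` and `parse_silk_time` are names invented for this illustration; the two `strptime` formats and the UTC default are assumptions derived from the example line.

```python
from collections import namedtuple
from datetime import datetime, timezone

# Hypothetical stand-in for the package's record_data named tuple.
record_data = namedtuple(
    "record_data",
    ["first_seen", "last_seen", "rrtype", "rrname", "hitcount", "rdata"],
)


def tokenise_silk_line(line):
    """Split one '|' separated line into a single-entry list."""
    parts = line.rstrip("\n").split("|")
    if len(parts) != 6:
        return None  # not the expected six fields
    return [record_data(*parts)]


def parse_silk_time(value, tz=timezone.utc):
    """Parse a date string, with or without a fractional second part."""
    fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in value else "%Y-%m-%d %H:%M:%S"
    return datetime.strptime(value, fmt).replace(tzinfo=tz)


line = (
    "2019-05-13 18:12:44.374|2019-05-13 18:12:44.374|28|"
    "gateway.fe.apple-dns.net|1|2a01:b740:0a41:0603::0010"
)
entry = tokenise_silk_line(line)[0]
first_seen = parse_silk_time(entry.first_seen)
```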
-
-
class
woohoo_pdns.load.
SingleLineFileSource
(config, **kwargs)[source]¶ Bases:
woohoo_pdns.load.FileSource
A file source that reads a single line from a file at a time.
-
__module__
= 'woohoo_pdns.load'¶
-
-
class
woohoo_pdns.load.
Source
(config, **kwargs)[source]¶ Bases:
object
Source object(s) abstract the logic of fetching a ‘single record’ from a source.
For files, this can mean reading one or several lines (e.g. a YAML document), for other sources (e.g. an imaginary Kafka source) this could mean querying a service or calling an API or …
-
__init__
(config, **kwargs)[source]¶ - Parameters
config (dict) – A dictionary that can hold data the source requires to configure itself.
kwargs (kwargs) – These are mainly to make this a “cooperative class” according to super() considered super.
-
__module__
= 'woohoo_pdns.load'¶
-
__weakref__
¶ list of weak references to the object (if defined)
-
get_next_record
()[source]¶ This method is called by the importer whenever it is ready to load the next record. What is returned will be passed into Importer._tokenise_record().
Note
Subclasses must implement this method as it is not implemented here.
- Returns
A raw record from the source as string.
-
property
state
¶ A source can have ‘state’ which allows it to resume at the correct next record after a batch of data was processed.
Note
Importers will request state from the source when a batch is about to be finished and will pass whatever the source provided back to the source before starting the next batch.
For a source reading from a file this can for example mean to return the value of tell() and then seek() to this position when state is passed in again.
-
-
exception
woohoo_pdns.load.
WoohooImportError
[source]¶ Bases:
Exception
-
__module__
= 'woohoo_pdns.load'¶
-
__weakref__
¶ list of weak references to the object (if defined)
-
-
class
woohoo_pdns.load.
YamlFileSource
(config, **kwargs)[source]¶ Bases:
woohoo_pdns.load.FileSource
Read a YAML document from a file on disk.
-
__module__
= 'woohoo_pdns.load'¶
-
get_next_record
()[source]¶ This method is called by the importer whenever it is ready to load the next record. What is returned will be passed into Importer._tokenise_record().
Note
Subclasses must implement this method as it is not implemented here.
- Returns
A raw record from the source as string.
-
woohoo_pdns.meta module¶
-
class
woohoo_pdns.meta.
LookupDict
(name=None)[source]¶ Bases:
dict
Dictionary lookup object.
TODO: understand this… https://github.com/kennethreitz/requests/blob/master/requests/structures.py
-
__module__
= 'woohoo_pdns.meta'¶
-
__weakref__
¶ list of weak references to the object (if defined)
-
woohoo_pdns.pdns module¶
-
class
woohoo_pdns.pdns.
Database
(db_url)[source]¶ Bases:
object
The Database object is the interface to the database holding pDNS records.
This object is designed as a context manager; it can be used in a with statement.
-
__init__
(db_url)[source]¶ Initialise the connection to the database.
- Parameters
db_url (string) – The URL to the database, e.g.
postgresql+psycopg2://user:password@hostname/database_name
-
__module__
= 'woohoo_pdns.pdns'¶
-
__weakref__
¶ list of weak references to the object (if defined)
-
_query_for_ip
(q)[source]¶ Query the “rdata” for an IP address.
- Parameters
q (str) – the IP address (as a string) to search for.
- Returns
A list of
Record
objects for records found (can be empty)
-
_query_for_name
(q, rdata)[source]¶ Query the “rrname” or the “rdata” in the DB.
Note
This is for string queries only (no IP address queries).
- Parameters
q (str) – The search term, can contain “*” as a wildcard.
rdata (bool) – If True and the query is a text query, search the right hand side instead of the left hand side.
- Returns
A list of
Record
objects for records found (can be empty)
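One common way to support “*” wildcards in a text query is to translate them into SQL LIKE patterns. Whether woohoo pDNS does it this way is not stated here, so the sketch below is an assumption; it uses an in-memory SQLite table with a made-up two-column layout and a hypothetical helper `wildcard_to_like`.

```python
import sqlite3


def wildcard_to_like(term):
    """Translate '*' wildcards into a LIKE pattern, escaping % and _."""
    escaped = (
        term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    return escaped.replace("*", "%")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE record (rrname TEXT, rdata TEXT)")
conn.executemany(
    "INSERT INTO record VALUES (?, ?)",
    [
        ("www.example.com", "192.0.2.1"),
        ("mail.example.com", "192.0.2.2"),
        ("www.example.org", "192.0.2.3"),
    ],
)
# Search the left hand side; searching rdata would use the other column.
rows = conn.execute(
    "SELECT rrname FROM record WHERE rrname LIKE ? ESCAPE '\\' ORDER BY rrname",
    (wildcard_to_like("*.example.com"),),
).fetchall()
conn.close()
```

Escaping the underscore matters for DNS data because names such as `_dmarc.example.com` would otherwise match any single character in that position.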
-
add_record
(rrtype, rrname, rdata, first_seen=None, last_seen=None, num_hits=1)[source]¶ Add a (new) record to the database.
If a record with that rrtype, rrname, rdata already exists in the database, the hitcount is increased by num_hits, first_seen or last_seen are updated if necessary and the existing object is returned. Otherwise a new object will be created and returned (with hitcount 1, first_seen = last_seen = sighted_at (or “now” if sighted_at is not provided)).
- Parameters
rrtype (int) – the id for the DNS record type (e.g. 1 for A, 28 for AAAA, etc. See https://en.wikipedia.org/wiki/List_of_DNS_record_types)
rrname (string) – the “left hand side” of the record; a trailing dot will be removed
rdata (string) – the “right hand side” of the record; a trailing dot will be removed
first_seen (datetime) – the date and time of the first (oldest) sighting; if omitted and also no last_seen is provided “now” will be used
last_seen (datetime) – the date and time of the most recent sighting; if omitted and also no first_seen is provided “now” will be used
num_hits (int) – the number of times this record was seen (will be added to an existing record’s hitcount)
- Returns
A
Record
object representing this record.
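The update-or-create behaviour described for this method can be modelled with a plain dictionary, as in the sketch below. It is an in-memory illustration only, with no database or SQLAlchemy involved; the `store` dictionary and the helper `_strip_dot` are invented here, and initialising a new record's hitcount with `num_hits` is an assumption.

```python
from datetime import datetime, timezone

store = {}  # (rrtype, rrname, rdata) -> record dict; stands in for the table


def _strip_dot(name):
    """Remove a single trailing dot, if present."""
    return name[:-1] if name.endswith(".") else name


def add_record(rrtype, rrname, rdata, first_seen=None, last_seen=None, num_hits=1):
    now = datetime.now(timezone.utc)
    # Fall back to the other timestamp, then to "now".
    first_seen = first_seen or last_seen or now
    last_seen = last_seen or first_seen
    key = (rrtype, _strip_dot(rrname), _strip_dot(rdata))
    rec = store.get(key)
    if rec is None:
        rec = {"first_seen": first_seen, "last_seen": last_seen, "hitcount": num_hits}
        store[key] = rec
    else:
        rec["hitcount"] += num_hits
        rec["first_seen"] = min(rec["first_seen"], first_seen)
        rec["last_seen"] = max(rec["last_seen"], last_seen)
    return rec


t1 = datetime(2019, 5, 13, tzinfo=timezone.utc)
t2 = datetime(2019, 5, 14, tzinfo=timezone.utc)
add_record(1, "example.com.", "192.0.2.1", first_seen=t1, last_seen=t1)
rec = add_record(1, "example.com", "192.0.2.1", first_seen=t2, last_seen=t2, num_hits=3)
```

Both calls land on the same record because the trailing dot is removed before the lookup.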
-
close
()[source]¶ Close the connection to the database. It is important to call this method after you are done. Will be called automagically when used with the context manager.
-
property
count
¶ The total number of pDNS records in the database.
-
find_record
(rrtype, rrname, rdata=None)[source]¶ Search for a record (by type and left hand side, optionally also right hand side).
- Parameters
rrtype (int) – The id for the DNS record type (e.g. 1 for A, 28 for AAAA, etc. See https://en.wikipedia.org/wiki/List_of_DNS_record_types).
rrname (string) – The “left hand side” of the record.
rdata (string) – The “right hand side” of the record.
- Returns
The Record object representing the record.
- Raises
NoResultFound –
-
load
(source_name, batch_size=10000, cfg=None, data_timezone='UTC', strict=False, loader='woohoo_pdns.load.SilkFileImporter')[source]¶ Load data into the database.
The actual work is done by the class referenced in the “loader” argument.
- Parameters
source_name (str) – The directory or filename or other reference to the source (e.g. a Kafka topic name) where data should be loaded from.
batch_size (int) – For more efficient loading into the database, records are inserted/updated in batches; this defines the maximum number of records to process at once.
cfg (dict) – A dictionary with config items that will be passed to the constructor of the Importer.
strict (bool) – If true, abort loading if “errors” are detected in the input. If false, try to “fix” the error(s) and/or to continue loading remaining data. Default is False.
data_timezone (timezone string) – If source data without a timezone specification is found, assume the timezone is this.
loader (Importer) – Defines what class is used for the actual loading of data.
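The default value of loader in the signature is a dotted path string ('woohoo_pdns.load.SilkFileImporter'). A usual way to turn such a string into a class is `importlib`; the sketch below shows that technique with a hypothetical helper `resolve_loader`. That the package resolves the loader this way is an assumption, and a standard library class is used so the example runs without woohoo_pdns installed.

```python
import importlib


def resolve_loader(dotted_path):
    """Import 'package.module.ClassName' and return the class object."""
    module_name, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Stand-in path; with the package installed this could be
# 'woohoo_pdns.load.SilkFileImporter' instead.
loader_cls = resolve_loader("collections.OrderedDict")
instance = loader_cls()
```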
-
property
most_recent
¶ The most recent record in the database, i.e. the one with the most recent “last_seen” datetime.
-
query
(q, rdata=False)[source]¶ Issue a query against the database.
When
- Parameters
q (str) – the query, can be an IP address (v4 or v6) or text.
rdata (bool) – text queries look for matches on the “left hand side” (rrname) unless this option is set which makes the query search for matches on the “right hand side”. Use it for example to search for domains that share a common name server (NS record). For IP address queries, this is ignored; defaults to False.
- Returns
A list of records that matches the query term.
- Throws:
MissingEntry
if no match is found for the query.
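Since a query term can be an IP address (v4 or v6) or text, some step has to tell the two apart. One way to do this with the standard library is sketched below; `classify_query` is a hypothetical helper and not necessarily how the package decides.

```python
import ipaddress


def classify_query(q):
    """Return 'ip' for IPv4/IPv6 literals and 'name' for everything else."""
    try:
        ipaddress.ip_address(q)
    except ValueError:
        return "name"
    return "ip"


kinds = [
    classify_query("192.0.2.1"),
    classify_query("2001:db8::1"),
    classify_query("*.example.com"),
]
```

In this picture, "ip" terms would be handed to the IP address lookup and "name" terms to the text search, where the rdata flag selects the side to search.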
-
property
records
¶ A list of all pDNS records in the database.
-
-
exception
woohoo_pdns.pdns.
InvalidEntry
[source]¶ Bases:
ValueError
When SQLAlchemy fails to commit a record to the database, this exception is raised.
The details produced by SQLAlchemy will be included in the exceptions description.
-
__module__
= 'woohoo_pdns.pdns'¶
-
__weakref__
¶ list of weak references to the object (if defined)
-
-
exception
woohoo_pdns.pdns.
MissingEntry
[source]¶ Bases:
ValueError
When a query does not yield any result, this exception is raised.
-
__module__
= 'woohoo_pdns.pdns'¶
-
__weakref__
¶ list of weak references to the object (if defined)
-
-
class
woohoo_pdns.pdns.
Record
(**kwargs)[source]¶ Bases:
sqlalchemy.ext.declarative.api.Base
Database representation of a record in the pDNS system.
A record can be of any DNS type (A, AAAA, TXT, PTR, …) and has a “left side” (rrname) and a “right side” (rdata). More information about “left hand side” and “right hand side” is available on the Farsight website for example.
-
first_seen
¶ The date and time (incl. timezone) when a record was first seen by this pDNS system.
- Type
DateTime
-
last_seen
¶ The date and time (incl. timezone) when a record was last seen by this pDNS system (i.e. the most recent “sighting”).
- Type
DateTime
-
rrtype
¶ The type of the record (A, AAAA, TXT, …) according to the official list of DNS types.
- Type
int
-
hitcount
¶ The number of times this record was “sighted” by this pDNS system.
- Type
int
-
__init__
(**kwargs)¶ The init method is just setting up a logger for the class.
The kwargs just make it a “cooperative class” in the sense of the article “super() considered super”.
-
__mapper__
= <Mapper at 0x7f13d26b2dd8; Record>¶
-
__module__
= 'woohoo_pdns.pdns'¶
-
__table__
= Table('record', MetaData(bind=None), Column('first_seen', DateTime(timezone=True), table=<record>, nullable=False), Column('last_seen', DateTime(timezone=True), table=<record>, nullable=False), Column('rrtype', Integer(), table=<record>, primary_key=True, nullable=False), Column('_rrname', String(length=270), table=<record>, primary_key=True, nullable=False), Column('hitcount', Integer(), table=<record>, nullable=False, default=ColumnDefault(1)), Column('_rdata', String(length=300), table=<record>, primary_key=True, nullable=False), schema=None)¶
-
__tablename__
= 'record'¶
-
_rdata
¶
-
_rrname
¶
-
_sa_class_manager
= {'_rdata': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, '_rrname': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, 'first_seen': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, 'hitcount': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, 'last_seen': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, 'rrtype': <sqlalchemy.orm.attributes.InstrumentedAttribute object>}¶
-
ensure_aware_dt
()[source]¶ When reconstructing a Record from the database, ensure that the datetimes (first_seen and last_seen) are “aware” objects (i.e. have a timezone).
This is mainly an issue when using sqlite (e.g. for testing), as sqlite does not store timezone information. If the timezone information is missing, UTC is assumed and added.
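The idea of attaching UTC to a naive datetime can be sketched as follows. The helper name `ensure_aware` is hypothetical and the snippet is only a minimal illustration of the technique, not the method's actual body.

```python
from datetime import datetime, timezone


def ensure_aware(dt):
    """Return dt unchanged if it carries a timezone, else assume UTC."""
    if dt.tzinfo is None or dt.tzinfo.utcoffset(dt) is None:
        # Naive datetime (e.g. read back from sqlite): attach UTC without
        # shifting the wall-clock value.
        return dt.replace(tzinfo=timezone.utc)
    return dt


naive = datetime(2019, 7, 11, 11, 50, 12)
aware = ensure_aware(naive)
```

Note that `replace()` keeps the clock time and only adds the timezone, which matches the "UTC is assumed" behaviour described above.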
-
first_seen
-
hitcount
-
last_seen
-
property
rdata
¶ The “rdata”, i.e. the “right hand side” of the record (cf. class attribute documentation).
-
property
rrname
¶ The “rrname”, i.e. the “left hand side” of the record (cf. class attribute documentation).
Note
When setting this property, the value will be sanitized by
woohoo_pdns.util.sanitise_input()
; this means that a trailing dot will be removed unless the value is just a dot.
-
rrtype
-
to_dict
()[source]¶ Convert the record object to a dictionary representation that is suitable for SQLAlchemy bulk operations.
-
to_jsonable
()[source]¶ Convert the record object to a JSON-friendly dictionary representation.
Note
This dict is compatible with the Passive DNS - Common Output Format.
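As a rough sketch of what such a JSON-friendly dictionary might look like: the key names below follow the Passive DNS - Common Output Format draft, but whether `to_jsonable()` emits exactly this set of keys is an assumption, and the helper `to_cof` is hypothetical.

```python
import json
from datetime import datetime, timezone


def to_cof(rrname, rrtype, rdata, first_seen, last_seen, hitcount):
    """Build a dict using Common Output Format key names (illustration)."""
    return {
        "rrname": rrname,
        "rrtype": rrtype,
        "rdata": rdata,
        # The draft expresses times as integer UNIX timestamps.
        "time_first": int(first_seen.timestamp()),
        "time_last": int(last_seen.timestamp()),
        "count": hitcount,
    }


seen = datetime.fromtimestamp(1562845812, tz=timezone.utc)
entry = to_cof("24.227.156.213.in-addr.arpa", "PTR", "mx2.mammut.ch",
               seen, seen, 1)
line = json.dumps(entry)
```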
-
woohoo_pdns.util module¶
-
class
woohoo_pdns.util.
LoaderCache
[source]¶ Bases:
object
This class implements the cache used when loading entries into the database.
Because pDNS databases have to ingest high volumes of data with high redundancy (never seen before entries are comparatively rare) it can be expected that caching substantially improves performance.
The cache internally holds values in dictionaries with a key derived from the actual data. To add records to the cache the named tuple ‘record_data’ should be used.
Note
When adding a record to the cache, four modes are available: cache_only, new, updated and auto. For a description of the modes, see the documentation of the “modes” named tuple.
-
MODES
= cache_modes(cache_only=1, new=2, updated=4, auto=8)¶
-
__contains__
(item)[source]¶ Checks if the cache contains the entry represented by the named tuple (or dict) passed in.
- Parameters
item (record_data) – The record which should be checked for presence in the cache.
- Returns
True if the item is in the cache, False otherwise.
-
__dict__
= mappingproxy({'__module__': 'woohoo_pdns.util', '__doc__': '\n This class implements the cache used when loading entries into the database.\n\n Because pDNS databases have to ingest high volumes of data with high redundancy (never seen before entries are\n comparatively rare) it can be expected that caching substantially improves performance.\n\n The cache internally holds values in dictionaries with a key derived from the actual data. To add records to the\n cache the named tuple \'record_data\' should be used.\n\n Note:\n When adding a record to the cache, four modes are available: cache_only, new, updated and auto. For a\n description of the modes, see the documentation of the "modes" named tuple.\n ', 'modes': <class 'woohoo_pdns.util.cache_modes'>, 'MODES': cache_modes(cache_only=1, new=2, updated=4, auto=8), '__init__': <function LoaderCache.__init__>, '__contains__': <function LoaderCache.__contains__>, 'get_new_entries': <function LoaderCache.get_new_entries>, 'get_to_update': <function LoaderCache.get_to_update>, 'add': <function LoaderCache.add>, 'rollover': <function LoaderCache.rollover>, 'clear': <function LoaderCache.clear>, '_dictionise': <staticmethod object>, '_tupelise': <staticmethod object>, '_add_to_new': <function LoaderCache._add_to_new>, '_add_to_update': <function LoaderCache._add_to_update>, '_add_to_cache_only': <function LoaderCache._add_to_cache_only>, 'merge': <staticmethod object>, '__dict__': <attribute '__dict__' of 'LoaderCache' objects>, '__weakref__': <attribute '__weakref__' of 'LoaderCache' objects>})¶
-
__module__
= 'woohoo_pdns.util'¶
-
__weakref__
¶ list of weak references to the object (if defined)
-
static
_dictionise
(item)[source]¶ Convert ‘item’ (a record_data named tuple) into a dictionary { key: item }.
- Parameters
item (record_data) – the item to ‘convert’ into a dictionary
- Returns
A dict with one key (item.key, if it was set) and item as its value.
-
static
_tupelise
(item_key, item_value)[source]¶ Convert ‘item_key, item_value’ (value of type dictionary) into a record_data named tuple.
- Parameters
item_key (str) – the key that is used for the ‘value dict’
item_value (dict) – a dictionary that holds the relevant data to create a record_data
- Returns
The item that results from ‘converting’ the dictionary.
- Return type
record_data
-
add
(item, mode=8)[source]¶ Add a new item to the cache.
- Parameters
item (record_data) – The representation of the item to add to the cache.
mode (mode) – What mode to use (see documentation of mode for details) if the record is not yet in the cache.
- Returns
Nothing.
-
get_new_entries
(for_bulk=False)[source]¶ Return the list of items that are considered not to be present in the pDNS database yet.
Note
The main reason for differentiating between new and updated entries (with respect to the pDNS database, not the cache) is to allow bulk operations in SQLAlchemy; it must be known if ‘INSERT’ or ‘UPDATE’ statements should be used.
- Parameters
for_bulk (bool) – If True, a list of dictionaries is returned (suitable for SQLAlchemy bulk operations); if False, a list of record_data named tuples is returned. For more information about SQLAlchemy bulk operations, see the SQLAlchemy documentation on bulk operations.
- Returns
A list of either named tuples or dictionaries (see ‘for_bulk’ argument).
-
get_to_update
(for_bulk=False)[source]¶ The same as
get_new_entries()
but for entries considered to already be present in the pDNS database (not necessarily the cache).
- Parameters
for_bulk (bool) – see argument with the same name documented for
get_new_entries()
- Returns
A list of either named tuples or dictionaries (see ‘for_bulk’ argument).
-
static
merge
(existing, new)[source]¶ Merge two cached items: add the existing item’s hitcount to the new item’s hitcount, and set the new item’s first_seen to the minimum and its last_seen to the maximum of the two items’ values.
- Parameters
existing (dict) – An item present in the cache
new (dict) – An item that should be updated with the info already present in the cache.
- Returns
Nothing, the new item will be updated in place.
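The merge rule can be sketched on plain dictionaries. The key names (`hitcount`, `first_seen`, `last_seen`) are assumed to mirror the record attributes, and the function below is an illustrative re-statement of the rule rather than the library's code.

```python
def merge(existing, new):
    """Fold the cached item's counters into the new item, in place."""
    new["hitcount"] += existing["hitcount"]
    # Earliest sighting wins for first_seen, latest for last_seen.
    new["first_seen"] = min(existing["first_seen"], new["first_seen"])
    new["last_seen"] = max(existing["last_seen"], new["last_seen"])


# Timestamps are plain integers here; datetimes compare the same way.
existing = {"hitcount": 5, "first_seen": 100, "last_seen": 200}
new = {"hitcount": 1, "first_seen": 150, "last_seen": 250}
merge(existing, new)
```

Only `new` is modified; `existing` is left untouched, in line with the "updated in place" return description.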
-
modes
¶ The mode is relevant when adding records to the cache that are not already present in the cache. ‘cache_only’ should only be used to (pre-)populate the cache; this is mainly useful if the ‘auto’ mode is to be used later on. ‘auto’ assumes that the cache already holds all relevant entries; therefore, when adding an entry, it will be cached as ‘new’ if it was not present in the cache before and as ‘updated’ if it already was in the cache. If the mode is set to ‘new’, the entry will be considered to be new (i.e. returned by
get_new_entries()
), whereas with the mode set to ‘updated’ it will be considered to already be known by the pDNS database (but not necessarily the cache, i.e. it will be returned by
get_to_update()
).
alias of
cache_modes
-
rollover
()[source]¶ Should be called after the pDNS database is updated with the currently cached entries (i.e. after the bulk operations are done for the lists returned by
get_new_entries()
and
get_to_update()
).
This will ‘move’ all cached entries into the ‘cache_only’ status, indicating that they are ‘known’ but not ‘dirty’ (in a cache’s way of using that word).
- Returns
Nothing.
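The interplay of the ‘auto’ mode and the rollover can be sketched with a miniature cache. The class `ToyCache`, its attribute names, the string keys and the decision to keep a not-yet-written entry in the ‘new’ bucket when it is seen again are all assumptions for illustration; the real cache stores richer items.

```python
class ToyCache:
    """Hypothetical miniature of the documented cache states."""

    def __init__(self):
        self.known = {}    # 'cache_only': known, not dirty
        self.new = {}      # dirty, would become INSERT statements
        self.updated = {}  # dirty, would become UPDATE statements

    def __contains__(self, key):
        return key in self.known or key in self.new or key in self.updated

    def add_auto(self, key, hitcount=1):
        """'auto' mode: decide between new and updated from cache state."""
        if key in self.new:
            self.new[key] += hitcount
        elif key in self.updated:
            self.updated[key] += hitcount
        elif key in self.known:
            # Already known to the database: mark as dirty update.
            self.updated[key] = self.known.pop(key) + hitcount
        else:
            self.new[key] = hitcount

    def rollover(self):
        """Call after the database write: everything becomes 'known'."""
        self.known.update(self.new)
        self.known.update(self.updated)
        self.new.clear()
        self.updated.clear()


key = "example.com|A|192.0.2.10"
cache = ToyCache()
cache.add_auto(key)   # first sighting: goes to 'new'
cache.add_auto(key)   # seen again before the write: still 'new'
cache.rollover()      # database written: entry is now merely 'known'
cache.add_auto(key)   # next sighting: goes to 'updated'
```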
-
-
woohoo_pdns.util.
record_data
¶ A named tuple holding the values of a single entry in the pDNS database.
Note
first_seen and last_seen can be either datetimes or timestamps (integers), but the choice must be consistent. The ‘key’ field can be left empty; it will then be auto-populated by the cache in a consistent way. If it is non-empty, the passed-in key is kept and it is the caller’s responsibility to guarantee uniqueness of the key(s).
alias of
woohoo_pdns.util.pdns_entry_tokens
-
woohoo_pdns.util.
record_to_nt
(rec)[source]¶ Takes a dictionary style cache entry and returns a corresponding named tuple.
Note
The “key” is not set because the other functions in this module do not require it (DRY).
-
woohoo_pdns.util.
sanitise_input
(str_in)[source]¶ DNS entries technically end in a dot but for pDNS purposes the dot is mainly cruft, so we remove it.
Note
If the input string to this function is just a dot, it is kept. While a single dot might be ‘surprising’ it is still better than an empty string.
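A minimal sketch of the described behaviour, assuming that exactly one trailing dot is removed; the function name `strip_trailing_dot` is hypothetical and the real function may perform additional clean-up.

```python
def strip_trailing_dot(name):
    """Drop one trailing dot, but keep a lone '.' (the DNS root) as is."""
    if name != "." and name.endswith("."):
        return name[:-1]
    return name
```

For example, `strip_trailing_dot("mx2.mammut.ch.")` yields `"mx2.mammut.ch"`, while `strip_trailing_dot(".")` returns the dot unchanged.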