@tigill Tried to implement basic error handling.
The idea is the following: We use checkers that might throw errors, and might also get errors during notification after successful checking. We want to log these errors also via the notificators, so during error logging there might again occur errors. This is why it got a bit complicated.
First of all the basic error handling for the checkers was implemented, basically exactly as you proposed in the activity diagram. I left out the should_check part, because that is implicitly given by which checkers are in the list. The error_threshold is at 1 currently, so we can catch more errors and earlier during development phase.
Both at checking and notifying, three kinds of exceptions are catched: Either Checking/NotifyingFailedException, the requests ConnectionError, and the base Exception. The latter should in the future be handled separately, because it is supposed to catch all other exceptions, which are supposed to be only development bugs, and thus should rather be sent to the developers than the user.
I also implemented checking the telegram API response for success in dc5fa488.
Lastly, to handle errors during notifying, I implemented a NotificationHandler class, that has an internal message_queue of yet unsent messages, which are retried for delivery every minute (triggered by lea.py in the same main loop). Also, currently any message is seen as successfully delivered when at least one notificator could deliver it. However, for each failed notfication attempt, an error message is sent.
Fixes #2, and IMO #3: The error is catched, and request already retries to connect, so I see no reason to implement this on top of that.
Everything should work already and can therefore be merged, however there are some open TODOs:
For every failed notification attempt, a message is sent to the user. We could wait until a threshold or something to reduce spam
The message_queue is only runtime-alive and a persistance based approach should replace this
Improve exception-to-notifications relations: The (uninterested) user should not receive a traceback of a code bug, and a descriptive message if e.g. the login credentials were wrong (currently: CheckingFailedException) or the config is not properly set up.
What do you think? I'm open for suggestions.