🎮 Exception ergonomics
Exceptions should be application specific, rather than generic and code/library specific. They should be caught at the place where there is enough information to handle them properly and where the handling takes the least amount of effort (code).
Recap: what do we mean by exceptions
We mean exceptions as defined by languages like C++, C#, Java, Python, Ruby and others. To some extent this also pertains to Common Lisp’s condition system and similar constructs in other languages. To a limited extent, we also mean Ada’s exceptions.
We do not mean processor exceptions and how to handle them (you can’t define your own processor exceptions, at least not on mainstream CPUs).
So, we assume that you know the basics of exceptions in these languages - what happens when an exception is thrown or caught, how to catch them, whether there can be more than one “in flight”, renaming, checked exceptions, exception safety and so on.
What we mean by exception ergonomics
It’s “how to use exceptions to make your (coding) life easier and avoid hurting yourself”.
There are many horror stories about how using exceptions can be terrible. We will only incidentally reference them here. It’s enough to just be aware that these exist and that one should be very careful about using exceptions, or face dire consequences.
We will discuss:
- When to throw exceptions
- What exceptions to throw (their names and data)
- When/where to catch them
We will not discuss things such as “exception safety”, even though they somewhat overlap with ergonomics. Their focus is on code safety and, whatever the code ergonomics, safety should always be preserved.
We will also not discuss exception mechanics, which can differ significantly between languages. Things like “should we catch by value, reference, pointer or maybe rvalue reference” in C++ and such, while also somewhat related to ergonomics, have more implications for exception safety and correctness, and should be done in a certain way, ergonomics notwithstanding.
How about not using exceptions at all
That’s certainly an option. But, remember that:
- Not all languages/toolchains allow a “no exceptions” mode
- If they don’t, then some library you are using may throw exceptions, whether you like it or not.
- If they do, then you may not be able to use some libraries, because those are not designed to work in a “no exceptions” mode.
- Some languages, like Ada and Java, will actually throw exceptions “on their own”, not only because some code raises them.
- Horror stories aside, sometimes exceptions are a good fit for the problem at hand. Well written code with exceptions will make the “happy path” easy to read without error handling distractions and significantly reduce the amount of error handling code. It’s not easy to get to this “exceptions paradise” - it might be practically impossible - but when it’s possible, it’s very nice.
So, if it is possible in your code to not use exceptions and your code will be better off not using them, by all means, don’t use them and forget all about their ergonomics.
Illustrating example
To illustrate the ideas we will use a real-world example. It was a moderately complex device connected to a PC, and the PC handled the GUI. The PC and the device communicated through a simple, device-specific Remote Procedure Call protocol.
The problem with RPCs is that communication can break, and thus any RPC can fail for reasons that have nothing to do with the RPC itself (say, the RPC GetTime() obviously has nothing to do with PC <-> device communication). So, you can “enrich” the RPC with a separate out-parameter (communication_success), add a separate result (if your language supports it) or play bit-tricks with the result code, forcing all RPCs to have the same result type (some integer), like the HRESULT of some Windows APIs.
If you do not want to do any of that, but want a “clean” RPC interface, and you do not want something even nastier, like some errno or GetLastError() interface which implies keeping a static (or, at best, thread-local) variable with the result of the last RPC call, your only choice, in many languages, is to use exceptions. That’s what we did in this case and, in general, it worked out well. There were several issues, which we will cover here, including what to do to avoid them.
The PC code was written in C++, so most examples will be in C++, but the ideas are not C++ specific.
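To make the trade-off concrete, here is a minimal sketch of the same RPC in two of the styles mentioned above. Everything in it is an assumption made for illustration: the g_link_up flag stands in for the real communication link, GetTime_with_status is a made-up name for the out-parameter variant, and RPCException is modelled as a std::runtime_error (the original class is not shown in this article).

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for the PC <-> device link; set to false to simulate a broken link.
static bool g_link_up = true;

// Style 1: the transport status travels in an extra out-parameter on every RPC.
std::uint32_t GetTime_with_status(bool& communication_success) {
    communication_success = g_link_up;
    return g_link_up ? 1234u : 0u;  // 1234 is a made-up device time
}

// Style 2: the signature stays clean; a broken link surfaces as an exception.
struct RPCException : std::runtime_error {
    using std::runtime_error::runtime_error;
};

std::uint32_t GetTime() {
    if (!g_link_up) {
        throw RPCException("GetTime: no valid response from device");
    }
    return 1234u;
}
```

With the first style every caller has to check communication_success after every call; with the second, a caller that has nothing useful to do about a broken link can simply not mention it.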
Bad exception ergonomics advice
Interestingly enough, even though exceptions in the form we discuss here have been around for ~30 years, there are not that many guidelines or pieces of advice about their ergonomics. And pretty much all that one can find is not very good.
“Use exceptions for exceptional situations”
The most often heard one is the worst:
Use exceptions for exceptional situations
This is way too generic and ambiguous. What one person considers exceptional, another may not. Even worse, even if we agree on the “exceptionality” of some situations, the same code might encounter different situations in different contexts. Consider “opening of a file”. Does the calling code expect the file to exist or not? If it does, then failing to open should be an exception; otherwise it should not. But, that’s only known in the calling code.
To mitigate this, some “open file” procedures (functions, methods, subroutines…) might offer modes, configurations and such (in Common Lisp, one might use the flexible and powerful condition system). But, those are just crutches for a fundamentally bad interface.
“Throw exceptions if procedure cannot meet post-conditions”
A slightly better one is
Throw exceptions if your procedure cannot achieve its post-conditions
While well-intended, it’s not very applicable in practice.
Most procedure authors don’t spend much time contemplating the post-conditions of the procedure. One might argue that this is bad and they should change their coding habits. Nevertheless, this is the situation in the real world.
Also, since such advice usually goes hand-in-hand with the advice that “preconditions should not be exceptions, but asserts”, it’s hard to figure out why two closely related concepts should be treated so differently. There is some validity in doing that, but it’s a cognitive burden.
Even disregarding the “lazy programmer”, this is tricky. Post-conditions can change during the evolution of the code and maintenance of a procedure. Having to update our “throw if postconditions fail” code can be a maintenance burden. The thing is, most of the time, to achieve this, we would need to add some “postcondition checking” code towards the end of our procedure. This can also be a performance issue, if the procedure is on a “hot path”. Then, one might be tempted to remove it in some “optimized build”, which is a bigger problem, as now these different builds behave differently, exception-wise (which was one reason to avoid “preconditions as exceptions”).
Finally, what counts as a postcondition is often rather arbitrary. Remembering our “open file”, is “file successfully opened” a post-condition? If not, what is? One might come up with “the file object (structure, record, pointer…) returned is valid”, but that is ambiguous (is nullptr a valid pointer, or nil or None a valid object?) and, besides, any procedure worth a damn should never return invalid objects. By the way, this is one of the reasons why postconditions often change during evolution and maintenance.
“Do as the standard library of your language does”
Well, it’s certainly possible that some language has good exception ergonomics in its standard library. But, most mainstream languages do not.
Most languages insist on throwing library-specific exceptions like “Null pointer”, “Index out of range”, “File does not exist”, etc. As we described elsewhere, this is a bad idea.
This does not extend to the language (runtime) itself. Managed languages like Java or C# may throw from their managed environment (virtual machine), rather than the standard library. For example, they may throw “Null pointer dereference”. The Ada language was designed with the idea that exception handling should be used for processor exceptions (traps), such as divide by zero (or “access non-existent memory address” - AKA page fault on processors with virtual memory support). Your code cannot raise such exceptions, even if it wanted to. Whatever exceptions are thus thrown “by the language itself” are as they are; you need to live with them. But they should not, in any way, influence when to throw exceptions in your code or what exceptions to throw.
When to throw
Obviously, you throw when there is some error. But, when is an error “exception-worthy” versus being somehow indicated by some “special value”?
Essentially, you should throw when the code has no valid way of going forward. But, the problem is, that depends on the context. As discussed before, in one context, failure to open a file can be a show-stopper (if the file has essential configuration data that has no defaults), but in another context, it might be harmlessly ignored (if a file contains some cache).
So:
throw when you are certain that the code, under any possible context, cannot move forward
Obviously, opening a file should never throw, unless you make some “intolerable file open” procedure and use it, instead of the regular “file open” procedure, in your application.
As we can see, this means that when you throw is essentially application-specific.
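As a rough sketch of that split, the regular “file open” below reports failure as an ordinary result, while a separate “intolerable file open” throws. The names open_file, open_essential_config and ConfigUnavailable are hypothetical and only serve the illustration; the application decides which of the two procedures it calls in which context.

```cpp
#include <cassert>
#include <cstdio>
#include <stdexcept>
#include <string>

// Regular "file open": failure is an ordinary result (nullptr), never an exception.
std::FILE* open_file(const std::string& path) {
    return std::fopen(path.c_str(), "rb");
}

// Hypothetical application-specific exception: the application cannot run
// without its essential configuration.
struct ConfigUnavailable : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// "Intolerable file open": used only where no context could tolerate a failure.
std::FILE* open_essential_config(const std::string& path) {
    std::FILE* f = open_file(path);
    if (f == nullptr) {
        throw ConfigUnavailable("essential configuration missing: " + path);
    }
    return f;
}
```

Code that reads an optional cache would call open_file and carry on if it gets nullptr; code that needs the configuration calls open_essential_config and never has to check the result.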
What about out-of-memory
As many have observed, out-of-memory is a rather special kind of exception. But, in reality, most code will never be the code that allocates memory; it will call some malloc()-like procedure. Said procedure may throw or not, but that is not our code.
In our code, when we do write some “memory manager”, we should follow the same advice, even if we do know that this is a kind-of special situation. So, in most situations, don’t throw, but if you know that a particular memory manager really can’t continue in any context (essentially, some application-specific memory manager), then you might throw.
Remember the third option - simply terminate
In some situations, simply terminating is also an option. For some helper, script (like) code, or some batch-processing, it might be perfectly fine to simply stop (abort) the program and display some error message. This can be done without involving exceptions and it avoids any “error indicating values”.
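A minimal sketch of this third option, assuming a hypothetical batch-processing helper: die() and records_per_batch() are made-up names, and the checks are only there to show the shape of the code.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Print a message and stop the program; no exception and no error value involved.
[[noreturn]] void die(const char* what) {
    std::fprintf(stderr, "fatal: %s\n", what);
    std::exit(EXIT_FAILURE);
}

// Hypothetical batch-processing step: nonsensical input makes the tool
// terminate instead of reporting the problem to its caller.
int records_per_batch(int total_records, int batches) {
    if (batches <= 0) die("number of batches must be positive");
    if (total_records < 0) die("record count cannot be negative");
    return (total_records + batches - 1) / batches;  // round up
}
```

The caller of records_per_batch() has nothing to check and nothing to catch, which is the whole appeal in short-lived tools.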
What to throw
Throw exceptions (codes, classes/objects) that are application specific. Do not throw generic or code/library specific exceptions.
This facilitates good exception handling.
What not to throw - Python’s “KeyError” and similar
One of the best examples of “what not to throw” exists in most languages and their standard libraries. Let’s take Python for example. The standard dictionary type will throw an exception if you try to index an element that is not in the dictionary. The idea was that you would get a nice error report if you don’t catch it, but you can also catch it and handle it if you wish.
But, really, how helpful is this report:
Traceback (most recent call last):
  File "main.py", line 10, in <module>
    print(f(t))
  File "main.py", line 7, in f
    return a[y]
KeyError: 122
Sure, y is 122, but how did it get said value? What is a, and how come it doesn’t have 122? This is a simple example; things can be much worse. Instead of a and y, you might get some complex expression. Not to mention that the call stack can be much, much deeper, obscuring the picture. You need to look at the code and probably debug it (run it again). A simple “main.py: line 7: Assert KeyError”, reported before simply terminating the application, would suffice to give you the place to look for trouble and debug (if you wish, you can still get the call stack at that point; you don’t need exception handling mechanics for that).
Even worse, how should you handle such an error? The dictionary a is not known to your code and you don’t know how it’s created. Sure, you can inspect the code, but it might change and nobody will inform you to update your code. Essentially, such “code specific exceptions” can only be handled “at the place of the call”, like:
try:
    return a[y]
except KeyError:
    return None
If the exception propagates (“escapes the calling function”) it becomes, as illustrated above, essentially un-handleable.
Now, if you know Python, you’re probably aware that dictionaries have a method get which does not throw, and the code can be rewritten as:
return a.get(y)
Which is kind-of the point. The get() method is the one to use, unless you’re writing some short or ad-hoc scripts. If you wish to handle the missing key somehow, you still can:
x = a.get(y)
if x is None:
    pass  # handle it
else:
    pass  # use it
This is rather similar to a try/catch block; the differences are superficial.
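The same choice exists in C++, the language of the rest of our examples: std::map::at() throws std::out_of_range for a missing key, while find() reports it as an ordinary result. The sketch below wraps find() in a hypothetical helper named lookup(), in the spirit of Python’s get(); describe() and its "<unknown>" fallback are likewise made up for illustration. It assumes C++17 for std::optional.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical helper: non-throwing lookup, similar in spirit to Python's dict.get().
std::optional<std::string> lookup(const std::map<int, std::string>& a, int y) {
    const auto it = a.find(y);
    if (it == a.end()) {
        return std::nullopt;  // missing key is an ordinary result, not an exception
    }
    return it->second;
}

// The caller decides what a missing key means in its own context.
std::string describe(const std::map<int, std::string>& a, int y) {
    if (const auto x = lookup(a, y)) {
        return *x;        // use it
    }
    return "<unknown>";   // handle it
}
```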
Throw application specific exceptions
The idea is to throw an exception that the application can actually handle. Let’s use our PC <-> Device specific RPC to illustrate.
Library/code specific exceptions would be something like “failed to send data” or “response timeout”. These make as much sense as Python’s KeyError described above. No code other than the code directly calling an RPC would know what to do with such exceptions.
But, an exception like “RPC failed” is application specific - at least as much as can be specific in the context of executing an RPC.
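One way to get there, sketched under assumptions: the RPC layer catches its own low-level errors and rethrows them as “RPC failed”. SendFailed, ResponseTimeout, transport_exchange() and rpc_call() are hypothetical names, the transport is a fake that only simulates a timeout, and RPCException is again modelled as a std::runtime_error.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Library-internal errors (hypothetical names): only the RPC layer can act on them.
struct SendFailed {};
struct ResponseTimeout {};

// Application-facing exception: the only one that leaves the RPC layer.
struct RPCException : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical transport stub: "times out" when simulate_timeout is set.
int transport_exchange(int request, bool simulate_timeout) {
    if (simulate_timeout) throw ResponseTimeout{};
    return request + 1;  // made-up response
}

// Translate transport details into the application-specific exception.
int rpc_call(const std::string& name, int request, bool simulate_timeout = false) {
    try {
        return transport_exchange(request, simulate_timeout);
    } catch (const SendFailed&) {
        throw RPCException(name + " failed: could not send request");
    } catch (const ResponseTimeout&) {
        throw RPCException(name + " failed: no response from device");
    }
}
```

The transport detail survives in the message for the log or the user, but no code outside the RPC layer has to know that timeouts exist.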
Grey area - what is “application specific” in a library
Yes, there is a grey area with regard to what is “application specific” in a library. In our example, the RPC will probably be implemented in a library and the “failed RPC” can thus be thought of as a library exception. But, the thing is, it is geared towards the application and the way it uses the library. It is not geared towards how the library works, which would be exemplified by exceptions such as “response timeout”.
Arguably, the best way would be to throw “RPC Such-and-such failed” and have that be derived from “RPC failed” and some “Such-and-such failed”, but that could be a kind of over-engineering. Still, if done well in a large-enough code base, it could be very useful.
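In C++ such a double derivation can be sketched with multiple inheritance; a handler for either base class then catches the exception. The class names GetTimeFailed and RPCGetTimeFailed and the always-failing stub are assumptions made for this illustration, not the classes from the original project.

```cpp
#include <cassert>
#include <stdexcept>

// "RPC failed": what a central handler in the GUI would care about.
struct RPCException : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// "Such-and-such failed": hypothetical operation-level category, independent of RPC.
struct GetTimeFailed {
    virtual ~GetTimeFailed() = default;
};

// "RPC Such-and-such failed": can be caught as either of its bases.
struct RPCGetTimeFailed : RPCException, GetTimeFailed {
    RPCGetTimeFailed() : RPCException("RPC GetTime failed") {}
};

// Hypothetical RPC stub that always fails, for demonstration only.
void GetTime_always_failing() {
    throw RPCGetTimeFailed();
}
```

A generic handler writes catch (const RPCException&), while code that only cares about the time, however it was obtained, can write catch (const GetTimeFailed&).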
Special consideration - checked exceptions
Checked exceptions are mostly known from Java, where some exceptions are “checked”, which means that the code has to handle them; they can’t be left “unhandled”. There are other languages that support this, and some actually make all exceptions checked.
But, the question here is: if your language supports checked exceptions, when should you actually use them?
Of course, we don’t want to go down the road of bad guidelines like “use them when some exception really needs to be handled”.
An important thing to consider is that checked exceptions have the nasty effect of incurring a lot of “empty catch” blocks, just to “silence the compiler”. So, it is rather obvious that they should be used for a significant minority of exceptions. But, that’s not saying much, as we might come up with some “special” application which actually mostly encounters these “minority” exceptions, and from its point of view, they are actually a majority.
So, given what we discussed above, it should be easy to conclude what to do here: raise a checked exception when your application cannot, under any circumstances, allow not handling said exception. For example, real time applications cannot allow unhandled exceptions.
Now, if you have a library that is meant to be used in different applications, it obviously cannot throw checked exceptions and the application itself needs to somehow make sure exceptions from that library are handled.
When/where to catch
Catch when you have enough information to handle the error and where it takes the least amount of effort to do so.
The thing is, there might be many places where you could catch an exception, but only one (or a few) requires the least amount of effort.
Keep in mind that we do mean actually handle the error. We don’t mean “just write empty catch blocks”, as is often done, especially in Java, where one is forced to catch some “checked” exceptions. A lot of the time you’ll see terrible code like this:
try {
    FileReader file = new FileReader("C:\\test\\a.txt");
}
catch (IOException ex) {
}
Sometimes the author may try to hide this with some logging, but that’s just for show. This is bad: the whole point of exceptions is to not ignore them, yet that is precisely what the code is doing. Yes, the Java standard library is terrible here. Not only is opening a file, which can be permitted to fail in many situations, always treated as an exception, the code is also forced to handle it (rather than just letting the application crash).
Now, it’s hard to say any more on this, as this is very application specific. But, since a lot of exception ergonomics is application-specific, we’ll tackle this together with all other aspects in a case study of sorts, which follows.
Case study
We’ll go back to our illustrating example and now treat it as a case study. What was actually done will be presented first, and then we’ll show how it fits our guidelines and, where it doesn’t fit, what should have been done differently and how.
What was done
The RPC library (it was a rather simple module, but let’s call it a library to highlight its usage) would raise an “RPC failed” exception when any kind of communication error happened and the PC did not end up getting a valid response.
This exception was then caught in the GUI event loop, in a helper procedure that handled all GUI “commands” (started via menu, keyboard shortcut, icon…). The idea was that most commands would do some RPC (most of them only one or two) and that the procedure for each command would not handle (or even care about) exceptions, which would be caught in a central place. Something like:
try {
    switch (command) {
        case cmdGetTime: cmdGetTime_exec(); break;
        case cmdTimeSet: cmdTimeSet_exec(); break;
        ...
    }
}
catch (RPCException &ex) {
    PostMsg("Communication with device failed: %s", ex.what());
}
The problem arose with some commands that actually executed a lot of RPCs. The device had a rather complex configuration and the requirement was that one can set the whole configuration with a single command (reading it from a file on the PC). Of course, this went hand-in-hand with another command that would read the current configuration from the device (and save it in a file on the PC).
This meant potentially hundreds of RPCs executed in one command. Reading is not so bad - if any RPC fails, one could say that reading of the whole configuration failed and report that to the user. Of course, that was deemed too bad a UX, as this could take a while (minutes), since the communication link was slow. So, some auto-retry logic was added, which meant that the “read config” function would need to handle the exceptions itself. Still, not too bad:
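// Retry a failed element instead of skipping it; give up after 3 failures in total.
retries = 0;
e = cfg_element.begin();
while (e != cfg_element.end()) {
    try {
        read_config_element(*e);
        ++e;  // advance only after a successful read
    }
    catch (RPCException& ex) {
        if (++retries == 3) {
            break;  // too many failures: abandon the read
        }
    }
}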
Of course, the try/catch we showed above (the one that “surrounded” the “switch by command ID”) now doesn’t make sense for this particular command (this catch block made the one “above it” useless), but there were some commands that did not do any RPCs, so that was OK - it was already useless for some commands.
The big problem was setting the configuration. While adding a retry was the first step, the problem was that even after a few retries, it may fail, prompting an attempt to return to the previous configuration. You guessed it, this rollback of sorts can also fail. This code was terrible, riddled with try/catch blocks until it was deemed “good enough”, but it was never really good enough. It was so terrible that we won’t show it here.
How it fits to our guidelines
The “when” to throw fits well. If an RPC cannot be completed, it makes no sense to move forward in any context.
The “what” to throw fits “well enough”. As discussed before, “RPC failed” is application-specific enough. It would have been nicer to have derived classes for “reading time failed”, “setting time failed”, etc., but it is not essential.
For all the simple commands, which executed one or two RPCs, the “when” to catch was good. It was when we had enough info (some GUI context) to display an appropriate message to the user, informing her that her command failed so she can decide what to do next. It was also the least amount of effort, as it was done in a few lines of code, rather than a few in each of hundreds of commands, amounting to more than a thousand lines of code.
But, for the whole-configuration commands, this was obviously not good.
What should have been done differently
Let’s set aside reading for now. One could argue that it’s good enough, so let it be.
But, setting the configuration warrants some analysis. When thinking about such things, it usually helps to disregard the “exception mechanics” and see what we want to do here:
- Apply a configuration
- If that fails, retry a few times
- If it fails still, rollback
- If rollback fails, retry a few times
- If rollback fails still, give up and report an error.
Now, if we see that “rollback” is merely applying the previous configuration, there is also a step 0:
- read current configuration.
Thinking further, there could be other reasons why applying a configuration can fail. For example, there might be something wrong with our configuration data (the file got corrupted). So, this is not only about exceptions. Such a problem will likely be reported as an error in, say, the result of an RPC (our SetTime() might return -1 if given an invalid time).
Thus, it doesn’t make sense to tie the failure of applying a configuration to exceptions. Our command procedure should not handle exceptions at all and should merely call a few helper functions, essentially coding the steps above:
current_config = read_device_config();
if (apply_config(user_config) != 0) {
    PostMsg("Failed to set the user configuration");
    if (apply_config(current_config) != 0) {
        PostMsg("Failed to roll back");
    }
}
Obviously, read_device_config() and apply_config() would handle the retries. They can be trivial:
for (retries = 0; retries < 5; ++retries) {
    try {
        read_device_config_raw(config);
        break;  // success: stop retrying
    }
    catch (RPCException &ex) {
        continue; // looks like ignoring exceptions!
    }
}
if (5 == retries) {
    return -1;  // all attempts failed
}
Where read_device_config_raw() can just go its merry way and blissfully ignore exceptions.
The apply_config() function can be structurally the same, just calling a different _raw() function and maybe having a different number of retries. Of course, for optimization, one could retry only the last piece of information, not restarting at the beginning. That would change the structure of apply_config(), but not by much.
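A minimal sketch of that optimized variant, under assumptions: ConfigElement, apply_config_element_raw() and the fake device counters are hypothetical, and RPCException is modelled as a std::runtime_error. Each element gets its own retry budget, so a failure near the end does not resend everything before it.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

struct RPCException : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical configuration element.
struct ConfigElement { int id; int value; };

// Hypothetical fake device: the next g_failures_left calls fail.
static int g_failures_left = 0;
static int g_applied = 0;

void apply_config_element_raw(const ConfigElement&) {
    if (g_failures_left > 0) {
        --g_failures_left;
        throw RPCException("no response from device");
    }
    ++g_applied;
}

// Returns 0 on success, -1 if some element still fails after max_attempts tries.
int apply_config(const std::vector<ConfigElement>& config, int max_attempts = 5) {
    for (const ConfigElement& e : config) {
        int attempt = 0;
        for (; attempt < max_attempts; ++attempt) {
            try {
                apply_config_element_raw(e);
                break;  // this element is done, move on to the next one
            } catch (const RPCException&) {
                // handled by retrying the same element
            }
        }
        if (attempt == max_attempts) {
            return -1;
        }
    }
    return 0;
}
```

The command procedure stays exactly as before: it still only looks at the 0 / -1 result.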
We can see here that handling exceptions “up the call stack” is not always less effort; sometimes it’s actually quite the opposite.
It’s also interesting that an (essentially) empty catch block actually means handling the exception in this code structure. We could write this differently if that bothers us, but this idiom is fairly well known in C/C++.
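If the empty-looking catch does bother us, one possible rewrite (a sketch, not what the original project did) moves the catch into a tiny helper that turns “RPC failed” into a boolean, so the retry loop contains no catch at all. The names succeeds() and with_retries() are made up, and RPCException is again assumed to be a std::runtime_error.

```cpp
#include <cassert>
#include <stdexcept>

struct RPCException : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Runs one attempt; reports "RPC failed" as false instead of letting it propagate.
template <typename F>
bool succeeds(F&& rpc) {
    try {
        rpc();
        return true;
    } catch (const RPCException&) {
        return false;
    }
}

// Returns 0 on success, -1 if all attempts failed (same convention as above).
template <typename F>
int with_retries(int max_attempts, F&& rpc) {
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        if (succeeds(rpc)) {
            return 0;
        }
    }
    return -1;
}
```

A call could then look like with_retries(5, [&] { read_device_config_raw(config); }), with the retry policy written once instead of in every helper.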
Moral of the story
Exception handling can be terrible but can also be quite effective. If you can avoid it completely, consider doing so. If you cannot, apply the guidelines presented above to increase your chances of making it effective. Also remember that exception handling is weird: most advice you have heard about exception ergonomics is not good, and exceptions kind of have rules of their own. For example, an empty catch block can actually represent handling rather than ignoring, and it’s not always better to handle exceptions up the stack.