Too long to handle
The conventional wisdom that longer symbols (variables, functions, classes…) aid comprehension is quite often wrong. It can be a “can’t see forest for the trees” kind of a thing. It’s a trade-off like most things in engineering. We’ll show a real-world example.
Can’t see expression for the text
The following is taken from very serious production code, obfuscated, of course. Don’t worry, it’s not a medical or life support system, nor is it controlling any hazardous equipment.
Target TargetByServiceType(std::string serviceTypeName) {
    if (advanced_management) {
        if (serviceTypeName == EXTERNAL_MANAGER_TYPE) {
            return Target::InternalOnly;
        }
    }
    bool isSystemServiceType = std::find_if(
        std::begin(systemServices),
        std::end(systemServices),
        [&](auto& typeId) {
            return typeId.serviceTypeName == serviceTypeName ||
                serviceTypeName == typeId.qualifiedServiceTypeName;}
        ) != std::end(systemServices);
    return (isSystemServiceType ||
            serviceTypeName == EXTERNAL_MANAGER_TYPE) ?
        Target::Internal :
        Target::External;
}
So, this system has an internal service manager, which can be set alongside an external manager in an “advanced management” mode. In a “who watches the Watchmen” moment, you realize that the internal manager has to manage the external one. This particular system chooses to manage all the system services with the internal manager, to prevent the system from becoming unusable if the external manager fails.
While this is C++, it should be rather easy to read even if you’re not
fluent in it. The systemServices is an array which has a list
of, well, types of system services. There are (at least) two issues
with this code. Look at the code for a while and try to find them.
OK, let’s take care of the easy ones:
- What’s the difference between Internal and InternalOnly? One really can’t figure it out from this code or some general knowledge. In this particular system, Internal is just a bad name. It should have been Both. But, it goes to show that longer names do not really mean better names.
- The code (C++ lambda) that checks for a particular service has two checks, but in one it uses the serviceTypeName on the left of the comparison and in the other on the right, confusing the reader that maybe the order matters (in C++ it might). This is rather hard to spot in such “long code”, which makes the whiplash that much harder when you do, but how quickly can you spot it in name == id.name || id.qualifiedName == name?
- There’s no reason to introduce the helper Boolean; one can save the iterator std::find_if returns and do the check later. It’s pretty clear what we’re searching for here, thus the long name of said Boolean doesn’t help, and the long line with the lambda and the check for success makes it harder to read.
- From the name of the function (OK, so it’s not always bad to have long names) we see this doing something by service type. So, why do we “carry that on” in the parameter name? There’s no benefit, unless the code would also deal with some other names.
- Similar, but not that bad, the typeId again adds no value, because we’re obviously going through (system) service types. id is quite enough.
- EXTERNAL_MANAGER_TYPE is not actually a type, it’s a name. The longer the symbol name, the easier it is to be imprecise like this, yet the whole point of having a long symbol is to be precise.
Well, that was easy. Let’s apply what we learnt (supposing we can also apply it to the rest of the code):
Target TargetByServiceType(std::string name) {
    if (advanced_management) {
        if (name == EXTERNAL_MANAGER_TYPE_NAME) {
            return Target::Internal;
        }
    }
    auto it = std::find_if(std::begin(systemServices),
                           std::end(systemServices),
                           [&](auto const& id) {
        return id.name == name || id.qualified == name;});
    return (it != std::end(systemServices) ||
            name == EXTERNAL_MANAGER_TYPE_NAME) ?
        Target::Both :
        Target::External;
}
OK, now I hope you can spot at least one problem. Look at it carefully. We did make some formatting improvements, but most of those are feasible because of shorter names.
Why do we check for the EXTERNAL_MANAGER_TYPE_NAME again at the
end? Obviously, the external manager is only relevant if
advanced management is enabled! That was not so easy
to spot with all that code before.
In fact, we can take it a little further. If advanced management is not enabled, then there’s no reason to even search! All your service management needs are fulfilled by the internal manager simply because there is no other service manager in town!
Now, in this particular system, it’s more fine-grained than that and there’s an additional flag, not looked at here, which actually makes the search when advanced management is not enabled somewhat useful (it can be improved, but that is way too system-specific for what we’re talking about here). But still, we spotted it because we made the code shorter.
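Setting that extra flag aside, the simplification might look like the sketch below. Everything the original does not show is an assumption made for illustration: the Target enum layout, the ServiceId struct, the value of EXTERNAL_MANAGER_TYPE_NAME, and passing the flag and the service list as parameters instead of reading globals.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-ins for the real system's types and globals.
enum class Target { Internal, Both, External };

struct ServiceId {
    std::string name;
    std::string qualified;
};

std::string const EXTERNAL_MANAGER_TYPE_NAME = "external-manager";  // assumed value

Target TargetByServiceType(bool advanced_management,
                           std::vector<ServiceId> const& systemServices,
                           std::string const& name) {
    // No external manager in town: the internal one handles everything,
    // so there is nothing to search for.
    if (!advanced_management) {
        return Target::Internal;
    }
    // The internal manager watches the external one.
    if (name == EXTERNAL_MANAGER_TYPE_NAME) {
        return Target::Internal;
    }
    auto it = std::find_if(systemServices.begin(), systemServices.end(),
                           [&](ServiceId const& id) {
                               return id.name == name || id.qualified == name;
                           });
    return it != systemServices.end() ? Target::Both : Target::External;
}
```

Note that the external manager name is now checked in exactly one place, and only when it can matter.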
One more thing on the “brevity” front is C++ specific, so we didn’t show it before. But in a C++ context, it is relevant:
- No need for systemServices to be an array, it can be a vector (at least since some C++ standard). It avoids the std:: prefix, as in systemServices.end().
- Either with the std::ranges, or with a helper, no need to pass two iterators to find_if.
- If Target is a namespace or a “strong enum”, pulling it in with using is fine.
So, let’s apply these:
Target TargetByServiceType(std::string name) {
    using enum Target;
    if (advanced_management) {
        if (name == EXTERNAL_MANAGER_TYPE_NAME) {
            return Internal;
        }
    }
    auto it = myfind(systemServices, [&](auto const& id) {
        return id.name == name || id.qualified == name;});
    return it != systemServices.end() ? Both : External;
}
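The myfind helper used above is not shown in the original. Here is a minimal sketch of what such a helper might look like, assuming systemServices is a std::vector of a hypothetical ServiceId struct with name and qualified fields. The isSystemService wrapper is also made up, just to exercise the lookup.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical element type; the field names follow the shortened example.
struct ServiceId {
    std::string name;
    std::string qualified;
};

// One possible myfind: find_if over a whole container instead of an
// iterator pair. Returns an iterator, just like std::find_if does.
template <typename Container, typename Pred>
auto myfind(Container& c, Pred pred) {
    return std::find_if(std::begin(c), std::end(c), pred);
}

// Hypothetical wrapper so the lookup can be tried in isolation.
bool isSystemService(std::vector<ServiceId> const& systemServices,
                     std::string const& name) {
    // [&] captures name by reference; a capture-less [] lambda cannot see it.
    auto it = myfind(systemServices, [&](auto const& id) {
        return id.name == name || id.qualified == name;
    });
    return it != systemServices.end();
}
```

With C++20, std::ranges::find_if(systemServices, pred) does the same job without a hand-written helper.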
OK, there’s one last, but the most important problem, a true bug.
So, when searching, we check for both qualified and “non-qualified” name. Sure, it’s not great that this code can receive both, but let’s put that aside for now.
Why are we not checking for both in the if (advanced_management) body?
In this system, like in most, one cannot figure out the qualified name
from the unqualified. Obviously EXTERNAL_MANAGER_TYPE_NAME is
unqualified. That doesn’t mean we can be left satisfied with that.
There is, of course, a helper that can get you the unqualified part
from the qualified. Since the point of this is not to fix said bug,
we won’t show the fix.
In any case, this was rather easy to spot now, wasn’t it?
The case for Math notation
There’s a reason why math is written with x instead of
theUnknownVariable or such. It’s easier to reason about what
that math is doing.
Case in point:
Target TargetByServiceType(std::string n) {
    bool const e = n == EXTERNAL_MANAGER_TYPE;
    if (advanced_management) {
        if (e) {
            return Target::InternalOnly;
        }
    }
    bool b = std::find_if(
        std::begin(systemServices),
        std::end(systemServices),
        [&](auto& id) { return id.serviceTypeName == n ||
                               n == id.qualifiedServiceTypeName;
        }) != std::end(systemServices);
    return (b || e) ? Target::Internal : Target::External;
}
Even without the better global long names and C++ specifics,
this is way easier to reason about and spot the issues. I mean,
surely you saw that we use e twice, once inside the if and
once outside?
To Math or not to Math, that is the question
If your function (procedure, method…) is very long, math notation doesn’t apply. You did hear that functions et al should not be long, though, right?
But if it is short, math notation can work even if the code is not doing “math stuff”:
function check(x, y) {
    return x == y;
}
A very good software engineer and friend once bugged me to change the code above to:
function check(firstArgument, secondArgument) {
    return firstArgument == secondArgument;
}
Now, this is so short that it doesn’t matter much, but still, it
really is needlessly long. One can hardly mix x and y in a few
characters like this, so the long names just mean you will spend more
time reading.
But suppose you want to make the condition more complex during maintenance:
function check(firstArgument, secondArgument) {
    return (firstArgument == secondArgument) || (firstArgument < SOME_THRESHOLD);
}
Compare to:
function check(x, y) {
    return (x == y) || (x < SOME_THRESHOLD);
}
Now, tell me that the long-named one is easier to read and reason about with a straight face. Like, really, really, try to. It’s a good exercise, even if you will fail, unless you’re a world-class actor.
Also, if code is not really doing much, if it’s some glue code
transforming some values or interfaces into others, math notation
doesn’t help. But naming strings, say, qtString vs winrtHString
when interfacing some Qt code with Windows Runtime, that actually
helps.
Still, you don’t have to go “all the way down” to Math. Just the shortening as described above is good enough in most cases.
‘When analyzing code, longer names help me identify stuff’ fallacy
I’ve heard this argument, mostly illustrated by complex code which has many symbols (variables, functions), but it goes against what we know about how the human brain works.
If you’re analyzing some code, you need to “keep it all in”, otherwise you’re using “crutches” and that’s just no way to go.
The human brain has a short-term memory capacity of 7 +/- 2 items. So, if you have more than nine symbols to analyze, you’re done for; you need crutches, and I don’t care how long the names of said symbols are. That’s why in math we use substitutions, analyze the simplified problem, then apply that back. In software engineering, you use helper functions/procedures/methods/whatnot.
An interesting bit of trivia: a chimpanzee’s brain has a significantly “wider” short-term memory capacity. Yup, a chimp would be a better software engineer than us.
As for pattern recognition, the brain recognizes long patterns by checking the beginning and the end and “fuzzily” checking the in-between. So, two long symbols that start and end the same and are similar inside will be recognized as the same. If the symbols are short, there’s less chance of them being different yet similar. So, it’s actually easier for the brain to mix up two different long symbols than short ones.
At long last, it takes the brain longer to recognize longer patterns,
though it’s not linear. I’m not aware of any widely accepted theory
about how that works, but a lot of stuff in mammals is log/exponential
based, so let’s guess that the effort to recognize a pattern (symbol
name length) is roughly log(length). If that’s correct, it matters
not if a symbol is 1 or 4 characters long, or if it’s 16 or 30. But, it
does matter if it’s 1 or 15.
‘Math is outdated’ fallacy
When discussing this with people, I heard the argument “Math is short because people didn’t have the resources in that age”. Oh, but how wrong that is!
When math notation was invented, people doing math were either:
- extremely rich, or
- working for extremely rich masters
In either case, they had an abundance of resources, way more than any software engineer has these days. They had servants, and enough paper (or papyrus) and ink to write a thousand books each year.
So, there was no incentive to “cut it short”. And if you think about
it, there were quite a few incentives the other way around, say,
procuring young attractive servants from your master “to help you do
all the work because writing down the proofs in theUnknownVariable
notation takes so long”.
Remember COBOL
That’s like, the ultimate “no, long names are not always good” argument.
If long names are always good, how come we all think COBOL is (mostly) bad?
The opposing symbol length laws
- The longer the lifetime of a symbol is, the longer its name should be, as it more precisely identifies it in a more crowded namespace, making it easier to use the correct symbol
- The longer the name of a symbol is, the easier it is to give it a less precise name than it should have, making it easier to use the wrong symbol
Nobody said it was going to be easy.
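A small sketch of the first law in action, with made-up names (kMaxPendingServiceRequests and countOverLimit are not from any real system): the constant lives for the whole program and gets a long name, while the locals live for a few lines and get by with one letter.

```cpp
#include <cstddef>
#include <vector>

// Long-lived and widely visible: a longer, precise name pays off here.
constexpr std::size_t kMaxPendingServiceRequests = 64;  // hypothetical limit

// Short-lived locals: n and i exist for a handful of lines, so one letter
// is enough to keep track of them.
std::size_t countOverLimit(std::vector<std::size_t> const& queueSizes) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < queueSizes.size(); ++i) {
        if (queueSizes[i] > kMaxPendingServiceRequests) {
            ++n;
        }
    }
    return n;
}
```

The second law still lurks: if the constant actually limits something other than pending requests, the long name is precisely wrong.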
Moral of the story
There really isn’t such a thing as something being always better.
Long names are not always better. Short names are not always better. Use short(er) names for short(er) lifetimes. But not always. Use your engineering judgement, always. It’s kind of the point of being an engineer.
