Sunday, February 21, 2016

Don't use the greater than sign in programming


One simple thing that comes up time and time again is the use of the greater than sign as part of a conditional while programming. Removing it cleans up code. Here's why:

Conditionals can be confusing

Let's say that I want to check that something is between 5 and 10. 
There are many ways I can do this

All of these mean the same thing... Wait, did I actually do all that right? Sorry, one of those is incorrect. Go ahead and find out which one, I'll wait...

If you remove the use of the greater than sign then only 2 options remain:
(x < 10 && 5 < x)
which is a stupid option because it implies 10 < 5 and
(5 < x && x < 10)

This is a nice way of expressing "x is between 5 and 10" because it is literally between 5 and 10.

It's also a nice way of expressing that "x is outside the limits of 5 and 10"
(x < 5 || 10 < x)

Again, this expresses it nicely because x is literally outside of 5 to 10.

Simple. Clear. Consistent.
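
As a rough sketch, here are the two checks above wrapped in small JavaScript functions. The names isBetween and isOutside are made up for illustration and do not come from any library; the bounds are strict, matching the expressions above.

```javascript
// Hypothetical helpers (names are assumptions, not a library API).
// Arguments are listed in number-line order: low, x, high.

// True when x lies strictly between low and high.
function isBetween(low, x, high) {
  return low < x && x < high;
}

// True when x lies strictly below low or strictly above high.
function isOutside(low, x, high) {
  return x < low || high < x;
}

console.log(isBetween(5, 7, 10)); // true
console.log(isOutside(5, 12, 10)); // true
console.log(isBetween(5, 5, 10)); // false
console.log(isOutside(5, 5, 10)); // false
```

With strict less than signs, 5 and 10 themselves pass neither check; use <= in whichever check should own the boundary values.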

This is such a nice way to express numbers that I wonder why programming languages allow the greater than sign ( > ) at all.

But why is this so expressive?

The Number line

Here's how you represent between 5 and 10 on a number line vs code:
number line of between 5 to 10
(5 < x && x < 10)


Here's how you represent outside of 5 and 10 on a number line vs code:
number line of outside 5 to 10
(x < 5 || 10 < x)


On a number line, everything to the left is less than the numbers to the right, so these two ways of representing the relationship between things match up.

Combinatorics

This problem gets much worse as the conditional grows. For example,
number line of between -5 and -1 or between 2 and 4
((-5 < x && x < -1) || (2 < x && x < 4))

has 15 other possible ways to be expressed if you include the greater than sign and don't make your expressions conform to the number line.
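
One way to arrive at that count, assuming the only freedom is flipping each of the four comparisons between "a < b" and "b > a" (and not reordering the operands of && or ||): 2 x 2 x 2 x 2 = 16 spellings, one of which is the number-line form. The sketch below generates them; the helper names are hypothetical.

```javascript
// The four comparisons, each stored in number-line order (left < right).
const comparisons = [
  { left: "-5", right: "x" },
  { left: "x", right: "-1" },
  { left: "2", right: "x" },
  { left: "x", right: "4" },
];

// Write one comparison either as "left < right" or as the equivalent "right > left".
function spell(comparison, useGreaterThan) {
  return useGreaterThan
    ? `${comparison.right} > ${comparison.left}`
    : `${comparison.left} < ${comparison.right}`;
}

// Build every combination: bit i of mask decides whether comparison i uses ">".
function allSpellings() {
  const results = [];
  for (let mask = 0; mask < 2 ** comparisons.length; mask++) {
    const [a, b, c, d] = comparisons.map((comparison, i) =>
      spell(comparison, (mask & (1 << i)) !== 0)
    );
    results.push(`((${a} && ${b}) || (${c} && ${d}))`);
  }
  return results;
}

const spellings = allSpellings();
console.log(spellings.length); // 16
console.log(spellings[0]); // ((-5 < x && x < -1) || (2 < x && x < 4))
```

All 16 strings mean the same thing, but only the first one reads left to right the way the number line does.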





27 comments:

  1. That's a smart trick. I know from my personal coding that any time I start writing the bracket tokens, I'd better look extra closely and probably test what I'm doing because I tend to write a lot of bugs in that kind of code.

  2. Agreed all over.

    FWIW, in Python, you can condense a little bit more by chaining inequalities: "5 < x < 10" means the logical 'and' of "5 < x" and "x < 10". And in common with your article, this also becomes impenetrable with > signs.

  3. You might want to fix the "outside" case to use "<=", or change the number line pictures, which seem to imply 5 and 10 are "outside".

  4. However simple it may seem, it is indeed an issue to do it right. Working with integers is still the easiest case, although you don't explicitly mention whether you want to include or exclude the boundary values. It becomes even more difficult if you're working with doubles or datetime objects. Then suddenly fractional numbers or the time of day also become part of the equation.
    I've seen this go 'horribly' wrong several times, especially with datetime objects. Where analysts write "from x until y" it is not clear whether to include x and y or only include x and exclude y. To be clear, there are enough ways to solve this, but clear communication is the best one...

  5. How about writing a simple function like between(x1, x2) and replace all of that notation

    Replies
    1. Why write a function for such a trivial expression? You will add to the cost of the overall method in which the method between appears. Remember, this will be called for each iteration.

    2. I think it's perfectly fine to write small, "trivial" functions if they improve the expressiveness and readability of your code. Making micro-optimizations such as inlining expressions should probably be left to the compiler, which does a pretty good job at those kinds of optimizations. This allows the programmer to focus on optimizing the parts of the code that really need it, after they've profiled and identified a hot spot.

  6. > all of that notation

    ahahahahaha!!!

  7. > How about writing a simple function like between(x1, x2) and replace all of that notation

    You will need 3 parameters in many languages, so between(x1, x2, x3). Then the question becomes: is the middle number the between value, or the first or last?

    I do not care about the method overhead. Many languages will have it inlined anyway by the runtime, and it's a trivial amount of time. I much prefer to optimize for readability and change rather than processor time.

    > You might want to fix the "outside" case to use "<="
    Yes, you are right.

    Replies
    1. Using the middle for the `x` has a nice 'symmetry' with the meaning. If I were to do this, I'd call it something other than 'between' to make this more clear; perhaps `is_ordered(l, x, u)`.

  8. I'm the biggest fan in the world of creating functions to improve the expressiveness and readability of code, but I don't think that in this case it does so. The original expression is so clear and explicit that replacing it with a function call actually obscures important details of the operation such as the boundary conditions (i.e. is it '<' or '<=' for the start and end boundary?) Nothing that can't be figured out, with a quick 'go to definition', but in requiring the reader to do that, you've actually made the code harder to read, not easier.

  9. This comment has been removed by the author.

  10. What is cleaner or easier to read comes down to personal taste. But how to express "all numbers greater than 1" without '>'?

  11. Personal taste is certainly a factor in some things, but not in everything. It's worth having the discussion. And within a single team, it sometimes helps to have a chat like this so that everyone can converge on a single style. If the decision is controversial, then perhaps it isn't appropriate. But a discussion like this one is often all that's needed for everyone on the team to realise they all feel the same way about it.

  12. But how to express "all numbers greater than 1" without '>'?

    1 < allNumbers

  13. Your "only two options remain" are the same except that you have swapped what is on either side of the &&. Maybe you meant

    10 < x && x < 5

    for one of them.

  14. You can also have a function/method x.inBetween(low, hi) or between(lo, x, hi) and wrap it in not(). It's far more readable than x < 5 || 10 < x, which is a poorly written !(5 <= x && x <= 10).

  15. TLDR: use <= operator.

    The operator < is *exclusive*, which generally is not what is meant when you say "pick a number between 5 and 10". The number lines also show it as inclusive. In a discrete setting, the expression (2 < x && x < 4) is only true for x=3. More correct would be to use the <= operator. One can use less intuitive inclusive boundaries 1 x || x > 10).

  16. nic post...
    http://mkniit.blogspot.in

  17. This comment has been removed by the author.

  18. For those who want ESLint to disallow the greater-than operator, I created a pull request (it was rejected, but the comments contain the solution): https://github.com/eslint/eslint/pull/8677

    The solution is to add the following rule in your config:

    "no-restricted-syntax": [
      "error",
      {
        "selector": "BinaryExpression[operator='>']",
        "message": "Expected < instead of >."
      },
      {
        "selector": "BinaryExpression[operator='>=']",
        "message": "Expected <= instead of >=."
      }
    ]

    Thanks Llewellyn Falco, keep up with the great articles!

    Chris

  19. Your title is against the greater than symbol, but your example where “one of them is wrong” shows the wrong one being done with the less than symbol “(10 < x && x < 5)”. After reading your post I'm not sure you've even made a point against the greater than symbol.

    It's a valid symbol and it's practical for math so why not have it?

  20. I think everyone has different tricks to read maths quicker. For me it's cleaner when I think about X, one condition at a time, so (x > 5 && x < 10) would be better. Because it's how I would express it in speech: "x is greater than 5 and less than 10", not "5 is less than x", I'm concerned about x, not 5.

  21. Your number line examples include both 5 and 10 as possible answers (making the range equal to or greater than 5 to equal to 10), while your code excludes them - so which is it?

  22. | (x < 10 && 5 < x)
    | which is a stupid option because it implies 10 < 5 and

    No, that expression does not imply 10 < 5

  23. As a matter of style, I prefer to always have the variable name to the left of the operator. I think it is cleaner even though you can have more variants of the conditional.
