When I learned about finite state machines (FSMs) as part of my college degree, I skimmed the formal definition and thought: this is a very simple set of rules, so there must be a more advanced version that is actually used in industry. At that time, I had a very rosy picture of businesses and thought that software companies used the latest and greatest tech whenever possible.

This notion quickly crumbled when I saw the massive codebases of real projects. While there were spots of brilliantly written code, most of it was simple and easy to understand given time. As I learned more, I began to appreciate developers who wrote code in an easy-to-understand manner, even when that led to non-optimal performance. Of course, non-optimal here usually just means something like 90% as fast as the best solution.

One of the approaches I found to simplify code was using state machines. This is an astonishingly simple technique, and I think it is a travesty that it is not taught as part of the curriculum. Regardless, I will try to illustrate the benefits of using an FSM with a concrete example.

Problem

Suppose you are working on an IoT device. This kind of device has a spotty connection to the internet and periodically transfers small amounts of data to the server. It also stays online for long stretches with no restarts. This makes it vital that the application running on it can always recover from faults without user intervention.

On a side note, web servers are also expected to run for a long time, but we usually trade the effort required to engineer such solutions for the expectation that someone will be on call to restart the server when needed.

Usual Approach

This kind of problem is usually solved by using a boolean flag. There is nothing wrong with this approach. In fact, for a system with 2 states, a flag variable with a getter and a setter is the best solution.

connected = True

def set_connection_state(state):
   # Without "global", the assignment would only create a local variable
   global connected
   connected = state

def get_connection_state():
   return connected

This looks wonderful and beautiful.

This also stops looking wonderful and beautiful when you start having to deal with the real world and its uncertainties.

As a real example of possible sub-states:

  • What if the connection is not atomic, but happens as 2 operations? (This genuinely happens in embedded systems.)
  • What if multiple application-level interactions are required to make sure that the connection is valid and data can pass through?
  • What if, in some conditions, application data should not pass at all because something else is using the channel?

Right away you see that there seems to be a need for at least 4 flags:

tcp_connected = False
server_connected = False

mqtt_subscriptions_successful = False

over_the_air_update_in_progress = False

Now, you might say, just use those flags with 8 getters/setters. Not a big deal.

You have already fallen into the trap.

Four boolean flags give 2^4 = 16 possible states of the system, all from about 20 lines of code. Most of the incorrect states are impossible by design, but that only holds for the current code.
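To make that count concrete, here is a small sketch that enumerates every combination of the four flags and keeps only the ones that make sense. The `is_valid` rules are my assumption for this illustration (each layer needs the one beneath it); nothing in the flag-based code actually enforces them.

```python
from itertools import product

FLAGS = (
    "tcp_connected",
    "server_connected",
    "mqtt_subscriptions_successful",
    "over_the_air_update_in_progress",
)

def is_valid(tcp, server, mqtt, ota):
    """Assumed invariants: every layer requires the layer below it."""
    if server and not tcp:
        return False
    if mqtt and not server:
        return False
    if ota and not mqtt:
        return False
    return True

# Every True/False combination of the four flags
all_states = list(product((False, True), repeat=len(FLAGS)))
valid_states = [s for s in all_states if is_valid(*s)]

print(len(all_states))    # 16
print(len(valid_states))  # 5
```

Under these assumptions, 11 of the 16 combinations are states the device should never be in, and only convention keeps the code out of them.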

Any time someone changes code that uses these flags, there is a possibility that invalid states will creep in and send the application into undefined behavior territory.

if ota_available():
  over_the_air_update_in_progress = True
  wipe_firmware()
  download_firmware_start() # Async process

# New changes
if over_the_air_update_in_progress and new_protocol_active:
  over_the_air_update_in_progress = False
  over_the_air_update_in_progress_v2 = True

# Old code that handles disconnection
if not tcp_connected and not over_the_air_update_in_progress:
  restart()

This is a contrived and crude example, but cases like this can easily be missed in peer review. Here, the device deletes its operating system and starts downloading the update. The new code then clears the old flag, so when the internet connection suddenly drops, the old disconnection handler no longer sees an update in progress and restarts the device. The corrupted system is bricked and has to be sent back to the manufacturer to have the OS reinstalled.

I have used Python to illustrate that this is NOT a problem isolated to low-level languages. Any application that involves state has this potential, and that is practically every application ever written.

What State Machines bring to the table

In a single word, simplicity.

The possible states with the allowed transitions are as follows:

stateDiagram-v2
    s1: OFFLINE
    s2: TCP_CONNECTED
    s3: SERVER_CONNECTED
    s4: MQTT_SUBSCRIBED
    s5: OTA_IN_PROGRESS
    s1 --> s2
    s2 --> s3
    s3 --> s4
    s4 --> s5
    s5 --> s3
    s2 --> s1
    s3 --> s1
    s4 --> s1

Now, if we make sure that only the allowed transitions are permitted, and check the state before performing actions, then we can guarantee by construction that the system will never be in an unexpected state. If the code ever attempts an invalid transition, you will be notified.
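A minimal hand-rolled sketch of that idea, using an enum and a lookup table of allowed transitions, might look like the following. The names `ConnState`, `ALLOWED`, `InvalidTransition`, `Connection` and `should_restart_on_disconnect` are hypothetical and chosen for this illustration; the transitions are copied from the diagram above.

```python
from enum import Enum, auto

class ConnState(Enum):
    OFFLINE = auto()
    TCP_CONNECTED = auto()
    SERVER_CONNECTED = auto()
    MQTT_SUBSCRIBED = auto()
    OTA_IN_PROGRESS = auto()

# Allowed transitions: current state -> set of permitted next states
ALLOWED = {
    ConnState.OFFLINE: {ConnState.TCP_CONNECTED},
    ConnState.TCP_CONNECTED: {ConnState.SERVER_CONNECTED, ConnState.OFFLINE},
    ConnState.SERVER_CONNECTED: {ConnState.MQTT_SUBSCRIBED, ConnState.OFFLINE},
    ConnState.MQTT_SUBSCRIBED: {ConnState.OTA_IN_PROGRESS, ConnState.OFFLINE},
    ConnState.OTA_IN_PROGRESS: {ConnState.SERVER_CONNECTED},
}

class InvalidTransition(Exception):
    """Raised when code requests a transition the diagram does not allow."""

class Connection:
    def __init__(self):
        self.state = ConnState.OFFLINE

    def transition_to(self, new_state):
        # Reject anything that is not an edge in the diagram
        if new_state not in ALLOWED[self.state]:
            raise InvalidTransition(f"{self.state.name} -> {new_state.name}")
        self.state = new_state

    def should_restart_on_disconnect(self):
        # Check the state before acting: never restart in the middle of an update
        return self.state is not ConnState.OTA_IN_PROGRESS

conn = Connection()
conn.transition_to(ConnState.TCP_CONNECTED)
conn.transition_to(ConnState.SERVER_CONNECTED)
try:
    conn.transition_to(ConnState.OTA_IN_PROGRESS)  # skips MQTT_SUBSCRIBED
except InvalidTransition as exc:
    print(f"rejected: {exc}")  # rejected: SERVER_CONNECTED -> OTA_IN_PROGRESS
```

There is a single source of truth for the state, so a new code path cannot quietly put the device into a combination nobody planned for; the worst it can do is raise an exception that shows up in testing.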

You might think this is hard to implement. It's not. You can write something for your use case with an enum (or a set of constants) and a switch case (or an if ladder). Thankfully, all major languages have libraries that have already done the legwork. You just need to look at your diagram and copy the states and transitions.

For Python, you could use python-statemachine. For the above machine, you could use the following code:

from statemachine import StateMachine, State

class ConnectionState(StateMachine):
    # One state per node in the diagram
    offline = State(initial=True)
    tcp_connected = State()
    server_connected = State()
    mqtt_subscribed = State()
    ota_in_progress = State()

    # Events that move the connection forward
    tcp_conn_estb = offline.to(tcp_connected)
    server_conn_estb = tcp_connected.to(server_connected)
    mqtt_subs_success = server_connected.to(mqtt_subscribed)
    ota_start = mqtt_subscribed.to(ota_in_progress)
    # As in the diagram, a finished update returns to SERVER_CONNECTED
    ota_end = ota_in_progress.to(server_connected)

    # One event with several source states; transitions are combined with "|".
    # OTA_IN_PROGRESS is left out, as in the diagram, so a dropped
    # connection cannot interrupt an update.
    lost_connectivity = (
        tcp_connected.to(offline)
        | server_connected.to(offline)
        | mqtt_subscribed.to(offline)
    )

This particular package also has the ability to draw out the possible states, like so:

from statemachine.contrib.diagram import DotGraphMachine

graph = DotGraphMachine(ConnectionState)  # also accepts instances
dot = graph()
dot.write_png('dot.png')  # needs pydot and Graphviz installed

Although the generated diagram is oriented differently, it is easy to see that the states and the transitions are the same.

When to use State Machines

I hope that I have convinced you that FSMs are useful beyond theoretical research. However, like any tool, state machines should not be overused; that just makes the code confusing. As a general rule, use a state machine if you foresee that the problem you are solving can have more than 3 states. Below that, just use a binary/ternary flag.
