Reliability is not optional in embedded systems. A microcontroller may be expected to run for days, months, or even years without intervention, and it must keep working through that time even when unanticipated software bugs surface. A program can get stuck in an infinite loop, hang while it waits for a peripheral, corrupt its own state, or stop responding because of a memory or timing error.

If this happens on a desktop, the user can simply restart the software. That kind of manual recovery may not be possible in an embedded system, especially one that is deployed remotely or built into a product.

This is where the watchdog timer comes in. On STM32 microcontrollers, the watchdog is a hardware safety feature that automatically resets the system if the firmware fails to show that it’s still running properly.

A properly configured watchdog can make a fragile application much more robust by letting the system recover on its own when something goes wrong.

What a Watchdog Timer Actually Does

A watchdog timer is a piece of hardware that counts down on its own, independently of the main logic of your application. Once it is enabled, it counts down from a configured value towards zero. If the firmware is working properly, it periodically refreshes (or "feeds") the watchdog before the counter runs out, which reloads the counter.

The watchdog will reset the system if the firmware gets stuck, takes too long, or otherwise fails to refresh the counter in time. After the reset, the microcontroller starts over from the reset vector and runs its initialisation code again, so execution begins from a known state rather than resuming where it left off.

The main idea is simple but powerful. The watchdog doesn’t try to figure out why something went wrong. It doesn’t care whether the problem was caused by a software bug, a timing problem, or an unexpected interaction between pieces of hardware. It only checks one thing: is the software still reaching the point where it refreshes the watchdog often enough to show that execution is progressing?

The watchdog is built into the hardware, so it can keep working even when the software running on the CPU has stopped behaving correctly. That independence from the code it supervises is what makes it useful as a last line of defence.
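
The count-down, refresh, and reset behaviour can be modelled in a few lines of plain C. The sketch below is a hypothetical host-side model, not STM32 register code; the `wdt_model_*` names are invented for illustration, and a real watchdog is driven by its own hardware clock rather than by a function call.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of a down-counting watchdog (not a real peripheral). */
typedef struct {
    uint32_t reload;      /* value loaded into the counter on every refresh */
    uint32_t counter;     /* current down-counter value */
    bool     reset_fired; /* latched once the counter has reached zero */
} wdt_model_t;

/* Start the model with the counter loaded to its reload value. */
void wdt_model_start(wdt_model_t *w, uint32_t reload)
{
    w->reload = reload;
    w->counter = reload;
    w->reset_fired = false;
}

/* Called by the "firmware" to show it is still making progress. */
void wdt_model_refresh(wdt_model_t *w)
{
    if (!w->reset_fired) {
        w->counter = w->reload;
    }
}

/* Called once per watchdog clock tick, independently of the firmware.
   Returns true once the watchdog has fired. */
bool wdt_model_tick(wdt_model_t *w)
{
    if (w->reset_fired) {
        return true;
    }
    if (w->counter > 0u) {
        w->counter--;
    }
    if (w->counter == 0u) {
        w->reset_fired = true;
    }
    return w->reset_fired;
}
```

In this model, calling `wdt_model_refresh()` more often than every `reload` ticks keeps `wdt_model_tick()` returning false. Once the refreshes stop, the counter runs down and the reset is latched regardless of what the firmware does afterwards.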

Why Watchdogs Matter in STM32 Projects

STM32 microcontrollers are used in many real-world systems, such as IoT devices, industrial controllers, battery-powered sensors, consumer goods, and automotive electronics. Software bugs can happen in any of these places. A peripheral might not respond. A communication routine could block forever. A stack overflow could corrupt the flow of execution. A storm of interrupts could starve the main loop.

Without a watchdog, the system could stay frozen indefinitely. With the watchdog enabled, the failure is only temporary. The system restarts, reinitialises, and has a chance to resume normal operation.

This is why the watchdog is so important in unattended systems. Picture a smart farming device in a field, a remote environmental monitor on a pole, or a factory controller inside sealed equipment. If a software bug makes the device hang forever, the product could fail to do its job or even become dangerous.

The watchdog greatly reduces that risk by making sure that lockups don’t last forever. It doesn’t take the place of good software design, testing, or error handling, but it provides a safety net when something does go wrong.

The Two Main Watchdogs in STM32

STM32 devices typically provide two hardware watchdog mechanisms: the Independent Watchdog, usually called the IWDG, and the Window Watchdog, usually called the WWDG. These are related but serve slightly different purposes.

The Independent Watchdog is the simpler and more commonly used option. It runs from its own low-speed internal clock source and is designed to remain active even if the main system clock fails. This independence makes it especially reliable. Once started, it is generally intended to keep running until a reset occurs. That makes it well suited for safety-critical and general robustness use cases where the main goal is to guarantee that the system cannot remain stuck forever.

The Window Watchdog is more specialized. In addition to requiring the software to refresh it before timeout, it also requires the refresh to occur within a defined timing window. If the software refreshes too late, the watchdog resets the system, and if it refreshes too early, before the window has opened, the watchdog resets the system as well. This is useful when you want to detect not only stalled execution but also abnormal timing behavior, such as a loop running much faster than expected because program flow has gone wrong. The Window Watchdog is therefore often chosen when timing correctness matters as much as simple liveness.

Understanding the Independent Watchdog

The Independent Watchdog is usually the first watchdog STM32 developers learn to use, and for good reason. It is straightforward, reliable, and suitable for many applications. It is clocked from the low-speed internal oscillator rather than the main CPU clock tree, so it continues running even if the main clock configuration becomes unstable.

That means it can still reset the system in situations where software is no longer able to manage the regular timing infrastructure.

The IWDG works by using a prescaler and a reload value. The prescaler slows the watchdog clock down, and the reload value determines how long the counter takes to expire. By choosing these values appropriately, you can define a timeout interval that matches your application’s behavior.

If your main loop normally runs every few milliseconds, you might choose a watchdog timeout of hundreds of milliseconds or a few seconds, depending on how much timing margin you need. The timeout should be long enough to tolerate normal delays, but short enough that genuine faults are recovered promptly.

Once the IWDG is enabled, the firmware must regularly write a specific value to the refresh register. This reloads the counter and prevents reset. If the refresh does not occur in time, the watchdog reaches zero and resets the microcontroller.
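
The relationship between prescaler, reload value, and timeout can be checked with a small calculation before anything is written to a register. The helper below is a sketch that assumes the STM32F4 encoding of the 3-bit prescaler field (dividers 4 to 256) and a 12-bit reload value. The function names are invented for illustration, and the LSI frequency is passed in because it varies noticeably from chip to chip and with temperature (it is commonly quoted as roughly 32 kHz, but check your datasheet).

```c
#include <stdint.h>

/* Map the 3-bit IWDG_PR field to its divider: 0 -> /4, 1 -> /8, ... 6 -> /256.
   Code 7 is treated as /256 as well (assumed STM32F4 encoding). */
uint32_t iwdg_divider_from_pr(uint32_t pr)
{
    pr &= 0x7u;
    if (pr > 6u) {
        pr = 6u;
    }
    return 4u << pr;
}

/* Approximate timeout in microseconds: (RLR + 1) * divider / f_LSI.
   Returns 0 if lsi_hz is 0 to avoid dividing by zero. */
uint32_t iwdg_timeout_us(uint32_t pr, uint32_t rlr, uint32_t lsi_hz)
{
    if (lsi_hz == 0u) {
        return 0u;
    }
    uint64_t counts = (uint64_t)((rlr & 0xFFFu) + 1u) * iwdg_divider_from_pr(pr);
    return (uint32_t)((counts * 1000000u) / lsi_hz);
}
```

For example, a prescaler code of 6 (divide by 256) with a reload of 2000 gives about 16 seconds at 32 kHz, while the smallest setting (code 0, reload 0) gives 125 microseconds.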

Understanding the Window Watchdog

The Window Watchdog adds an extra layer of timing control. Instead of allowing refresh at any time before expiration, it creates an allowed refresh interval. If the application refreshes the watchdog too early, the WWDG resets the system, because an early refresh can indicate that control flow is executing incorrectly or that code is racing through the system faster than intended. If it refreshes too late, the watchdog times out and resets the system.

This makes the WWDG useful in applications where normal execution timing is predictable and where abnormal early execution is considered a fault.

For example, if a control loop is expected to execute every fixed period, the Window Watchdog can help detect if the loop starts spinning too quickly due to a missed wait condition or logic error. In practice, the WWDG is somewhat more complex to tune correctly because the refresh window must align with real execution timing and interrupt activity.

For beginners, the IWDG is usually the better starting point, but learning the WWDG is valuable for more advanced applications.
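
The window rule can be written down as a small decision function. The sketch below assumes the usual STM32 WWDG behaviour: a 7-bit down-counter, a refresh that is rejected while the counter is still above the window value, and a reset once the counter drops below 0x40. The enum and function names are hypothetical; confirm the exact boundary conditions against the reference manual for your part.

```c
#include <stdint.h>

typedef enum {
    WWDG_REFRESH_TOO_EARLY, /* counter still above the window value: reset */
    WWDG_REFRESH_OK,        /* counter inside the window: refresh accepted */
    WWDG_REFRESH_TOO_LATE   /* counter already below 0x40: reset has occurred */
} wwdg_refresh_result_t;

/* Classify a refresh attempt from the current 7-bit counter value and the
   configured 7-bit window value (both are assumptions of this model). */
wwdg_refresh_result_t wwdg_classify_refresh(uint8_t counter, uint8_t window)
{
    if (counter < 0x40u) {
        return WWDG_REFRESH_TOO_LATE;
    }
    if (counter > window) {
        return WWDG_REFRESH_TOO_EARLY;
    }
    return WWDG_REFRESH_OK;
}
```

With a window value of 80, a refresh attempted while the counter still reads 127 is too early, any value from 80 down to 64 is accepted, and 63 or below means the timeout has already been hit.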

Choosing the Right Timeout

A watchdog timeout is not something you should choose casually. If the timeout is too short, the system may reset during perfectly normal operation. If it is too long, a real software failure may leave the system unresponsive for longer than acceptable. The correct timeout depends on the worst-case timing of the code path responsible for refreshing the watchdog.

This is one reason why watchdog design is really a system design question, not just a register configuration exercise. You need to understand how long the main loop can legitimately take, how interrupts might delay execution, whether communication or flash operations can block for significant periods, and how much recovery latency the application can tolerate. In general, the watchdog should not be refreshed from trivial fast code unless that code genuinely proves the whole system is healthy.

A bad design is one where some small timer interrupt always refreshes the watchdog even when the main application is frozen. In that case, the watchdog will never expire, and its purpose is defeated. The refresh should occur only after enough of the system has executed successfully to indicate that normal operation is still happening.
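
One way to turn a worst-case timing budget into register values is to work backwards from the shortest timeout you can tolerate. The sketch below assumes the IWDG layout used elsewhere in this article (prescaler codes 0 to 6 for dividers 4 to 256, 12-bit reload) and sizes the counter against the fastest LSI frequency the datasheet allows, since a fast oscillator shortens the real timeout. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t  pr;  /* prescaler code, 0 (/4) to 6 (/256) */
    uint16_t rlr; /* 12-bit reload value */
} iwdg_config_t;

/* Pick the smallest prescaler whose 12-bit reload still guarantees at least
   min_timeout_ms even when the LSI runs at lsi_max_hz (its fastest).
   Returns false if the request cannot be met. */
bool iwdg_pick_config(uint32_t min_timeout_ms, uint32_t lsi_max_hz,
                      iwdg_config_t *out)
{
    for (uint8_t pr = 0u; pr <= 6u; pr++) {
        uint32_t divider = 4u << pr;
        /* counts = ceil(min_timeout_ms * lsi_max_hz / (1000 * divider)) */
        uint64_t num = (uint64_t)min_timeout_ms * lsi_max_hz;
        uint64_t den = 1000ull * divider;
        uint64_t counts = (num + den - 1u) / den;
        if (counts == 0u) {
            counts = 1u;
        }
        if (counts <= 4096u) {
            out->pr = pr;
            out->rlr = (uint16_t)(counts - 1u); /* timeout is (RLR + 1) counts */
            return true;
        }
    }
    return false;
}
```

A caller would pass the worst-case loop time plus margin as `min_timeout_ms`. Remember that the same settings produce a longer timeout on a slow oscillator, so the recovery latency should be checked against the slowest LSI frequency as well.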

Basic Example: Independent Watchdog in Bare-Metal STM32

The following example shows the general idea of enabling and refreshing the Independent Watchdog in a simple STM32 bare-metal style program. Register names can vary slightly between STM32 families, but the overall structure is representative.

#include "stm32f4xx.h"

static void iwdg_init(void) {
    /* Enable write access to IWDG registers */
    IWDG->KR = 0x5555;

    /* Set prescaler: 0x06 divides the LSI clock by 256 */
    IWDG->PR = 0x06;

    /* Set reload value: about 16 s at a nominal 32 kHz LSI */
    IWDG->RLR = 2000;

    /* Reload the counter */
    IWDG->KR = 0xAAAA;

    /* Start the watchdog */
    IWDG->KR = 0xCCCC;
}

static void iwdg_refresh(void) {
    IWDG->KR = 0xAAAA;
}

static void delay(volatile uint32_t count) {
    while (count--) {
        __NOP();
    }
}

int main(void) {
    iwdg_init();

    while (1) {
        /* Application code */
        delay(500000);

        /* Refresh watchdog only if main loop is healthy */
        iwdg_refresh();
    }
}

In this example, the watchdog is configured, started, and then refreshed inside the main loop. If the program becomes trapped somewhere before iwdg_refresh() is reached, the watchdog eventually expires and resets the MCU. That is the fundamental use case. Even though the code is simple, it illustrates the most important principle: the refresh happens only as part of normal application progress.

Example with a Simulated Fault

A watchdog becomes easier to understand when you deliberately create a failure and observe the reset. In the next example, the firmware runs normally for a short time, then enters an infinite loop without refreshing the watchdog. This simulates a software hang.

#include "stm32f4xx.h"

static void iwdg_init(void) {
    IWDG->KR = 0x5555;
    IWDG->PR = 0x06;
    IWDG->RLR = 1500;
    IWDG->KR = 0xAAAA;
    IWDG->KR = 0xCCCC;
}

static void iwdg_refresh(void) {
    IWDG->KR = 0xAAAA;
}

static void delay(volatile uint32_t count) {
    while (count--) {
        __NOP();
    }
}

int main(void) {
    uint32_t loops = 0;

    iwdg_init();

    while (1) {
        delay(400000);
        loops++;

        if (loops < 5) {
            iwdg_refresh();
        } else {
            /* Simulated fault: system hangs here and stops refreshing */
            while (1) {
            }
        }
    }
}

Here the system refreshes the watchdog during the first few loop iterations, then deliberately stops. Once the timeout period passes, the watchdog forces a reset. After reset, the program starts from the beginning again. This kind of test is useful because it proves that the watchdog is actually configured correctly and that the reset path behaves as expected.

Detecting Whether a Watchdog Reset Occurred

In many applications, it is not enough to simply let the watchdog reset the device. You also want to know why the last reset happened. STM32 microcontrollers provide reset flags in the reset and clock control (RCC) logic, held in the RCC_CSR register on the STM32F4, so that firmware can determine whether the previous reset was caused by the watchdog, power-on, software reset, or another source.

A common technique is to read the reset flags early during startup and store the result somewhere useful, such as a global variable, a diagnostic counter, or non-volatile memory. That way the application can log watchdog events, report them through telemetry, or change behavior after repeated failures.

A simple example looks like this:

#include "stm32f4xx.h"

static uint8_t watchdog_reset_detected = 0;

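static void check_reset_source(void) {
    /* Either hardware watchdog (IWDG or WWDG) counts as a watchdog reset */
    if (RCC->CSR & (RCC_CSR_IWDGRSTF | RCC_CSR_WWDGRSTF)) {
        watchdog_reset_detected = 1;
    }

    /* Clear the reset flags so the next boot reports only its own cause */
    RCC->CSR |= RCC_CSR_RMVF;
}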

int main(void) {
    check_reset_source();

    if (watchdog_reset_detected) {
        /* Diagnostic action: log event, blink LED, increment counter, etc. */
    }

    while (1) {
    }
}

This makes the watchdog far more useful in practice. It is no longer just a hidden recovery mechanism. It becomes part of the product’s diagnostic system.
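
To report more than a single yes/no flag, the raw register value can be decoded into a reset cause and combined with a counter of consecutive watchdog resets. The sketch below works on a plain `uint32_t` so it can be tested off-target. The bit positions are the ones assumed for the STM32F4 `RCC_CSR` register and should be verified for other families, and the enum, function names, and counter policy are hypothetical.

```c
#include <stdint.h>

/* Assumed RCC_CSR flag positions (STM32F4); verify for your device. */
#define CSR_PINRSTF  (1u << 26)
#define CSR_PORRSTF  (1u << 27)
#define CSR_SFTRSTF  (1u << 28)
#define CSR_IWDGRSTF (1u << 29)
#define CSR_WWDGRSTF (1u << 30)

typedef enum {
    RESET_CAUSE_UNKNOWN,
    RESET_CAUSE_POWER_ON,
    RESET_CAUSE_PIN,
    RESET_CAUSE_SOFTWARE,
    RESET_CAUSE_IWDG,
    RESET_CAUSE_WWDG
} reset_cause_t;

/* Decode a saved copy of the reset flags. The watchdog flags are tested
   first because the pin-reset flag is often set alongside other causes. */
reset_cause_t decode_reset_cause(uint32_t csr)
{
    if (csr & CSR_IWDGRSTF) return RESET_CAUSE_IWDG;
    if (csr & CSR_WWDGRSTF) return RESET_CAUSE_WWDG;
    if (csr & CSR_SFTRSTF)  return RESET_CAUSE_SOFTWARE;
    if (csr & CSR_PORRSTF)  return RESET_CAUSE_POWER_ON;
    if (csr & CSR_PINRSTF)  return RESET_CAUSE_PIN;
    return RESET_CAUSE_UNKNOWN;
}

/* Track consecutive watchdog resets: increment (saturating) after a watchdog
   reset, clear after any other cause. Where the count is stored across
   resets is left to the caller. */
uint32_t update_watchdog_reset_count(uint32_t previous, reset_cause_t cause)
{
    if (cause == RESET_CAUSE_IWDG || cause == RESET_CAUSE_WWDG) {
        return (previous == UINT32_MAX) ? previous : previous + 1u;
    }
    return 0u;
}
```

On target, the input would be a copy of `RCC->CSR` taken before the flags are cleared. The application can then compare the returned count against a threshold and, for instance, fall back to a reduced-function mode after several watchdog resets in a row.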

Example Using HAL for the Independent Watchdog

Many STM32 developers use the STM32 HAL rather than direct register programming. The following example shows the general pattern for configuring and refreshing the Independent Watchdog with HAL. Exact initialization values can be tuned for your device and application.

#include "stm32f4xx_hal.h"

IWDG_HandleTypeDef hiwdg;

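/* Typically generated by STM32CubeMX; their definitions are omitted here */
static void SystemClock_Config(void);
static void MX_GPIO_Init(void);
static void MX_IWDG_Init(void);

static void MX_IWDG_Init(void) {
    hiwdg.Instance = IWDG;
    hiwdg.Init.Prescaler = IWDG_PRESCALER_64;
    hiwdg.Init.Reload = 1250;  /* about 2.5 s at a nominal 32 kHz LSI */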

    if (HAL_IWDG_Init(&hiwdg) != HAL_OK) {
        while (1) {
        }
    }
}

int main(void) {
    HAL_Init();
    SystemClock_Config();
    MX_GPIO_Init();
    MX_IWDG_Init();

    while (1) {
        /* Main application task */

        HAL_Delay(100);

        if (HAL_IWDG_Refresh(&hiwdg) != HAL_OK) {
            while (1) {
            }
        }
    }
}

This example is easier for many beginners because the HAL abstracts the lower-level register details. The underlying concept is the same as in the bare-metal version. The watchdog is initialized once and refreshed during healthy operation. If the application stops reaching the refresh call, reset occurs automatically.

Example Using the Window Watchdog

The Window Watchdog is more timing-sensitive, but a simple example helps show how it works conceptually. In this case the watchdog must be refreshed after the counter drops into an allowed window, but before it reaches the lower reset threshold.

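#include "stm32f4xx_hal.h"

WWDG_HandleTypeDef hwwdg;

static void MX_WWDG_Init(void) {
    /* The WWDG is clocked from APB1, so its peripheral clock must be enabled
       (this is often done in HAL_WWDG_MspInit instead) */
    __HAL_RCC_WWDG_CLK_ENABLE();

    hwwdg.Instance = WWDG;
    hwwdg.Init.Prescaler = WWDG_PRESCALER_8;
    hwwdg.Init.Window = 80;    /* refresh is allowed once the counter is <= 80 */
    hwwdg.Init.Counter = 127;  /* start value; reset when it falls below 64 */
    hwwdg.Init.EWIMode = WWDG_EWI_DISABLE;

    if (HAL_WWDG_Init(&hwwdg) != HAL_OK) {
        while (1) {
        }
    }
}

int main(void) {
    HAL_Init();
    MX_WWDG_Init();

    while (1) {
        /* Assumes the reset-default clock (PCLK1 = 16 MHz from HSI): one
           counter step is 4096 * 8 / 16 MHz = 2.048 ms, so the window is
           open from about 96 ms to 131 ms after each refresh. Re-tune this
           delay if your clock configuration differs. */
        HAL_Delay(110);

        /* Refresh within allowed timing window */
        if (HAL_WWDG_Refresh(&hwwdg) != HAL_OK) {
            while (1) {
            }
        }
    }
}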

This example is intentionally simple, but it introduces the timing-window idea. If the delay is too short, refresh may occur too early. If it is too long, refresh may occur too late. In a real application, you would tune the prescaler, window, and refresh cadence based on measured task timing. That extra complexity is why the Window Watchdog is often introduced after developers are already comfortable with the Independent Watchdog.
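
When tuning, it helps to compute where the window actually falls for a given clock. The sketch below assumes the usual STM32 WWDG timing model, in which the counter decrements once every 4096 x prescaler cycles of PCLK1, the window opens when the counter reaches the window value, and the reset occurs when the counter drops from 0x40 to 0x3F. The struct and function names are hypothetical, and the first counter step after a refresh can be shorter than a full period, so leave some margin.

```c
#include <stdint.h>

typedef struct {
    uint32_t open_us;    /* time after refresh at which refresh becomes legal */
    uint32_t timeout_us; /* time after refresh at which the reset fires */
} wwdg_window_t;

/* Compute the refresh window for a 7-bit start value and window value.
   prescaler_div is the numeric divider (1, 2, 4 or 8), not a HAL constant. */
wwdg_window_t wwdg_window_us(uint32_t pclk1_hz, uint32_t prescaler_div,
                             uint8_t counter, uint8_t window)
{
    wwdg_window_t w = {0u, 0u};
    if (pclk1_hz == 0u) {
        return w; /* avoid dividing by zero */
    }
    /* Counter steps until the window opens, and until the reset. */
    uint64_t steps_open = (counter > window) ? (uint64_t)(counter - window) : 0u;
    uint64_t steps_timeout = (counter > 0x3Fu) ? (uint64_t)(counter - 0x3Fu) : 0u;
    /* Each step lasts 4096 * prescaler_div PCLK1 cycles. */
    uint64_t cycles_per_step = 4096ull * prescaler_div;

    w.open_us = (uint32_t)((steps_open * cycles_per_step * 1000000u) / pclk1_hz);
    w.timeout_us = (uint32_t)((steps_timeout * cycles_per_step * 1000000u) / pclk1_hz);
    return w;
}
```

For the values in the example (prescaler 8, counter 127, window 80), this gives a window of roughly 96 ms to 131 ms at a 16 MHz PCLK1, and roughly 37 ms to 50 ms at 42 MHz. The refresh cadence has to land inside that range for whichever clock configuration is in use.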

Watchdog Refresh Strategy in Real Systems

One of the most important design questions is where the watchdog should be refreshed. A naive design refreshes it in the fastest repeating code path available, but that often misses the point. The best refresh point is one that proves the overall system is genuinely healthy.

In a simple superloop design, that may mean refreshing the watchdog only after all critical tasks in the loop have completed successfully. In an RTOS-based system, it may mean collecting health signals from several tasks and refreshing only if each has reported progress within the expected period.

For example, suppose your system has communication handling, sensor reading, and control output updates. If the watchdog is refreshed after only the communication task runs, the system might still appear alive even though sensor acquisition has frozen.

A better approach is to refresh only after all three critical functions have completed successfully in the current cycle. That way the watchdog becomes a meaningful indicator of total system health rather than partial activity.
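
In a superloop, this can be done with a set of checkpoint flags that each critical function sets when it completes. The sketch below is one minimal way to do it, with hypothetical names throughout: the three flags mirror the communication, sensor, and control functions mentioned above, and the caller is expected to refresh the watchdog only when `health_cycle_complete()` returns true.

```c
#include <stdbool.h>
#include <stdint.h>

/* One bit per critical function in the loop (hypothetical set). */
#define HEALTH_COMMS   (1u << 0)
#define HEALTH_SENSORS (1u << 1)
#define HEALTH_CONTROL (1u << 2)
#define HEALTH_ALL     (HEALTH_COMMS | HEALTH_SENSORS | HEALTH_CONTROL)

static uint32_t health_flags;

/* Called by a critical function after it has completed successfully. */
void health_report(uint32_t flag)
{
    health_flags |= flag;
}

/* Called once at the end of each loop cycle. Returns true only if every
   critical function reported in during this cycle, then clears the flags
   so stale reports cannot carry over into the next cycle. */
bool health_cycle_complete(void)
{
    bool all_reported = (health_flags & HEALTH_ALL) == HEALTH_ALL;
    health_flags = 0u;
    return all_reported;
}
```

If sensor acquisition freezes or returns early with an error, its flag is never set, the end-of-cycle check fails, and the watchdog is left to expire even though the rest of the loop is still running.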

Common Mistakes When Using a Watchdog

A common mistake is enabling the watchdog without carefully thinking through worst-case timing. Another is refreshing it from an interrupt that always runs, which can mask major failures in the main application. Yet another is choosing a timeout based on average execution time rather than worst-case behavior. Systems may work fine during light testing and then reset unexpectedly in the field when timing stretches under real conditions.

Some developers also forget that certain operations can take longer than expected, such as flash erase cycles, communication retries, or startup calibration routines. If the watchdog is active during those operations, the code must either refresh it appropriately or choose a timeout that safely includes those delays.

Another frequent issue is failing to log watchdog resets, which makes post-failure diagnosis much harder. The reset happens, the system recovers, but nobody knows why. In a product environment, that lost diagnostic information can be very costly.

Watchdog Use with an RTOS

In an RTOS-based STM32 application, the watchdog should not usually be tied to just one task unless that task truly represents the health of the whole system.

A more robust pattern is to let each important task update a heartbeat flag or timestamp. A supervisor task then checks whether all required heartbeats have been updated within the allowed interval. Only if every critical task has proven progress does the supervisor refresh the watchdog.

This design is important because RTOS systems can fail in selective ways. One task may continue running while another is deadlocked. If the running task refreshes the watchdog directly, the failure is hidden. A centralized health-check strategy avoids this problem by requiring proof that the broader system remains operational.
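
The heartbeat pattern can be sketched without committing to a particular RTOS. In the example below, each monitored task records the tick at which it last made progress, and a supervisor decides whether the watchdog may be refreshed. All names are hypothetical, the tick value is assumed to come from whatever tick counter the system provides, and a real implementation would need to make sure the shared timestamps are accessed safely from multiple tasks.

```c
#include <stdbool.h>
#include <stdint.h>

#define MONITORED_TASKS 3u

typedef struct {
    uint32_t last_beat;    /* tick of the most recent heartbeat */
    uint32_t max_interval; /* allowed ticks between heartbeats */
} task_health_t;

static task_health_t task_health[MONITORED_TASKS];

/* Register a task with its allowed heartbeat interval. */
void heartbeat_init(uint32_t id, uint32_t now, uint32_t max_interval)
{
    task_health[id].last_beat = now;
    task_health[id].max_interval = max_interval;
}

/* Called by a task each time it completes a unit of real work. */
void heartbeat(uint32_t id, uint32_t now)
{
    task_health[id].last_beat = now;
}

/* Called by the supervisor. Returns true only if every monitored task has
   reported within its interval. Unsigned subtraction keeps the comparison
   valid across tick counter wraparound. */
bool all_tasks_alive(uint32_t now)
{
    for (uint32_t i = 0u; i < MONITORED_TASKS; i++) {
        if ((uint32_t)(now - task_health[i].last_beat) > task_health[i].max_interval) {
            return false;
        }
    }
    return true;
}
```

The supervisor task would call `all_tasks_alive()` periodically and refresh the hardware watchdog only when it returns true, so a single deadlocked task is enough to let the watchdog expire.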
