VIA Circuit Breaker Plugin
==========================

The VIA API is generally reliable and stable, but as with any external service your application may sometimes experience
issues due to planned maintenance, connectivity problems (on Red61's network or your own) or server-side errors. It's
good practice to engineer your application to tolerate failures of all third-party services.

The Circuit Breaker plugin can help you to protect your server and your users by blocking VIA connections for a short
period when it detects elevated error rates. It will limit the impact of:

* Your users becoming frustrated by repeatedly retrying and waiting, only to discover that an API action is still failing.
* Your own servers becoming saturated (and potentially reaching memory, socket and similar limits) with a backlog of 
  pending network requests when the VIA server is unreachable.
* Continued heavy load on the VIA server complicating system recovery, for example when services are restarting and
  repopulating their caches.

> **Caution Advised**
> Just like the electrical equivalent, the circuit breaker needs to be specified and fitted carefully to avoid making
> things worse. In particular, setting the trip limits too low can result in "nuisance trips" - exceptions being thrown
> from API calls when VIA and your site are functioning exactly as intended. The circuit breaker is intended for use
> as part of an overall monitoring and resilience architecture. It may not be suitable for simple or low-traffic sites.

## Functional Overview

### Tripping the circuit breaker (and choosing tripping limits)

Every time a VIA API call fails, the circuit breaker plugin checks the severity of the error and updates the count of
errors at that severity. The error rate for each severity level is tracked over a 60-second rolling window. If the
number of errors in that window exceeds the configured limit, the circuit breaker will trip.
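To illustrate the rolling-window counting, here is a minimal sketch that keeps error timestamps per severity in memory and trips when any configured limit is exceeded. `RollingErrorCounter` and its methods are hypothetical names, not the plugin's API; the real breaker persists its counts through its storage backend and its internals may differ. Timestamps are passed in explicitly so the logic is easy to test.

```php
<?php
/**
 * Hypothetical sketch (not the plugin's API): counts errors per severity over
 * a rolling window and reports whether any configured limit is exceeded.
 */
class RollingErrorCounter
{
    /** @var int Length of the rolling window in seconds */
    private $window;

    /** @var array Map of severity => list of error timestamps (unix seconds) */
    private $errors = array();

    public function __construct($window = 60)
    {
        $this->window = $window;
    }

    /** Record one error at $severity that happened at $now. */
    public function record($severity, $now)
    {
        $this->errors[$severity][] = $now;
    }

    /** Number of errors at $severity inside the window ending at $now. */
    public function countErrors($severity, $now)
    {
        if (! isset($this->errors[$severity])) {
            return 0;
        }
        $cutoff = $now - $this->window;
        // Discard timestamps that have aged out of the window.
        $this->errors[$severity] = array_values(array_filter(
            $this->errors[$severity],
            function ($t) use ($cutoff) { return $t > $cutoff; }
        ));
        return count($this->errors[$severity]);
    }

    /** True if any severity listed in $limits has more errors than its limit. */
    public function shouldTrip(array $limits, $now)
    {
        foreach ($limits as $severity => $limit) {
            if ($this->countErrors($severity, $now) > $limit) {
                return true;
            }
        }
        return false;
    }
}

// A limit of 1 means the second error inside 60 seconds trips the breaker.
$counter = new RollingErrorCounter(60);
$limits  = array('critical' => 1);
$counter->record('critical', 1000);
var_dump($counter->shouldTrip($limits, 1000)); // bool(false)
$counter->record('critical', 1030);
var_dump($counter->shouldTrip($limits, 1030)); // bool(true)
var_dump($counter->shouldTrip($limits, 1061)); // bool(false), the first error has aged out
```

Severities that have no entry in `$limits` are never checked, which mirrors leaving the warning limit unset.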

Currently there are two recognised severity levels:

* `ViaCircuitBreaker::SEVERITY_CRITICAL` - HTTP failures connecting to the VIA server, WSDL download or parse errors,
  and known internal server errors like `java.lang.NullPointerException`
* `ViaCircuitBreaker::SEVERITY_WARNING` - all other API call failures.
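As a rough illustration of how failures might be sorted into these two levels, the sketch below classifies an exception by its message. The function `classify_via_error` and the regular expressions are assumptions for illustration only (the first two patterns are examples of messages PHP's SoapClient can produce); the plugin's own detection rules are not reproduced here.

```php
<?php
// Hypothetical severity labels mirroring the two levels described above.
const SEVERITY_CRITICAL = 'critical';
const SEVERITY_WARNING  = 'warning';

/**
 * Illustrative only: classify a failed API call by its exception message.
 * The patterns are assumptions, not the plugin's real rules.
 */
function classify_via_error(Exception $e)
{
    $critical_patterns = array(
        '/could not connect to host/i',       // assumed transport failure text
        '/parsing wsdl/i',                    // assumed WSDL download/parse failure text
        '/java\.lang\.NullPointerException/', // known internal server error
    );
    foreach ($critical_patterns as $pattern) {
        if (preg_match($pattern, $e->getMessage())) {
            return SEVERITY_CRITICAL;
        }
    }
    // Everything else, including expected business errors, is a warning.
    return SEVERITY_WARNING;
}
```

The fallback to a warning reflects the point made below: anything not positively identified as critical is treated as the lower severity.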

SEVERITY_CRITICAL errors often involve waiting for service timeouts and may quickly create a backlog in your own
worker processes as your site struggles to keep up with incoming requests. They are identified with relatively high
confidence, so it is generally safe and recommended to set this limit quite low for fast failure - perhaps as low as
one or two a minute.

SEVERITY_WARNING errors **will include expected exceptions**, such as when resetting the password for an email address
that is not registered, or attempting to renew a membership scheme before it is due for renewal. These almost never
indicate a problem with the system. However, a rate much higher than usual may indicate problems either with your
calling code or with the VIA server (missing or corrupt data, for example) that are preventing proper operation. They
may also include system errors that have not been positively identified as SEVERITY_CRITICAL.

We recommend you **do not set the breaker to trip on SEVERITY_WARNING** errors unless you have complete, reliable and 
long-term data on usual API call failure rates. If you are setting a limit for warnings, it should allow a good margin 
above your historical peaks, and allow for unusual load patterns (for example, if you commonly get spikes in new user 
registration/login following a social media campaign).

Failing to set a high enough SEVERITY_WARNING limit **will result in your site going offline when VIA is working 
perfectly**.


### What happens when the breaker is tripped

With the breaker tripped, VIA calls that can be served from your local cache may still succeed (provided you register
the circuit breaker after the cache plugin). All other calls will immediately throw a ViaCircuitBreakerTrippedException,
without making any attempt to create a SoapClient, download the WSDL or otherwise contact the VIA server. You can expect
the exception to be thrown consistently within a few milliseconds of the API call.
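A minimal sketch of handling the tripped state in calling code, assuming you want to show a fallback rather than an error page. `call_via_or_fallback` is a hypothetical helper; the stand-in exception class is declared only when the plugin's real `ViaCircuitBreakerTrippedException` is not loaded, so the sketch can run on its own. Other exceptions are deliberately not caught.

```php
<?php
// Stand-in so this sketch runs on its own; the plugin provides the real class.
if (! class_exists('ViaCircuitBreakerTrippedException')) {
    class ViaCircuitBreakerTrippedException extends Exception {}
}

/**
 * Hypothetical wrapper: run a VIA call and turn a tripped breaker into a
 * fallback value. $via_call is any callable that performs the API request.
 */
function call_via_or_fallback(callable $via_call, $fallback)
{
    try {
        return $via_call();
    } catch (ViaCircuitBreakerTrippedException $e) {
        // The breaker failed fast without contacting VIA, so no timeout was incurred.
        return $fallback;
    }
}
```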

After a configurable `retry_interval`, the circuit breaker will allow a single request to VIA, to see if it has returned
to normal. If that request succeeds the breaker will be reset and API calls will continue as normal. If it fails, then
the breaker will stay tripped for another `retry_interval` before trying again.

If the retry request does not report back with success or failure within the `retry_lock_lifetime` (perhaps because of
a fatal error or other unexpected problem), a further request will be sent. One request will be sent every
`retry_lock_lifetime` until one of them either resets the breaker or explicitly marks it as still tripped.
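The retry behaviour can be modelled as a small state machine. This sketch keeps its state in memory and takes the current time as a parameter; the class `RetryGate` and its methods are hypothetical, and the real breaker keeps equivalent state in its storage backend so it is shared between processes.

```php
<?php
/**
 * Hypothetical sketch of the retry logic: after retry_interval one probe is
 * let through; a probe that never reports back releases its lock after
 * retry_lock_lifetime so another probe can be sent.
 */
class RetryGate
{
    private $retry_interval;
    private $retry_lock_lifetime;
    private $tripped_at = null; // time the breaker was (re)tripped, null if not tripped
    private $lock_until = null; // time an outstanding probe's lock expires

    public function __construct($retry_interval, $retry_lock_lifetime)
    {
        $this->retry_interval      = $retry_interval;
        $this->retry_lock_lifetime = $retry_lock_lifetime;
    }

    public function trip($now)
    {
        $this->tripped_at = $now;
        $this->lock_until = null;
    }

    /** Returns true if a call made at $now may contact VIA. */
    public function allowRequest($now)
    {
        if ($this->tripped_at === null) {
            return true;  // not tripped: every call goes through
        }
        if ($now < $this->tripped_at + $this->retry_interval) {
            return false; // still inside retry_interval
        }
        if ($this->lock_until !== null && $now < $this->lock_until) {
            return false; // a probe is already in flight
        }
        // Let exactly one probe through and lock out others until it reports back.
        $this->lock_until = $now + $this->retry_lock_lifetime;
        return true;
    }

    /** The probe succeeded: reset the breaker. */
    public function reportSuccess()
    {
        $this->tripped_at = null;
        $this->lock_until = null;
    }

    /** The probe failed: stay tripped for another retry_interval. */
    public function reportFailure($now)
    {
        $this->trip($now);
    }

    public function isTripped()
    {
        return $this->tripped_at !== null;
    }
}
```

With `retry_interval` of 60 and `retry_lock_lifetime` of 10, a breaker tripped at t=100 lets one probe through at t=160, blocks everything else until t=170, and then allows another probe if the first never reported back.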

Once an instance of the breaker has blocked a request **it will block all other requests for the lifetime of that 
instance**. For example, in the common web-server context, all VIA API calls between the circuit breaker tripping and the
end of processing that HTTP request will be blocked. This is to prevent the unpredictable and inconsistent behaviour
that would appear if the breaker flickered on and off while rendering a single page or handling a single form submission.
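The "once blocked, always blocked for this instance" rule amounts to a latch. In this sketch the shared breaker state is represented by a callable, and `RuntimeException` stands in for `ViaCircuitBreakerTrippedException`; `LatchingBreaker` and `beforeCall` are hypothetical names, not the plugin's classes.

```php
<?php
/**
 * Hypothetical sketch: once this instance has blocked a call it keeps
 * blocking for the rest of its lifetime (e.g. one HTTP request), even if the
 * shared state says the breaker has since been reset.
 */
class LatchingBreaker
{
    /** @var callable Returns true if the shared breaker state is tripped */
    private $is_tripped;

    /** @var bool Set once this instance has blocked a call */
    private $has_blocked = false;

    public function __construct(callable $is_tripped)
    {
        $this->is_tripped = $is_tripped;
    }

    /** Throws if the call must be blocked. */
    public function beforeCall()
    {
        if ($this->has_blocked || call_user_func($this->is_tripped)) {
            $this->has_blocked = true;
            // Stand-in for ViaCircuitBreakerTrippedException.
            throw new RuntimeException('VIA circuit breaker tripped');
        }
    }
}
```

Because the latch lives on the instance, the next HTTP request (which builds a fresh instance) sees the current shared state again.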

### Resetting the circuit breaker

The `ViaCircuitBreaker` class exposes a `reset` method that you can use to manually reset the breaker (or to implement
your own reset logic). Normally, however, you would allow the breaker to reset automatically once it starts to see retry
requests returning successfully.

### State Storage

The default storage implementation uses the APC cache (a PHP extension) to track the state of the breaker and error
rates between requests. If you're using Apache and mod_php, this will share the circuit breaker between all requests
handled by a single server.

If you are running a cluster of servers, this will allow individual servers to trip if their own outbound network 
connection to VIA is misbehaving, without taking healthy servers offline.
 
If you are running nginx with FastCGI, bear in mind that APC state is not global to the machine in this environment,
but is separate for each group of worker processes.

In all cases other than apache/mod_php on a single server, you will need to set tripping error rates based on the number
of errors you'd expect to see in each process group, rather than the total across the whole cluster.
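As a rough worked example, assuming requests (and therefore errors) are spread evenly across every group that has its own APC state, a cluster-wide tolerance can be divided down to a per-group limit. `per_group_limit` is a hypothetical helper, not part of the plugin.

```php
<?php
/**
 * Hypothetical helper: scale a cluster-wide errors-per-minute figure down to
 * one APC state group, assuming traffic is spread evenly across all groups.
 */
function per_group_limit($cluster_errors_per_minute, $servers, $groups_per_server)
{
    $groups = $servers * $groups_per_server;
    // Round up so a group is never given a limit below its even share.
    return (int) ceil($cluster_errors_per_minute / $groups);
}

// e.g. 3 servers with 2 worker process groups each, tolerating 12 errors/min cluster-wide:
echo per_group_limit(12, 3, 2), PHP_EOL; // 2
```

Real load is rarely perfectly even, so treat the result as a starting point and leave extra margin.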

You can also implement your own `ViaCircuitBreakerStorage` class to suit whichever storage backend you want to
use.

### Notifications on trip and reset

You should always configure (and monitor!) some form of logging/alerting on trip and reset activity. The 
breaker provides the `ViaCircuitBreakerListener` interface to help with this. You should implement this interface
in a way that suits your overall monitoring/logging approach, and then register your listener with the
`ViaCircuitBreaker::setListener` method.

You will only receive notification of the first error to cause a trip, and of the first reset after that trip. The
class does not report the ongoing error rate, or any further tripping errors received once the breaker has tripped.
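This sketch models the "first trip, first reset" notification behaviour within a single process. `BreakerListener`, `onTrip`, `onReset`, `TransitionNotifier` and `RecordingListener` are hypothetical names; check the `ViaCircuitBreakerListener` interface in the plugin source for the real method signatures before writing your own listener.

```php
<?php
/** Hypothetical listener contract, standing in for ViaCircuitBreakerListener. */
interface BreakerListener
{
    public function onTrip($reason);
    public function onReset();
}

/**
 * Hypothetical sketch: forwards only state transitions to a listener, so
 * repeated errors while already tripped do not produce repeated alerts.
 */
class TransitionNotifier
{
    private $listener;
    private $is_tripped = false;

    public function __construct(BreakerListener $listener)
    {
        $this->listener = $listener;
    }

    public function notifyTrip($reason)
    {
        if (! $this->is_tripped) {
            $this->is_tripped = true;
            $this->listener->onTrip($reason);
        }
    }

    public function notifyReset()
    {
        if ($this->is_tripped) {
            $this->is_tripped = false;
            $this->listener->onReset();
        }
    }
}

/** Example listener that records events, e.g. before forwarding them to alerting. */
class RecordingListener implements BreakerListener
{
    public $events = array();

    public function onTrip($reason) { $this->events[] = 'trip: ' . $reason; }

    public function onReset() { $this->events[] = 'reset'; }
}
```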

## Installation and configuration

The circuit breaker classes are autoloadable with the other VIA wrapper classes - see the main README for instructions
on registering the wrapper in your project.

The default storage implementation uses the APC cache to share data between processes, but it's easy to implement
support for alternate storage if required (see above).

You need to create an instance of the plugin with its dependencies, and register it with the ViaPluginManager attached
to your ViaApiClient object. For example, presuming you already have a reference to the plugin manager:

```php
$options = array(
    // Time in seconds after which one call will be allowed through to see if VIA is back to normal
    'retry_interval' => 60,

    // Timeout in seconds before another retry attempt will be allowed if the first retry attempt disappears
    // without reporting success or failure
    'retry_lock_lifetime' => 10,

    // Number of errors at a given severity in the last 60 seconds to allow before the breaker trips
    // SEE NOTES ABOVE on configuring these values
    'severity_rate_limits' => array(
        ViaCircuitBreaker::SEVERITY_CRITICAL => 0,
        // ViaCircuitBreaker::SEVERITY_WARNING  => 60, // DON'T limit warnings without reading above
    ),
);
$breaker = new ViaCoreCircuitBreaker(new ViaApcCircuitBreakerStorage, $options);
// Strongly recommended: attach your own ViaCircuitBreakerListener implementation for trip/reset alerts
// $breaker->setListener(new MyApplicationCircuitBreakerClass);
$plugin  = new ViaCircuitBreakerPlugin($breaker);
// Presuming you have a reference to $plugin_manager already, from building the ViaApiClient
$plugin->registerWithManager($plugin_manager);
```

To use the plugin with the old-style `Red61_Via` class, you can extend the class and register the plugin on construction 
like this:

```php
class Red61_Via_WithCircuitBreaker extends Red61_Via
{
    public function __construct($wsdl, $web_key, $basketId = NULL, $conf = array('cache' => 'true'), $arr = array())
    {
        parent::__construct($wsdl, $web_key, $basketId, $conf, $arr);
        $options  = array(); // valid options as in the example above
        $breaker  = new ViaCoreCircuitBreaker(new ViaApcCircuitBreakerStorage, $options);
        $plugin   = new ViaCircuitBreakerPlugin($breaker);
        // Your own implementation of the ViaCircuitBreakerListener interface
        $listener = new MyApplicationCircuitBreakerClass;
        $breaker->setListener($listener);

        $plugin->registerWithManager($this->plugin_manager);
    }
}
```

Make sure to **register the plugin after registering the cache plugin** to ensure that VIA requests that can be
served from cache are handled even when the circuit breaker is tripped.

## Checking circuit breaker state

You can call the `isTripped` method on the `ViaCircuitBreaker` class to find out if the breaker is already tripped.
This may be useful for pre-emptively showing the user a flash message or redirecting them when they visit a page that 
will need VIA data to be useful - for example a booking form page.

In general, we recommend you simply attempt calls and expect to handle occasional errors, rather than relying on
`isTripped`. Bear in mind that if you never send any calls once the breaker is tripped, it will never get the chance to
allow a retry, and the system will stay offline until you manually reset it.
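A minimal sketch of the pre-emptive notice described above. `via_unavailable_notice` is a hypothetical helper, and `$breaker` is assumed to be any object with an `isTripped()` method (for example your `ViaCircuitBreaker` instance). The page should still go on to attempt its VIA calls, so the breaker keeps getting the chance to send retries.

```php
<?php
/**
 * Hypothetical helper: pick a notice to show before rendering a page that
 * depends on VIA data (such as a booking form).
 *
 * Returns the notice text, or null if no notice is needed.
 */
function via_unavailable_notice($breaker)
{
    if ($breaker->isTripped()) {
        return 'Online booking is temporarily unavailable. Please try again in a few minutes.';
    }
    return null;
}
```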

## Credits and Licence

The VIA circuit breaker was originally developed in 2013 by inGenerator for the
[Edinburgh International Book Festival](https://www.edbookfest.co.uk), and was rebuilt by inGenerator in 2014, again for
the Festival, as a VIA plugin against the new wrapper.
 
It is released under the 3-clause BSD licence:

Copyright (c) 2014, Edinburgh International Book Festival
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the 
following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following
   disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the 
   following disclaimer in the documentation and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote 
   products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, 
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, 
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
