The pitfall is that some customers infer from these statements that they can purchase licenses for the required service agents, hand the CD to an administrator, and let that administrator configure the throttling agents into the service inventory. All problems solved.
Let’s start by defining what throttling actually is from several points of view, and then illustrate how far off these conclusions can be and how costly they are to resolve.
In the eyes of customers, throttling is a way to manage the amount of traffic to a service or a back-end system. The expectation is often that they can manage throughput by restricting access to a service so that it does not exceed a certain metric, i.e. a specific number of calls per second or per minute, depending on load.
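To make the “calls per second” metric concrete, here is a minimal sketch of a token-bucket throttle in Python. The class name and numbers are illustrative only, not tied to any particular throttling product:

```python
import time

class TokenBucket:
    """Minimal token-bucket throttle: allows roughly `rate` calls per second."""

    def __init__(self, rate, capacity=None):
        self.rate = float(rate)                  # tokens added per second
        self.capacity = float(capacity or rate)  # maximum burst size
        self.tokens = self.capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # refill tokens for the elapsed time, capped at the burst capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False   # threshold exceeded: this call is throttled

bucket = TokenBucket(rate=5)                 # at most ~5 calls per second
results = [bucket.allow() for _ in range(10)]  # burst of 10 immediate calls
```

With a burst of ten immediate calls against a rate of five per second, the first five are admitted and the rest are throttled; what to do with the throttled ones is exactly the question the rest of this post explores.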
What does a statement like this mean for the messages that arrive at the throttling agent and are marked as exceeding the predefined threshold?
For a customer, this could mean that the message has to wait until the measured load no longer exceeds the threshold.
To a throttling agent, it usually means that the service call cannot be executed, because executing it would violate the threshold even further.
What should happen to a service call exceeding the threshold? This depends on the purpose of the service and the context in which the call is executed. A few examples are:
1 - A read call for customer data intended for display in a client application
2 - A read operation in the context of an update service composition
3 - An update of an address based upon a client application request
Let’s highlight some characteristics for each of these:
1) It seems OK to respond “too busy” to the service consumer. The consumer can retry at a less busy time if the call is really important. To the throttling agent it means it is OK to discard the message and respond “too busy”.
2) This scenario is distinctly different from the first one. Discarding a read message that is executed in the context of an update will most likely trigger a retry mechanism, and the retry will make the same message come back, potentially even quicker than a retry coordinated by a service consumer. This significantly increases the load on the messaging infrastructure as well as on the throttling mechanism. A way to overcome this issue is to add message properties that help identify the context in which the message is being executed. On the same throttled service, but in this different context, you can then decide to allow the call even though messages in the ‘regular read’ scenario would be refused. You can even maintain two different throttling statistics for this.
This does, however, expose an issue related to throttling in a service composition context, which is elaborated in the section titled “Where to throttle in the SOA?”.
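As a sketch of the idea, assuming a hypothetical context property on the message (the property name and limits are made up), two separate throttling statistics could be kept per context like this:

```python
# Hypothetical sketch: one throttling statistic per execution context.
# The "context" message property is an assumption, not a standard header.
class ContextualThrottle:
    def __init__(self, limits):
        self.limits = dict(limits)               # max calls per window, per context
        self.counts = {ctx: 0 for ctx in limits}

    def allow(self, context):
        # unknown contexts fall back to the most restrictive policy (refuse)
        limit = self.limits.get(context, 0)
        if self.counts.get(context, 0) < limit:
            self.counts[context] = self.counts.get(context, 0) + 1
            return True
        return False

    def reset_window(self):
        # called by a timer at the start of each measurement window
        self.counts = {ctx: 0 for ctx in self.limits}

# a tight limit for regular reads, a generous one for composition reads
throttle = ContextualThrottle({"regular-read": 2, "composition-read": 100})
```

Here a read inside an update composition is admitted even while plain display reads on the same service are already being refused, which avoids triggering the retry storm described above.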
3) This one is a little less easy to deal with. Discarding the message most likely does not meet the business requirements. If the front-end application (the service consumer) is a web application exposing a page to submit address changes, it might even cost customers if the data was typed in and the system responded “too busy”. But then what? It seems that storing the message and retrying later, at a less busy time, should be fine.
One now has to ask whether the order in which messages are processed is significant in the context of the core service logic. This determines whether it is fine to park requests that exceed the threshold for later execution. Even then, parking requests just moves the problem to another place in the system or to another point in time: if the number of throttled (parked) messages is large, you may face another throttling challenge when processing them. If we are just moving the problem, this does not seem like a viable solution.
If the order of execution must be guaranteed, the solution above cannot be used at all, as no new messages can be executed until the one exceeding the throttling policy has been processed successfully. This, too, does not seem a viable way to solve the throttling issue.
What other options do we have? In this scenario, throttling should do no more than ensure that messages do not get lost while the availability of the service provider is not guaranteed (a situation where the throttling policy is exceeded is equivalent to an availability issue of the service provider). This can be solved by utilizing a queue to bridge the periods of reduced availability, which should be sufficient to ensure that the address update is (eventually) executed.
A way of throttling in this situation is to read the messages posted by service consumers from a queue at a predefined maximum rate, so the throttling policy is never violated. This only works if the consumer does not require a synchronous response to its update request. It is a perfectly fine solution in that no messages get lost; however, the order of execution is maintained only if messages are read from the queue one at a time, which seriously impacts the scalability of the throttled service.
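A minimal sketch of this queue-based approach, using Python’s standard `queue` module; the rate, handler and message names are illustrative:

```python
import queue
import time

def drain_at_rate(q, handler, max_per_second):
    """Read messages from `q` at no more than `max_per_second`.

    Processing one message at a time preserves ordering; the trade-off,
    as noted in the text, is the scalability of the throttled service.
    """
    interval = 1.0 / max_per_second
    while True:
        try:
            msg = q.get_nowait()
        except queue.Empty:
            break                # nothing parked; stop draining
        handler(msg)             # execute the update against the provider
        time.sleep(interval)     # pace the reads to honour the policy

# messages posted asynchronously by the service consumer
q = queue.Queue()
for i in range(3):
    q.put(f"address-update-{i}")

processed = []
drain_at_rate(q, processed.append, max_per_second=50)
```

No message is lost and the updates leave the queue in the order they arrived, at a rate the service provider can sustain.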
The processing order of messages is relevant if out-of-order execution causes data integrity issues or service failures. For example, two subsequent address changes on the same customer result in incorrect address details in the customer database if executed out of order. Similarly, if the core service logic must make subsequent service calls, and the second cannot complete successfully unless the first has succeeded, out-of-order execution creates the same kind of data inconsistency.
A way to make the system more scalable is to skip messages when a more recent one has been received; this applies in certain situations only. Taking the two subsequent address changes as an example: only the most recent address-change message is relevant to the system. This can be achieved by assigning a time-based or sequence-based message property or header element to a message upon receipt from the service consumer (perhaps the consumer can already assign this to the request). An expression can then identify whether a message can be dropped because a more recent one has already been processed. (A similar mechanism can be used to detect replay attacks.) For this to work, a form of data store must be available in the system to keep track of these requests.
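A sketch of such a stale-message filter, assuming hypothetical customer-id and sequence fields on the request; the data store here is just an in-memory dictionary standing in for whatever store the system provides:

```python
# Hypothetical sketch: drop a message if a more recent one for the same
# customer has already been processed. The (customer_id, seq) fields are
# assumptions about the message layout, not a standard.
class StaleFilter:
    def __init__(self):
        self.latest = {}    # per-customer highest sequence number processed

    def should_process(self, customer_id, seq):
        if seq <= self.latest.get(customer_id, -1):
            return False    # a newer (or identical) change already won
        self.latest[customer_id] = seq
        return True

f = StaleFilter()
```

If change 3 for a customer is processed before the delayed change 2 arrives, change 2 is simply dropped, which is exactly the desired outcome for last-write-wins data such as an address.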
Where to throttle in the SOA?
Where should throttling happen anyway? There is no single right answer to this question. Let’s consider the following layered service inventory:
a - Public services controlling the access to services inside the service inventory, acting as an endpoint for all external access to the service inventory
b - Orchestration services (orchestrated task services) controlling all centralized and long-running processes
c - Business services (i.e. task services, utility services) which are the sole access point to any underlying layer
d - Data services (i.e. entity services, utility services) controlling all back-end access.
What happens if we throttle on services in each of these layers?
a) This can control the amount of traffic allowed into the service inventory, but what does that achieve? Only throttling at that level of the infrastructure. It is good for enforcing specific service consumer policies and indirectly keeping the load on the underlying systems manageable, but in the end many public services can access the same business services or back-end services in complex compositions, resulting in a significantly greater number of requests to the back end, which can be a multiple of the number of client requests. Furthermore, not all business services need to be exposed at the public level, meaning load can exist on the lower layers of the service inventory that the throttling mechanism would not be measuring.
b) A system can hardly be throttled at this level, as process starters often live inside the orchestration engine and cannot be exposed to service agents. This means that if a process must be controlled, it is probably best to throttle externally to the process, i.e. in the layers ‘above’ or ‘below’ the orchestration layer, or by controlling the flow of messages to and from the process in general.
c) Business services might seem the best place to apply throttling, because all traffic coming into the system has to pass through a business service: they see the traffic from the top layers and the traffic from the layers below. But multiple business services may be composed together in complex compositions, resulting in a throttling nightmare when the need to throttle arises for these kinds of services. For example, if I have an order service used in a composition that should also invoice the customer and schedule an electronic payment, on which of these elements (functional business service areas) do I throttle? On one of the composed services? On the service composition controller? On all of them? Even here a single answer is not possible, because it all depends on the purpose of the throttling, which may differ per throttled service.
d) This probably controls the back-end load best, but as data access services usually do not know the context in which they are called, applying throttling at this level has its restrictions. Referring back to 2): if the read of underlying data is needed for an update operation in the same service composition, what effect does refusing the read operation have on the composition, and what is the consequence for the bus infrastructure? This is not easy to answer, as it is probably different for many services.
In the end I think it all comes down to why the throttling happens, from a business or technology point of view. Sometimes throttling protects the legacy and back-end systems, which argues for throttling at the data service level. Sometimes it protects the middleware from excessive load, which can probably best be managed at the business service level. And sometimes one specific consumer can threaten the entire system’s availability; a throttling policy on the relevant service capabilities exposed to that particular consumer can then protect the other consumers from the load it causes in the middleware.
Each method has its pros and cons. Looking at the overall picture, it may be perfectly fine to throttle on two or three levels. Although a combined throttling policy may not be the easiest to comprehend, and may not use system resources to the best extent, it remains a popular method because it guards a number of key parameters of the system. The result is a solution that stays manageable without the need for capacity enhancements.
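As an illustration only, a combined multi-level policy could be expressed as a simple table; the layer names follow the inventory above, and all scopes and numbers are made up for the sketch:

```python
# Hypothetical sketch of a combined multi-level throttling policy.
# Note the deliberate absence of the orchestration layer, which, as
# discussed above, can hardly be throttled directly.
throttling_policies = {
    "public":   {"scope": "per-consumer", "max_per_second": 100},
    "business": {"scope": "per-service",  "max_per_second": 250},
    "data":     {"scope": "per-backend",  "max_per_second": 50},
}

def is_allowed(layer, current_rate):
    """A call passes a layer only if that layer's policy allows it."""
    policy = throttling_policies.get(layer)
    if policy is None:
        return True   # no policy configured for this layer
    return current_rate < policy["max_per_second"]
```

A call traversing several layers would have to pass the check at each throttled layer on its path, which is what makes the combined behavior harder to reason about than any single policy.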
Of course, throttling policies can be used in other ways, for example to give priority to certain messages, or to messages from certain consumers or customer requests, and many other uses exist. But this post is just an example; I can never convey all the issues and opportunities of throttling in a single post.
Dealing with throttled services in a reference architecture
I hope this post conveys that throttling is not trivial. In fact, it is crucial to perform an up-front analysis of your throttling architecture before any policy is applied. This can be formalized by documenting specific approaches for specific situations in a reference architecture document.
A well-respected colleague of mine once said: the throttled message can be discarded and the service consumer can be thrown a technical exception. Although this may be fine for many messages and throttling implementations, be aware that more options exist, and they can be addressed by having a throttling (reference) architecture.
A throttled service may throw a technical exception. Consumers usually treat technical exceptions as a permanent failure of the service call. If the call was a read operation, it may simply not happen again; but if it was a write operation, the consumer may have retry mechanisms in place that immediately result in another call with the same message. This is, however, the easiest and most straightforward implementation and can be introduced without major changes to the system; most initial implementations will probably use this method. One caution with this approach: if the consumer treats the exception as an invalid call to another system, elaborate log analysis sessions may follow, since people cannot tell the difference between a back-end availability issue and a throttled-message response. To make that difference visible, you may not be able to avoid customizing the services.
A throttled service may instead return a technical or functional status (“not now”), but this means that service consumers must be able to understand that status. It means the message cannot be completed at present: retrying immediately probably makes no sense, but a retry at a later time may work perfectly well. In other words, a delayed retry may succeed where an immediate retry would not.
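A sketch of a consumer that understands such a “not now” status and retries with a growing delay rather than immediately; the `THROTTLED` marker and the `call` signature are assumptions for the example, not a real API:

```python
import time

# Hypothetical sketch: the service signals "not now" with a distinct
# THROTTLED status that the consumer knows how to interpret.
def call_with_backoff(call, retries=3, base_delay=0.1):
    for attempt in range(retries):
        result = call()
        if result != "THROTTLED":
            return result                  # success or a genuine failure
        # delayed retry: wait longer each time the service says "not now"
        time.sleep(base_delay * (2 ** attempt))
    return "GAVE_UP"                       # still throttled after all retries

# simulate a service that is busy twice, then succeeds
responses = iter(["THROTTLED", "THROTTLED", "OK"])
result = call_with_backoff(lambda: next(responses), base_delay=0.01)
```

Because `THROTTLED` is distinguishable from a technical exception, the consumer can back off instead of hammering the service, and log analysis can separate throttling from back-end availability issues.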
Once a reference architecture exists, it should be easier for system administrators to reason about and implement new policies, and to fine-tune throttled entities. But be aware that, depending on how elaborate the throttling architecture is, the complexity of throttling may increase dramatically. Even if the current throttling parameters are perfectly understood, a dependency analysis must be conducted to fully assess and understand the impact of a new throttling policy.
A similar risk applies to changing an existing throttling policy. As soon as you tune the policy to a point near the typical load, or the typical load grows to a value near the configured policy, dramatic changes in system performance and behavior can be expected.
My advice would be to have any changes or new policies investigated by a team consisting of administrators (current system knowledge), architects (system dependencies and consequences) and capacity planners (future system load).