Friday, December 17, 2010

Core service logic and auxiliary service logic

Here's a nice scenario I have encountered recently:

Customer management wants every activity performed by a customer and every customer contact logged in the customer data store.

For this purpose, every relevant service and process in the service inventory is fitted with auxiliary logic to create customer logs in the CRM system.

Historically, this has worked for several years, but recently the system administrators started complaining about the sheer amount of data in the system, and how hard it has become to display these logs to the CM agent without running expensive (long-running) queries.

To cut a long story short, they want to reduce the amount of data logged for a customer by creating a new policy which only logs significant activities on a customer record.

For this to work, the initial request was to remove the logging logic from certain service capabilities.

In the service inventory, a whole bunch of services have, besides the relevant core service logic, some hard-coded auxiliary CRM logging logic which uses a utility service to push the logged data to the CRM system.

As it impacts many different services which are used in different contexts, removing the auxiliary logic as requested may not be the best way to go about this. Not only does this cost a lot of effort and regression testing, it also does not solve the issue that the same service should log to the CRM system in one context, but not in another.

An alternative, perhaps more flexible way of changing the system's logging logic is to change the utility service, or apply a policy to it, so that it acts as a filter, deciding for all services whether or not to execute the actual CRM system update.

This means that the logging logic can stay in the services as auxiliary logic, while the decision whether or not to perform the actual update in the CRM system can be managed (declaratively), external to the core service logic of the services needing the logging, and external to the actual logging service itself.

This can work only if the context of the service call is also recorded with the logging message.
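To make this concrete, here is a minimal sketch (in Java, with invented names like CrmLoggingService) of what such a filtering utility service could look like. The policy store is an in-memory map for illustration only; in a real system it would be declarative configuration managed outside the code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a CRM logging utility service that applies a
// declaratively managed policy before performing the actual CRM update.
public class CrmLoggingService {

    // Policy: maps a call context (e.g. "sales-process", "billing-batch")
    // to whether its activities are significant enough to log.
    private final Map<String, Boolean> loggingPolicy = new ConcurrentHashMap<>();

    public void setPolicy(String context, boolean shouldLog) {
        loggingPolicy.put(context, shouldLog);
    }

    // Called by the auxiliary logic embedded in each service. The context
    // of the service call must be passed along with the log message.
    public void logCustomerActivity(String context, String customerId, String activity) {
        // Default to logging when no policy is defined for the context.
        if (loggingPolicy.getOrDefault(context, true)) {
            pushToCrm(customerId, activity);
        }
        // Otherwise the message is filtered out; the services themselves
        // are unchanged and keep calling this utility service as before.
    }

    private void pushToCrm(String customerId, String activity) {
        System.out.printf("CRM log: customer=%s activity=%s%n", customerId, activity);
    }
}
```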

An alternative to this approach is that all services log messages by default, publishing them to a queue or topic, after which the messages are selectively processed based on business rules in a business rules engine or similar.

Monday, December 13, 2010

SOA Pattern Publishing - why?!

Hi all,

Recently I have been posting a lot of candidate patterns on soapatterns.org, the pattern-related site of soasystems.com.

It was a matter of time before someone would ask "what are you doing and why are you doing it?". Well, it happened. The reason I wrote this post is that people ask why use SOA patterns, why document SOA patterns, and there's only one simple reason for me doing this.

SOA patterns are proven solutions for common problems. Some of the candidate patterns I have documented may seem like open doors or a shot at an empty goal - but that's actually a good thing. It means that people recognize a certain solution to a problem as a good, or at least a viable, solution. And if people recognize the structure of the solution, does that not mean they have just recognized a pattern in what they are seeing?

SOA patterns are not rocket science. SOA patterns are not smart or unique solutions. They are merely a common way of fixing a problem that we've encountered more than just once.

Does that mean it should not be documented? I don't think so. By documenting it, I hope to save other people time and effort; because if someone recognizes a documented SOA pattern as something that can be applied to their problem, with or without modifications, that particular goal is reached.

Why am I publishing candidate patterns and not patterns? Well, it's in the definition of the SOA pattern. For it to be a pattern it must be proven - as in - used in the field by more than one person. A site like soapatterns.org manages exactly that. This is also why, with every candidate pattern I publish, I ask others to testify that they have used the same pattern or a similar approach for real too. Only if a group of people testify that they use it can it be considered a pattern instead of a candidate pattern.

Keep watching this blog for a couple more patterns to come. I am presently finalizing patterns on rules normalization and rule layers, and a couple more service inventory related patterns. And if you've done something similar to what's described in one of my pattern candidates - or any others in the list - be sure to let the people at soapatterns.org know that you did. This way you, the community, can contribute to the field.

Regards,

-Roger

Monday, December 6, 2010

Published Reference Data Distribution candidate pattern

Hi all,

I just published the Reference Data Distribution candidate pattern. I'm looking for people to testify that they have applied this pattern to get it promoted from the candidate to the pattern status. Check it out here at soapatterns.org.

- Roger

Monday, November 29, 2010

Throttling is not trivial

When runtime governance tooling vendors present their tools for e.g. throttling to potential customers, they explain how easy these are to install, and how implementing the actual behavior of the service agents is configuration only, something that can simply be done by system administrators.

The pitfall is that some customers infer from these statements that they can purchase the licenses for any required service agents, hand the CD to an administrator, and ultimately let the system administrator configure the throttling agents into the service inventory. All problems solved.

Let's start by defining what throttling actually is from several points of view, and then go from there to illustrate that these conclusions can be way off and can be very costly to resolve.

In the eyes of customers, throttling is a way to manage the amount of traffic to a service or a back-end system. The expectation is often that they manage throughput by restricting access to a service so that it does not exceed a certain metric, e.g. a specific number of calls per second or per minute, depending on load.

What does a statement like this mean for the messages that arrive at the throttling agent and are marked as being beyond the predefined threshold?

For a customer this could mean that the message has to wait until the measured load no longer exceeds the threshold.
To a throttling agent it usually means that the service call cannot be executed, because executing it would violate the expected threshold even more.

Throttling examples

What should happen to a service call exceeding the threshold? This actually depends on the purpose of a service and the context in which this service call is executed. A few examples are:
  1. A read call for customer data intended for display in a client application
  2. A read operation in the context of an update service composition
  3. An update of an address based upon a client application request

Let’s highlight some characteristics for each of these:

Ad 1)
It seems OK to respond “too busy” to the service consumer. The consumer can retry in less busy times if it’s really important. To the throttling agent it means it’s ok to discard the message and respond “too busy”.
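As an illustration of this first scenario, here is a minimal sketch of such a "too busy" throttling agent. The class name and the one-second measuring window are assumptions made for the example; the point is that rejected calls are simply refused, leaving the retry decision to the consumer.

```java
// Sketch of a throttling agent that refuses calls above a per-second
// threshold, leaving any retry decision to the service consumer.
public class ThrottlingAgent {

    private final int maxCallsPerSecond;
    private long windowStart = System.currentTimeMillis();
    private int callsInWindow = 0;

    public ThrottlingAgent(int maxCallsPerSecond) {
        this.maxCallsPerSecond = maxCallsPerSecond;
    }

    // Returns true if the call may proceed, false meaning "too busy".
    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        if (now - windowStart >= 1000) {
            windowStart = now;   // start a new measuring window
            callsInWindow = 0;
        }
        return ++callsInWindow <= maxCallsPerSecond;
    }
}
```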

Ad 2)
This scenario is distinctly different from the first one. Discarding a read message that is executed in the context of an update will most likely trigger a retry mechanism. The retry mechanism will make the same message come back, potentially even quicker than a retry attempt coordinated by a service consumer. This significantly increases the resource load on the message infrastructure as well as on the throttling mechanism. A way to overcome this issue is to have message properties which help identify the context in which the message is being executed. On the same throttled service, but in a different context, you can then decide to allow the service call even though messages in the 'regular read' scenario would be refused. For this you can even use two different throttling statistics, as sketched below.
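A sketch of what such context-dependent throttling could look like, assuming the message carries a context property, and reusing the ThrottlingAgent sketched above. The context names and thresholds are invented for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: one throttling statistic per call context. A read executed
// inside an update composition is judged against a more generous
// threshold than a regular client-facing read.
public class ContextAwareThrottle {

    private final Map<String, ThrottlingAgent> agentsByContext = new ConcurrentHashMap<>();

    public ContextAwareThrottle() {
        agentsByContext.put("regular-read", new ThrottlingAgent(50));
        agentsByContext.put("update-composition", new ThrottlingAgent(200));
    }

    public boolean allow(String context) {
        ThrottlingAgent agent = agentsByContext.get(context);
        // Unknown contexts fall back to the strictest policy.
        return agent != null ? agent.tryAcquire()
                             : agentsByContext.get("regular-read").tryAcquire();
    }
}
```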

This does however expose an issue related to throttling in a service composition context. This is elaborated in the part of this article titled "Where to throttle in the SOA?".

Ad 3)
This one is a little less easy to deal with. Discarding the message most likely does not meet the business requirements. If the front-end application (service consumer) is a web application which exposes a page to submit address changes, it might even cost customers if the data was typed in and the system responded "too busy". But then what? It seems that storing the message and retrying later, in less busy times, should be fine.

One has to ask whether the order in which messages are processed is significant in the context of the core service logic. This determines whether it's fine to park requests which exceed the threshold for later execution. Even then, parking requests for later use just moves the problem to another place in the system, or to another point in time. If the amount of throttled (parked) messages is big, you may face another throttling challenge when processing them. If we're just moving the problem, it does not seem like a viable solution.

If the order of execution is to be guaranteed, the solution mentioned cannot be used, as no new messages can be executed until the one exceeding the throttling policy has been successfully processed. So this, too, does not seem to be a viable solution to the throttling issue.

What other options do we have? If we look at this scenario, the throttling should do no more than ensure that messages do not get lost while the availability of the service provider is not guaranteed (a situation in which the throttling policy is exceeded is the same as an availability issue of the service provider). This can be solved by utilizing a queue to mitigate the times of reduced availability of the service provider, which should be sufficient to ensure that the address update is (eventually) executed.

A way of throttling in this situation is to have messages (posted by a service consumer) read from a queue at a maximum predefined rate, to prevent the throttling policy from being violated. This can only be done if the consumer does not require a synchronous response to its update request message. It is a perfectly fine solution in which no messages get lost; however, the order of execution is maintained only if messages are read from the queue one at a time, which seriously impacts the scalability of the throttled service.
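A minimal sketch of this queue-based approach. The in-memory queue and names are illustrative only; in practice the queue would be durable middleware, and the single reader is what preserves ordering at the cost of scalability.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch: address-update requests are queued and drained at a fixed
// maximum rate, so the provider's throttling policy is never violated.
// A single reader preserves the order of execution.
public class QueueThrottledUpdater {

    private final BlockingQueue<String> requests = new LinkedBlockingQueue<>();
    private final ScheduledExecutorService reader = Executors.newSingleThreadScheduledExecutor();

    public void start(long maxPerSecond) {
        long intervalMillis = Math.max(1, 1000 / maxPerSecond);
        reader.scheduleAtFixedRate(() -> {
            String request = requests.poll();
            if (request != null) {
                executeAddressUpdate(request);
            }
        }, 0, intervalMillis, TimeUnit.MILLISECONDS);
    }

    // Consumers post fire-and-forget: no synchronous response possible.
    public void submit(String addressUpdateRequest) {
        requests.add(addressUpdateRequest);
    }

    private void executeAddressUpdate(String request) {
        System.out.println("Updating address: " + request);
    }
}
```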

The processing order of messages is relevant if out-of-order execution causes data integrity issues or service failures. For example: two subsequent address changes on the same customer result in incorrect address details in the customer database if executed out of order. Similarly, if the core service logic of a service must make subsequent service calls, and the second one cannot complete successfully without the first one having executed successfully, out-of-order execution will create the same kind of data inconsistency.

A way to make the system more scalable is to skip messages when a more recent one has already been received; this applies in certain situations only. Taking the two subsequent address changes as an example: only the most recent address change message is relevant to the system. This can be achieved by assigning a time-based or sequence-based message property or header element to a message upon receipt from the service consumer - perhaps the service consumer can even assign this to the request itself. An expression can then be used to identify whether a message can be dropped because a more recent one has already been processed. (A similar mechanism can be used to detect replay attacks.) For this to work, a form of data store must be available in the system to keep track of these requests.
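A sketch of this skip-if-superseded mechanism, assuming each request carries a consumer-assigned sequence number and the customer ID as a business key. The in-memory map stands in for the data store mentioned above.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: drop a message when a more recent one for the same customer
// has already been processed. The in-memory map stands in for the data
// store that would track these sequence numbers in a real system.
public class StaleMessageFilter {

    private final Map<String, Long> highestProcessed = new HashMap<>();

    // Returns true if the message should be processed, false if it has
    // been superseded (or is a replay of an already processed message).
    public synchronized boolean accept(String customerId, long sequenceNumber) {
        Long previous = highestProcessed.get(customerId);
        if (previous == null || sequenceNumber > previous) {
            highestProcessed.put(customerId, sequenceNumber);
            return true;
        }
        return false;
    }
}
```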

Where to throttle in the SOA?

Where should throttling happen anyway? There is no single right answer to this topic. Let’s consider the following layered service inventory:

a - Public services controlling the access to services inside the service inventory, acting as an endpoint for all external access to the service inventory
b - Orchestration services (orchestrated task services) controlling all centralized and long-running processes
c - Business services (e.g. task services, utility services) which are the sole access point to any underlying layer
d - Data services (e.g. entity services, utility services) controlling all back-end access.

What happens if we throttle on services in each of these layers?

Ad a)
This can control the amount of traffic allowed into the service inventory, but what does that achieve? It only achieves throttling at that level in the infrastructure. It is good for enforcing specific service consumer policies and indirectly keeps the load on the underlying system manageable. But in the end, many public services can access the same business services or back-end services in complex compositions, resulting in a significantly greater number of requests to the back end, which can be a multiple of the number of client requests. Furthermore, not all business services need to be exposed at the public level, meaning that load can exist in the lower layers of the service inventory that the throttling mechanism would not be measuring.

Ad b)
A system can hardly be throttled at this level, as process starters are often inside the orchestration engine and cannot be exposed to service agents. This means that if a process must be controlled, it's probably best to throttle externally to the process, i.e. in the layers 'above' or 'below' the orchestration layer, or by controlling the flow of messages to and from the process in general.

Ad c)
Business services might seem the best place to control throttling, because all traffic coming into the system has to pass through a business service: they see the traffic from the top layers and the traffic from the layers below. But multiple business services may be composed together in complex compositions, resulting in a throttling nightmare when the need to throttle arises on these kinds of services. For example, if I had an order service which is used in a composition which should also invoice the customer and schedule an electronic payment, on which of these elements (functional business service areas) do I throttle? On one of the composed services? On the service composition controller? On all? Even here a single answer is not possible, because it all depends on the purpose of the throttling, which potentially differs per throttled service.

Ad d)
This can probably control the back-end load best, but as data access services usually do not know the context in which they are being called, applying throttling at this level has its restrictions. Referring back to 2): if the read of underlying data is relevant for an update operation in the same service composition, what effect does refusing the read operation have on the service composition, and what is the consequence of that for the bus infrastructure? This is not easy to answer, as it is probably different for many services.

In the end I think it all comes down to why the throttling happens, from a business or technology point of view. Sometimes the throttling happens to protect the legacy and back-end systems, which calls for throttling at the data service level. Sometimes the throttling happens to protect the middleware from excessive load, which can probably best be managed at the business service level. Sometimes it's one specific consumer that can threaten the entire system's availability, and then a throttling policy for selected service capabilities exposed to that particular consumer may be used to protect other consumers from the load it causes in the middleware.

Each method has its pros and cons. When looking at the overall picture, it may be perfectly fine to throttle on two or three levels. Although a combined throttling policy may not be the easiest to comprehend, and it may not use system resources to the best extent, it remains a popular method as it guards a number of key parameters of the system. This results in a solution which is still manageable without the need for capacity enhancements.

Of course throttling policies can be used in other ways as well, for example to give priority to certain messages, or to messages from certain consumers or customer requests, and many other ways exist. But this post is just an example, and I can never convey all the issues and opportunities of throttling in one single post.

Dealing with throttled services in a reference architecture.

I hope that this post conveys that throttling is not trivial. In fact it’s crucial to have an up-front analysis done for your throttling architecture before any policy is applied. This can be formalized by documenting specific approaches for specific situations in a reference architecture document.

A well-respected colleague of mine once said: the throttled message can be discarded and a technical exception can be thrown to the service consumer. Although this may be fine for many messages and throttling implementations, please be aware that more options exist, and they can be addressed by having a throttling (reference) architecture.

A throttled service may throw a technical exception. Usually, technical exceptions are treated by consumers as a permanent failure of a service call. If the call was a read operation, it probably will not happen again, but if it's a write operation, the consumer may have retry mechanisms in place which might immediately result in another call with the same message. This, however, is the easiest and most straightforward implementation and can be introduced without really big implementation issues; most initial implementations use this method. Some caution with this statement: if the consumer treats this as an invalid message call to another system, some elaborate log analysis sessions may follow, since people cannot tell the difference between a back-end availability issue and a throttled message response. To make this distinction possible, you may not be able to avoid customization of the services.

A throttled service may instead return a technical or functional status ("not now"), but this means that the service consumers must be able to understand this message. What it means is that at present the message cannot be completed. It probably does not make sense to retry at this point in time, but a retry at a later time may work perfectly well. In other words, a delayed retry may succeed after all, whereas an immediate retry would not.
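On the consumer side, understanding "not now" could look like the following sketch, where ServiceCall and NotNowException are invented for the example. The delay grows with each attempt, since an immediate retry would most likely be throttled again.

```java
// Sketch: a consumer-side delayed retry for a throttled ("not now")
// response, instead of treating it as a permanent failure.
public class DelayedRetryConsumer {

    public interface ServiceCall { void invoke() throws NotNowException; }

    public static class NotNowException extends Exception { }

    public static void callWithRetry(ServiceCall call, int maxAttempts) throws Exception {
        long delayMillis = 1000;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                call.invoke();
                return; // success
            } catch (NotNowException throttled) {
                if (attempt == maxAttempts) {
                    throw throttled; // give up after the last attempt
                }
                Thread.sleep(delayMillis);
                delayMillis *= 2; // back off further each time
            }
        }
    }
}
```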

Once a reference architecture exists, it should be easier for system administrators to think about and implement new policies, and fine-tune throttled entities. But be aware that, depending on how elaborate the throttling architecture is, the complexity of throttling may dramatically increase. Even if current throttling parameters are understood perfectly well, a dependency analysis must be conducted to be able to fully assess and understand the implementation of a new throttling policy.

A similar risk applies to making changes to an existing throttling policy. As soon as you tune the throttling policy to a point near "typical load", or the typical load increases to a value near the configured throttling policy, dramatic changes in system performance and behavior can be expected.

My advice would be to have any changes or new policies investigated by a team which consists of administrators (current system knowledge), architects (system dependencies and consequences) and capacity planners (future system load).

Throttling is not trivial; it's as simple as that!

Thursday, November 25, 2010

Published the Reference Data Observer candidate pattern

Hi all,

I just published the Reference Data Observer candidate pattern. I'm looking for people to testify that they have applied this pattern to get it promoted from the candidate to the pattern status. Check it out here at soapatterns.org.

- Roger

Tuesday, November 2, 2010

Published the Service Data Forward Cache candidate pattern

Hi everyone,

I just published the candidate pattern Service Data Forward Cache. Looking for people to testify that they have applied this pattern to get it promoted from the candidate to the pattern status. Check it out here at www.soapatterns.org.

-Roger

Wednesday, October 20, 2010

Published the Configurable Contract SOA Design Pattern Candidate

Hi all,

I just published the Configurable Contract SOA Design Pattern Candidate on www.soapatterns.org.
I'm looking for people who can verify that they have successfully applied this pattern to be able to get it promoted to a recognized SOA pattern.

-Roger

Monday, October 18, 2010

Certified SOA Architect

Finally managed to make some time and get the SOA Architect Certification done. For more info see www.soaschool.com.

-Roger

Monday, September 20, 2010

Agnostic / Non agnostic

Recently I had a conversation with a customer who did not understand the difference between agnostic and non-agnostic.

An example from a PCI compliance point of view, which I would like to share, can help clarify the difference.

In service orientation, one of the guiding principles is to separate agnostic and non-agnostic service logic. This can be done by separating agnostic service logic into one service and any non-agnostic parts into (an)other service(s).

The definition of agnostic/non-agnostic did not help him onto the right track, so we used PCI compliance as the basis for an appropriate example.

For the definition of agnostic/non-agnostic, check SOAGlossary.com.

Summarized, agnostic logic is supposed to be more reusable than non-agnostic logic. Separating agnostic logic from non-agnostic logic is done to increase the reuse potential of a service.

In PCI compliance, one of the scenarios is that customer management representatives view customer details. The business rules dictate that, unless explicitly necessary, a card number should not be readable. In this example, the method of making a card number unreadable is masking parts of the number when displayed.

Several places for masking can be identified:
1. In the service which retrieves the customer details including payment data (as the customer had requested)
2. In the front end(s) which use this service to display the payment details
3. Somewhere else?

Regarding 1) 
This introduces the issue that if the service masks the payment details all the time, the service can never be used in other scenarios, e.g. when requirements call for viewing unmasked details, or when the service is used in other service compositions where unmasked data is necessary. Since the service is agnostic, it does not know why the details are retrieved.

Regarding 2) 
This introduces the issue that during the initial implementation one might catch all the scenarios, but as new screens and applications (front ends) are built, no guarantees exist that these comply with the PCI requirements. Furthermore, if the number of front ends or screens is large, the effort to incorporate the masking logic may be considerable, resulting in an expensive implementation.

Regarding 3) 
This is where the agnostic/non-agnostic distinction can be explained more easily, and it is what the remainder of this post will elaborate on.

Agnostic logic should not be aware of the reason why it's used, which means it lacks functional or process context. If knowledge like the reason or process context must not be in the service which retrieves the customer's details, we can move that knowledge and the corresponding logic into a separate, non-agnostic service. In fact, the task service type as defined by Thomas Erl et al is intended to do exactly this: encapsulate non-agnostic logic.

So we've ended up with a customer details service which is agnostic and will retrieve unmasked credit card number data. This can be considered an entity service as per the Thomas Erl et al definition.
And we've ended up with one or more task service(s), distinguished from the agnostic service in the sense that they know whether or not to mask credit card details.

As indicated, the customer details service is agnostic and will not perform the masking logic. If we apply the Contract Centralization pattern and the Logic Centralization pattern, we have created an official endpoint. This means we can force users of the service inventory not to use the agnostic service directly, but to use the non-agnostic task services instead.

A task service can then deal with the non-agnostic context. A task service is built for a certain context, e.g. it retrieves the customer details for sales process purposes. A sales process should typically not be interested in the credit card number, so the non-agnostic service intended for the sales process would always mask the card data.

In the context of a billing process, another task service could expose the unmasked number, so that that particular service supports the actual act of making a payment with the card number.

The end result is that:
- the reusable agnostic entity service is not publicly exposed
- one or more non-agnostic task services are publicly exposed, managing the masking logic depending on the context they are called in.

Note that the two task services could be replaced by one task service that performs masking of card holder data based upon an additional parameter, e.g. a reason code, which represents the context in which the service is called. The task service would then decide, based upon the reason code, whether or not to mask any data, as sketched below.
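A minimal sketch of this single-task-service variant. The class name, the reason codes, and the stubbed entity service call are all illustrative assumptions.

```java
// Sketch: a non-agnostic task service that masks card holder data based
// on a reason code representing the context of the call. The agnostic
// entity service always returns the unmasked number.
public class CustomerDetailsTaskService {

    public String getCardNumber(String customerId, String reasonCode) {
        String unmasked = retrieveFromEntityService(customerId);
        // Only the billing process may see the full card number.
        return "BILLING".equals(reasonCode) ? unmasked : mask(unmasked);
    }

    // Keep only the last four digits, e.g. "**** **** **** 1111".
    private String mask(String cardNumber) {
        String digits = cardNumber.replaceAll("\\D", "");
        String lastFour = digits.substring(Math.max(0, digits.length() - 4));
        return "**** **** **** " + lastFour;
    }

    private String retrieveFromEntityService(String customerId) {
        // Stand-in for the call to the agnostic customer details service.
        return "4111 1111 1111 1111";
    }
}
```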

Hope this one helps a bit - I know it's a bit over-simplified but this should paint the picture.



Friday, July 2, 2010

ACID vs. BASE

We all know that ACID and BASE are opposites, both in the chemical world and in the IT world. They are used for different purposes and should not be confused with each other, as they are distinctly different. In the IT world, ACID and BASE can be used in a complementary fashion, similar to the chemical world.

In IT-land, both mechanisms are intended to make sure the end result of an operation leaves a system in a consistent state. The way they do it, however, is radically different. And which one you need depends on your needs, or better, your business's needs. The default reaction of almost everyone at business level would be "I need ACID". In my entire career I have only ever met one business principal who was fine with BASE after I had suggested it. Everyone always wanted to see 'the full monty' regarding consistency; no-one was willing to make any trade-offs to free up system resources, even if this meant purchasing more powerful hardware.

The differences between ACID and BASE, for a person applying the patterns, are concentrated around the amount of effort required to implement (design time), and the amount of effort for systems to execute the pattern, also known as scalability (runtime). Whereas nowadays most middleware has support for ACID, making life for designers a lot easier, there is no such thing for BASE. Find out why in this article.

ACID - Atomic, Consistent, Isolated, Durable

A transaction must be explicitly committed or fully rolled back (sometimes the environment or middleware automates part of this concern). In principle, as long as a transaction is not committed or rolled back, it can keep or lock huge amounts of resources (like memory, disk space, etcetera). While in progress, the consumed system resources keep the overall solution from being scalable, as smaller resource consumption would let more operations fit on the same machine. The bigger the volume of resources a transaction comprises, the less scalable the system becomes, as fewer operations fit on the same machine, or the same group of machines.

If it is really required, the approach is an (often standardized, for example "XA" distributed transactions) method to ensure a system is consistent at all times. The concept of a distributed transaction spans multiple (parts of) a system and has those parts participate in the transaction. Any participating (service) logic will eventually be entirely committed, or fully rolled back.
Note that committed is not always analogous to successfully executed to the fullest extent; it rather means that the system is left in a consistent state. It could be that in order to make things 'committable', or consistent, an intermediate valid system state is reached which also allows committing, but does not reach the ultimate intended goal - yet. An example is logic with a state deferral mechanism in place, used to make the system consistent and to break up the transaction into manageable parts. This is, by the way, an excellent mechanism to make transactions more scalable: because the spanned logic, hence the consumed resources, is broken into smaller pieces, the individual parts become more manageable and inherently more scalable. Furthermore, if the transaction coordination mechanism is a 'generic' implementation, it typically has safeguards built in to make sure that everything is fine, like a two-phase commit protocol. This all costs time and resources and can be counter-productive if your system architects, designers and developers do not know what they are doing.

BASE - Basically Available, Soft state, Eventually consistent

In a BASE pattern, the need to commit does not exist. If things happen according to plan, they are done. There is no need to confirm completion to anyone if you don't want to, contrary to a pattern like XA transactions. There is a downside to the BASE pattern: there is no (explicit) rollback mechanism. Since it does not exist, the architect/designer is responsible for creating so-called compensation algorithms: pieces of logic which handle specific exception scenarios. It is easy to forget one or two, so it requires more than ample planning and design to make sure all crucial scenarios are covered. The pattern coordinator (the implemented consistency control mechanism of this pattern) must manage its own state and progress in the process. This means that it must track what it did, and, based on when and where things went wrong, explicitly execute compensation activities to undo the parts of the work which did go OK until that moment.

There is however one big advantage: because the system generally does not need to keep track of your process's status - you manage that for the system in a tailored-to-the-need way - fewer resources are consumed, which has a very positive influence on your system's performance and scalability. Also, as the architect or designer can choose to selectively or strategically apply checkpoint logic, potentially less logic is executed. This allows a service designer to focus more on the actual core service logic, because the architect is in full control of the "when and how" of any applied core logic as well as supporting logic.

Since no standardized method to ensure consistency of the system exists, temporary inconsistency is an almost certain consequence. It is not only a consequence; it is the actual foundation of why this pattern is so much more scalable. Temporary inconsistency is allowed in a BASE pattern to make matters more manageable. The information and core service capability is "basically available". Contrary to ACID-managed consistency, some inaccuracy is temporarily allowed, and data may change while being used (which must be known and accepted by the consumer of the service) to reduce the amount of consumed resources (soft state). Eventually, when all service logic is executed, the system is left in a consistent state (eventually consistent). Presumably, as the core service logic executes relatively quickly, the number of occasions where the service logic is required but only available in an inconsistent state should generally be manageable.
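To illustrate the compensation idea, here is a minimal sketch of such a coordinator. The steps and compensations are invented for the example, and a real implementation would persist its progress rather than keep it in memory.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of a BASE-style coordinator: no rollback is available, so each
// executed step registers a compensation that can undo it. On failure,
// compensations for the work done so far run in reverse order.
public class CompensationCoordinator {

    private final Deque<Runnable> compensations = new ArrayDeque<>();

    public void execute(Runnable step, Runnable compensation) {
        step.run();
        compensations.push(compensation); // remember how to undo this step
    }

    public void compensateAll() {
        while (!compensations.isEmpty()) {
            compensations.pop().run(); // undo in reverse order
        }
    }

    public static void main(String[] args) {
        CompensationCoordinator coordinator = new CompensationCoordinator();
        try {
            coordinator.execute(() -> System.out.println("reserve stock"),
                                () -> System.out.println("release stock"));
            coordinator.execute(() -> System.out.println("charge card"),
                                () -> System.out.println("refund card"));
            // ... further steps; any failure triggers the catch block
        } catch (RuntimeException failure) {
            coordinator.compensateAll(); // restore consistency, eventually
        }
    }
}
```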

Scalability and consistency patterns

As ACID resources are generally larger groups of resources, the easiest way of scaling up is scaling vertically, also known as purchasing larger machines. As there is a limit to the physical size of a machine - you can only get them 'this big' - this is not a very future-proof approach. Also, when increasing hardware capacity this way, quite often a certain amount of capacity is wasted to make room for the new hardware (e.g. by replacing the current machine with the next bigger model, or by replacing smaller memory modules with bigger ones: what happens to the old hardware?).

Horizontal scaling, also known as deploying more machines concurrently to process more data, is far more extensible and typically cheaper, as you can always add capacity without wasting the capacity you already have. With more machines, two methods of horizontal scaling can be applied: functional scaling or sharding. Functional scaling is distributing pieces of logic of the same capability to one group of hardware nodes, and another group of functionality to another group of hardware nodes; in the database world this is known as partitioning. Sharding is the act of deploying the same functionality/capability across multiple nodes. By nature, horizontal scaling is more complex than vertical scaling, but the benefits may outweigh the cost of the design and architecture.
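A minimal sketch of the sharding variant, where a key hash decides which node owns a record; the node names are placeholders.

```java
import java.util.List;

// Sketch: hash-based sharding. The same service capability runs on every
// node; the customer ID determines which node owns the record.
public class ShardRouter {

    private final List<String> nodes;

    public ShardRouter(List<String> nodes) {
        this.nodes = nodes;
    }

    public String nodeFor(String customerId) {
        // floorMod keeps the index non-negative for negative hash codes.
        int index = Math.floorMod(customerId.hashCode(), nodes.size());
        return nodes.get(index);
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(List.of("node-a", "node-b", "node-c"));
        System.out.println(router.nodeFor("customer-42")); // always the same node
    }
}
```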

BASE resources are typically a lot smaller, so inherently fewer hardware nodes, or smaller nodes, are required to reach the necessary system capacity.

In a SOA, it is extremely difficult to achieve 'transactional integrity' across service boundaries, because of the nature of the comprised distributed service capabilities and the availability of standards - or better, the lack thereof - for implementing consistency patterns. By nature, the ACID approach is difficult to implement, if not impossible on certain platforms. The BASE approach is more easily implementable, especially outside service boundaries. If we look at task services, and certainly orchestrated task services, the BASE approach is the default approach and anything 'better' must be custom-built. This is why it's such a good idea to offer the BASE approach more often than the ACID approach. Offering BASE is easier if the margin of error accepted by the business or project principal is known. This can be discovered during the requirements clarification phase, a very important phase of every service delivery effort. Without a proper requirements and business process investigation (discovery, clarification, refinement), it's virtually impossible to see whether you need ACID or BASE.

Monday, June 28, 2010

It's been a while

Every now and again there is a period of extreme busyness in one's work, and mine is definitely the past 6 months. It has been a while since I have been able to post new messages, but I may have a fair chance of posting a new article on ACID and BASE, which has been on my mind for quite some time now - please bear with me...

Wednesday, May 12, 2010

How off-shoring work affects the economy.

This one has been on my mind for a long time now... How off-shoring reduces the capabilities of local people. How the reduced capabilities of local people affect the economy of our country. How this effect, in many countries, affects the world economy...

I live in a wealthy country. No doubt about that! And I don't regret living here. The Netherlands is a nice country to live in and the climate is OK-ish :) - I noticed on a recent visit to India and Egypt that the weather can be better - but I'm not complaining at all!

Here's the deal: the current economic climate in our country more or less dictates that price and time to deliver are what you need to be able to compete. If you are too expensive you won't deliver services to other companies. And the companies want time-to-market to preferably be yesterday. Competitive planning and cost can only be achieved with resources outside of our country - simply because the rates are lower. Because of the lower rates, we can plan for more resources, hence reducing the time to deliver. I know that this is a statement from utopia, but generally you should get the picture.

Initially one starts by off-shoring test work. This reduces the investment in testing. You would still need local resources, but fewer. Next, you figure out that perhaps the build work could be off-shored as well. So you create a reference architecture (don't forget!) and off-shore the build activities. Yes, you need local resources, but again, fewer. Other competitors do the same, so in order to stay competitive you need to off-shore more. You train the most senior resources in off-shore locations to do the design work for you and let them work locally for a while. In the end you let them go back again, and more work is done from off-shore. You have a few local resources to manage the off-shore resources, but far fewer than initially.

Now the issue becomes apparent: the design work is off-shored too. A designer can only do "so much" in splendid isolation before he needs to go back to the delivery organisation and get re-educated on current technology and architectural frameworks. This is now a lot harder, as the resources to learn from are not local, so it will happen less and less. The next step is that the architects see the effectiveness of the local resources decline. They cannot rely on them as much as they could when all the resources - testers, developers and designers - were still locally available. Less and less knowledge resides in the heads of local people, and more knowledge resides in the heads of remote people.

On a large scale, this movement is threatening to our current economic climate. Change is good, but I have my doubts about this one. What's the end state? That the local "architect" resources are nothing more than project managers, or perhaps even program managers, when they start off-shoring the local project managers too? Where does this end? Do we still have local jobs? I know that the off-shore resources will become more expensive and a balance will be the result, but before the balance is reached, the more expensive off-shore resources will be replaced by cheaper off-shore resources from another country. And this will most likely be repeated in a wave-like pattern with its ups and downs...

Don't get me wrong - it's not that I begrudge other people having better times, but it still feels threatening to me. I guess I'm afraid of what I, or especially my kids, will notice: we will become dependent on other countries. And because of that, they will have a harder time living. Now I know that's also what my parents thought, and their parents - "our kids will live in a much harder world" - but I still can't say that I am comfortable with it. One always wishes the best for their kids, and this one is not putting my mind at ease...

Friday, April 23, 2010

Thoughts on the SOA Manifesto 2/2

Guiding Principles, important statements to make when thinking about SOA in general.

Respect the social and power structure of the organization.
We need to realize that since every organization is different, and a SOA is business-driven, every SOA implementation is different. By respecting the social and power structure in the organization, we are allowed to continue developing the SOA architecture and implementation, whereas "rowing upstream" significantly reduces our productivity and may even kill the entire SOA initiative.

Recognize that SOA ultimately demands change on many levels.
Since a SOA is business-driven and also aligns business with business, and business with IT, a number of changes can be expected, and not only in the IT landscape. For example, since processes are shared, processes and organizational departments must be aligned. Since infrastructure, services provided, etc. are also shared, this requires different ways of funding projects, as well as new roles and responsibilities to be defined, e.g. for governance of the SOA, project and program management, program alignment, and a clear distinction of who is responsible for which aspects of the SOA and the organization. This has impact on all levels of the organization.

The scope of SOA adoption can vary. Keep efforts manageable and within meaningful boundaries.
Well, keep it small and simple (K.I.S.S.), I guess... When starting with SOA adoption, try to define a meaningful boundary for the effort. Start small, start simple - not only with the SOA implementation, but also when defining the scope. Try a proof of concept; try implementing a few (sharable) services in production. Gain experience with them and see what you can learn. Only then are you ready to grow from there, in what is probably a multi-year effort, towards the intended initial scope. The scope itself - how far you adopt SOA - can be increased incrementally in a programme approach. In SOA, the big-bang approach is a recipe for failure. It is better to start off small and iteratively see how you can 'grow your garden' towards the 'end state'. But please remember, a garden is never finished and constantly needs maintenance and attention. In a garden there is usually also some waste, sometimes a bit more and sometimes a bit less: don't hesitate to discard some of your investments if there is no longer clear business value in them.

Products and standards alone will neither give you SOA nor apply the service orientation paradigm for you.
SOA is not a tool, nor a silver bullet that you can buy, install, and have lift-off. It takes planning and skill to create a SOA. Actually, it is not something you build, but the way you do things. If you live by the service orientation principles, you have a fair chance of getting there. If you buy tools without guidelines on how to use them, without a proper plan in an appropriate programme, and without the management buy-in and governance to regulate, the product implementation will be ill-spent money.

SOA can be realized through a variety of technologies and standards.
As long as you stick to the principles of service orientation, you can realize any flavour of service orientation. Every technology and standard has its pros and cons; it all boils down to how you use them. That's why SOA can be implemented using most technologies and standards. SOAP is not the implementation of a SOA, but a SOA can be implemented using SOAP. And SOAP can be used on various transports like JMS, HTTP, etc. But neither SOAP itself nor the transports mentioned are mandatory when "going SOA". Any middleware which allows you to apply the principles of SOA will suffice. And technologies can be mixed if it makes sense.

Establish a uniform set of enterprise standards and policies based on industry, de facto, and community standards.
Enterprise standards and policies help define SOA elements in a consistent way. By applying the same guidelines, the chances of (intrinsic) interoperability increase, and this is key to a SOA foundation. Standards, policies and guidelines are not a necessary evil: without them there would not be a SOA. This implies that they must be properly enforced, and governance is required to ensure they are applied consistently.

Pursue uniformity on the outside while allowing diversity on the inside.
This one is a bit more difficult and less obvious than the rest. By introducing uniformity on the outside (standardization at the interface level) you make interoperability easier, because more diversity means a steeper learning curve, which reduces the achievable reuse: when things get harder to comprehend, the potential for reuse decreases. Having said that, internally anything should be possible. Diversity should be allowed in order to deliver effective core service logic. Effective core service logic contributes to the attractiveness of services and inherently increases reuse potential. The one-size-fits-all approach does not work on the inside the way it does on the outside. Some balancing and fair trade-offs may be required to make it work.

Identify services through collaboration with business and technology stakeholders.
No one can understand an entire business, and no one can design a service or a system without understanding why he is doing it. That's where the business stakeholders come in. Equally, no one can understand all the technology requirements and implementation consequences; this requires technology stakeholders to participate. On the other hand, cooperation requires trust working both ways. The business participants should realize that, in order to maximize enterprise effectiveness, the scope for the SOA architect can be larger than their own scope, and should allow the technology people to design services which support the business as a whole.

Maximize service usage by considering the current and future scope of utilization.
Plan ahead. See how a service fits into the programme of projects and initiatives that shape the SOA. Try to predict how the SOA evolves, and see how the service can be of greater benefit by predicting how the programme and long-term plans affect the consumption of the service. Try to see how the potential of the service can be increased, e.g. by extrapolating how this service can be useful for other types of consumers, channels, etcetera. This would ultimately result in an increased return on investment (ROI).

Verify that services satisfy business requirements and goals.
Obviously, business requirements drive a SOA. Without business requirements there would not be a need for a SOA, and services would probably not be the best approach, as the effective implementation of services creates a certain overhead which costs a lot of money and reduces the performance of the architecture. Without a goal, pursuing business requirements may result in a majority of tactical decisions that prevent reaching the actual enterprise goals. So while implementing services from a tactical point of view, try to never lose sight of the strategic goals.

Evolve services and their organization in response to real use.
Let me come back to the gardening analogy I made earlier. When creating a garden, everything makes perfect sense. All plants and flowers look great together and everything is tuned. But no garden can survive without gardening (maintenance). In the course of time, taste (requirements) changes. Also, things that looked great initially may seem trivial after a while, or may even be explicitly undesired (a change of enterprise goals). Be prepared to review and maintain the (service) garden and make sure it's always aligned with the actually expected use. When a garden (SOA) is not aligned with expectations, use decreases, and in the end it has no right to survive and may even be replaced by something else.

Separate the different aspects of a system that change at different rates.
I feel this statement somewhat narrows the potential it could have had. Had the statement been "Separate concerns", I could have agreed with it more readily. When separation of concerns, as a principle, is properly applied, it greatly increases comprehensibility and decreases complexity (the learning curve). Separating the aspects of a system that change at different rates is a special version (implementation) of the statement to separate concerns.

Reduce implicit dependencies and publish all external dependencies to increase robustness and reduce the impact of change.
Reduction of implicit dependencies actually refers to the principle of abstraction. Abstraction leads to loose coupling. Loose coupling is key to a SOA as it increases flexibility and consequently the ability to change. When the ability to change increases, the impact of change is lower.

At every level of abstraction, organize each service around a cohesive and manageable unit of functionality.
Abstract each functional group of logic into a self-contained, isolated, cohesive logical unit. This makes the functionality more manageable through the separation of concerns principle. A group of related logical units can be encapsulated into a service. A service can have many capabilities; ultimately it encapsulates a manageable unit of functionality. By having isolated units of work, autonomy and composability increase, contributing to the (re)use of services.

Perhaps it's wise to make the following statement: Thomas Erl has already described the principles of service orientation. The statements in these posts do not attempt to redefine those principles. If you want to review them, take a look at the soabooks.com website, where you should find lots of information on the Prentice Hall SOA books by Thomas Erl et al.


Phew... this is it for now. I hope this gives food for thought. If you agree, or especially if you disagree, with the statements made here, please feel invited to drop me a line and explain your point of view. Perhaps I have to change my insights and way of thinking :)

Sunday, April 4, 2010

Thoughts on the SOA Manifesto 1/2

In October 2009 I was at the 2nd SOA Symposium in Rotterdam, where the final efforts were made and the actual presentation of the "SOA Manifesto" happened.

This post is about how I interpret the SOA Manifesto statements, and I would like to encourage people to post their concerns/remarks on this vision.

Business value over technical strategy
While having a planned technical strategy is extremely important, it's the business who pays the bills. The cost of projects is often not really aligned with the expected business value, so we have to make a trade-off between our strategy and tactical decisions that keep the business happy. When we keep the business happy, we are most likely allowed to continue our planned path and in the end get close to our technical strategy goals. If we push too hard on the technical strategy end, we risk losing our customer's commitment, which may kill the entire SOA effort, since they would be investing in things that do not generate the appropriate business value.
Strategic goals over project-specific benefits
Traditionally, from a project point of view, one would strive for short-term gains to reach the project goals. But sometimes the project goals do not point in the direction of the strategic goals. Although the project manager has every interest in, and the most value to gain from, meeting the project benefits, some restraint must be exercised in pursuing these at all cost, especially when they are not aligned with the strategic goals. Usually a project is aligned with the strategic goals of the company, but sometimes these goals are not the easiest way to deliver the project. This is where mixed concerns get in everyone's way. One should not allow project goals to become more significant than the strategic goals, the latter typically being the reason the project was started in the first place. As SOA delivers enterprise strategic value, projects should implement this value on a project-by-project basis whilst, inside those projects, never losing sight of the enterprise strategic goals.
Intrinsic interoperability over custom integration
Although a custom integration may often be quicker to implement, it is typically, as the name says, custom work - it cannot be reused somewhere else, give or take a few exceptions. By using standards and standardized interface definitions, it is easier to use and reuse these interfaces. This statement means that both standardized technical interfaces and a shared data definition model can help. Particularly the latter helps in a semantic way - understanding the interface is made a lot easier. This results in quicker adoption of the interface and hence its standards. Even the shared data model can follow industry standards. Using these industry-standard data models typically requires a lot more ramp-up time and implementation time due to the learning curve, but once understood, they can be discussed with people from, e.g., other companies without prior knowledge of your company - this could even make it easier to hire external help to finish your projects.
Shared services over specific-purpose implementations
When focussing on specific-purpose implementations, these are often very easily delivered. The issue with this approach is that for every new piece of business functionality requested, a new tailored service is created, so no reuse happens: services are not shared. A typical approach is one where service logic is copied and adapted for a slightly different service. Although quicker to develop, the reuse is minimal if not negligible, and you still increase the operational complexity problem, whereas a shared service typically requires no modification at all and hence must be managed only once. Note that to work with shared services, considerable extra design/development time must be incorporated to cope with service discoverability and the description of the service interaction in the context of new functionality. Then you have the issue of the "I'm special" and "not invented here" syndromes. These are often symptoms of arrogance or distrust and are killing for any organization. When organizations get over these issues, they allow for the use of shared services, reducing the total cost of ownership, as no custom-designed implementations have to be managed.
Flexibility over optimization
What happens very often is that logic is optimized for the current scope. This would typically be the case for any project-based approach. Any service or capability delivered is often a service with limited scope from the project's point of view. However, from an enterprise point of view it may be very wise to add some flexibility (at the cost of project budget and timelines). Don't think this will come for free: issues like up-front planning and rethinking the design from a multi-use point of view may take extra time during design, though the implementation time is often not too severely impacted. For example, a service may be built to perform work for an online shop application, but this typically locks the service into being useful for the online shop channel only. If we foresee that this service can be used by other channels, or in other service compositions, we can allow for future flexible use by, for example, simply adding a channel ID as a parameter, even though we do not need it for the current project. Any next project or service composition could then use the service in a channel-aware fashion.
Evolutionary refinement over pursuit of initial perfection
Reaching initial perfection is probably utopia. Most SOA implementations are complex systems, and all the ins and outs are learned over time. Trying to reach perfection when starting with a new solution will delay projects, cost a lot of money, and ultimately may not fly at all because the cost is so high that the business forces us to look for other alternatives. Remember, the business pays the bills :). Initial perfection is hard to reach anyway, if not unrealistic, because it would require that all requirements, including future requirements, be known at the start of the SOA initiative. As requirements change over time, the 'perfect' solution would soon grow out of sync with the requirements. This would mean you spent a lot of effort on things that may never happen. Perhaps a better approach is to use KISS: "Keep It Small and Simple". Start small and be pragmatic. Do what you can now and leave what you can't for future iterations or projects. This is what is referred to as evolutionary refinement.


------------------------------------------------
As you can see, the value statements in the SOA manifesto are not defined in a "SMART" way. They are not measurable, which causes confusion and leaves room for discussion. Discussion is good, however. As long as it does not stop us in our tracks because we cannot resolve our differences, there is progress, and it forces us to think about what we are doing. No two implementations of a SOA will be the same. There is no "one size fits all" solution; if there were, we would be out of jobs. Use the statements made here to your own benefit when you need them, and make up your own mind if you need to. Nothing stated here is "the unified truth for all"; just use these statements as best practices whenever you see fit.
------------------------------------------------


Friday, March 19, 2010

Services in real life: Traffic Management Service: How Granularity affects service behaviour and efficiency

SOA service concepts can be found in real life too. Some time ago, road workers were replacing and reprogramming the traffic lights at a roundabout outside my office. When I saw the result of their work, I realized that these traffic light services and the corresponding traffic management system nicely illustrate how a (SOA) service approach affects many of the service-orientation design principles.

My office in Maastricht, Netherlands

For years I have watched the traffic light sequence outside my office. When walking across the street I did not need to press the button for pedestrians, because I knew the sequence of the traffic lights by heart. I also remember that at any given time, only one lane of cars and one group of pedestrians would be allowed to cross. Also, all traffic had to stop before a new configuration was activated. This resulted in small traffic jams centered around the roundabout during busy hours and at times of sudden bursts of traffic.

Take a look at this street plan:
The roundabout

The actual situation is a bit more complex, with bus and cab lanes, bicycle roads etc., but this schematic should suffice for illustration. It's a Dutch roundabout, so the traffic moves counterclockwise as indicated. (The dark surface in the top right corner is part of my office :))




  • A ... F are the car traffic lights. 
  • R ... X are pedestrian crossing traffic lights 

In the old situation, as mentioned, only one path could drive the roundabout at any given time, e.g. lights A, B and F green, or C, F and E, etc. Also, the pedestrian lights would only be green at one street at a time, e.g. RS, TU, or VWX. Sometimes all traffic had to stop in order to allow pedestrians to cross. This describes a very inflexible way of managing traffic. The reason for allowing all traffic from a certain direction to pass was optimisation: if a traffic stream is moving, make sure all of it moves. Because no real-time feedback was given to the system, it would stay in a certain configuration for a long period of time, and the same configuration would be applied over and over again, endlessly, regardless of traffic load or time of day. Because the system was not interlinked with the rest of the infrastructure in the neighborhood, it was optimized from the local perspective, but not at all optimized to operate in the bigger picture. This is a typical example of a situation where optimization at the micro level is counter-productive at the macro level.


Assuming the TMS would be an SOA Service, the following observations can be made:
  • Autonomy of an individual traffic light was poor, as it was only allowed to operate in specific sequences of configurations 
  • A "stop all" configuration was necessary between all configuration changes, which restricted the maximum switching frequency 
  • Granularity of the service composition (TMS) was poor, as complete road paths would be switched at any given time, not individual lanes or lights 
Overall throughput would be low because of the dead time in the 'stop all' configuration and the time a specific configuration was active with no or only a low volume of cars present. After the lights were replaced, a number of important improvements were made:
  1. the system would allow partial pedestrian crossings (e.g. only T or U instead of 'all or nothing') 
  2. the system would allow individual control of two lanes at the same insertion point (e.g. one lane for straight ahead and one for a right turn) 
  3. the system would measure traffic load to accommodate bursts of traffic coming from a certain direction 
  4. the system would be linked into the network of traffic lights in the surrounding infrastructure, creating a 'green wave' that allows traffic to leave this part of town fairly unhindered by the network of traffic lights. 
Because the granularity is increased (smaller parts which are individually configurable), the system allows for greater controllability:
  • optimized configurations are possible to improve the throughput of heavily loaded lanes 
  • empty lanes do not unnecessarily get 'green' time 
  • special configuration sequences are available to improve traffic flow during busy hours 
  • pedestrians can be allowed to cross one half of the street without impacting the flow of traffic in the other half; shortly after the pedestrians have crossed half-way, the remaining part is given a green light 
  • macro-level optimized configurations can be timed with the traffic lights in the rest of the surrounding infrastructure 
When making the analogy to service orientation, we see that the same concepts apply:

The overall road infrastructure can be considered a service inventory: all relevant infrastructure elements are managed in the same (domain) service inventory. We are assuming a domain service inventory here as all service configurations are probably limited to a certain geographic area in town. Potentially other service inventories exist with different rules, principles and configurations to manage different kinds of traffic.

Traffic Orchestration and Traffic Management services

The traffic management service (TMS) consists of a collection of coordinated traffic light services (TLS) each representing an individual traffic light. The traffic lights themselves are the smallest configurable service available (they can be configured red, amber or green). This is the defined service granularity (*) for this roundabout.

Because the granularity is finer, more complex and better optimized service compositions can be created.

The inter-intersection, "infrastructure level" synchronization would probably be implemented as a service orchestration (**), the intersection or roundabout level light configurations as service compositions, managed by another service composition that decides when a certain configuration is necessary, based upon the input it receives from sensors in the road.
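To make the analogy concrete, here is a minimal sketch - hypothetical names, not a real traffic system implementation - of individually controllable traffic light services composed by a management service:

import java.util.List;

public class TrafficSketch {

    enum LightState { RED, AMBER, GREEN }

    // Traffic Light Service: the smallest configurable service, i.e. the chosen granularity.
    static class TrafficLightService {
        private final String id;
        private LightState state = LightState.RED;

        TrafficLightService(String id) { this.id = id; }

        void configure(LightState newState) {
            state = newState;
            System.out.println("Light " + id + " -> " + state);
        }
    }

    // Traffic Management Service: composes individual lights into a configuration.
    static class TrafficManagementService {
        private final List<TrafficLightService> lights;

        TrafficManagementService(List<TrafficLightService> lights) { this.lights = lights; }

        // Activate one composition: the listed lights turn green, all others red.
        void activate(List<String> greenIds) {
            for (TrafficLightService light : lights) {
                light.configure(greenIds.contains(light.id) ? LightState.GREEN : LightState.RED);
            }
        }
    }

    public static void main(String[] args) {
        List<TrafficLightService> lights = List.of(
                new TrafficLightService("A"),
                new TrafficLightService("B"),
                new TrafficLightService("C"));
        TrafficManagementService tms = new TrafficManagementService(lights);
        tms.activate(List.of("A", "B")); // one composition; C stays red
    }
}

Because each light is addressable on its own, new compositions are just new parameter lists rather than new hard-wired sequences.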

(*) A note on service granularity: there is no 'rulebook' on service granularity; the 'appropriate' level depends on many factors. Any smaller configurable service components, e.g. at LED level - a green, red or amber light consists of approx. 350 LEDs - would probably result in management chaos, as each LED would have to be individually controlled to compose a certain configured color of traffic light. Any service components larger than traffic-light granularity would probably result in the same situation we had in the 'old traffic light' setup, where e.g. all traffic must stop to allow pedestrians to cross the roundabout at crossing TU: a case of poor service autonomy.

Conclusion: Because all traffic lights can be controlled individually the autonomy and composability of the services are higher when compared to the old setup.

(**) Note: the example mentions orchestration, although the term choreography would probably be more suitable here. The orchestrated individual parts (TMS) have a certain amount of autonomy that is not controllable from the orchestration level. 'Despite' the orchestration level, the TMS can allow other traffic to cross the roundabouts or intersections as well, servicing other consumers of the service, even during busy times.

In the light of this post, I would like to encourage everyone to see how service orientation and real-world examples are really not that far apart. Whether it be a traffic light service or a service based economy, numerous similarities can be found. No, I'm not implying this applies at all times, but.... try :)

Websequencediagrams.com:
TOS-->TMS: Orchestrate intersections
note right of TOS: Traffic Orchestration Service:\nControls all intersections\nin a part of town
note right of TMS: Traffic Management Service:\ncontrols at the intersection level
TMS-->TLS: Control Service Compositions
TLS-->TLS: Red(), Amber() or Green()
note right of TLS: Traffic Light Services:\ncontrols at individual\ntraffic light level
opt
note right of TMS: one for every extra\ntraffic light in a composition
TMS-->TLS: Control Service Compositions
TLS-->TLS: Red(), Amber() or Green()
end

Working on a few new posts

Just a small status update. Now that I have returned home from my travels, I'm working on a few new posts, as well as a series of posts on defining/creating a SOA Reference Architecture. Shortly you should see a post about service granularity and service orchestration/composition, as well as my thoughts on the SOA Manifesto, as promised in one of my earlier posts.

Thursday, March 18, 2010

Respect!

Well, this is it! Finally all my bags are packed. Clothes for tomorrow are ready and I packed an extra bottle of water to go... Boy, this has really been a trip of contrasts. It is trips like these that really make you appreciate what you've got back home, in more ways than one!

I also gained a great deal of respect for the people who are sometimes away from their families for months on a "short trip". Let me tell you, it's heartbreaking when your son does not want to speak to you on the phone because he feels it is not the real thing, not good enough: "No, I don't want to speak to papa on the phone - papa just has to come home!"...

Regarding work, this was a very successful trip. We cleared a lot of issues and identified opportunities for improvement. Our to-do list for the High-Level SOA Architecture is longer than ever! We also learned that it is good to meet the teams face-to-face. This way, the customer has a face and can explain why we do the things we do. Finally, we got a chance to make the team really feel appreciated.

Tomorrow I am returning home via Dubai and Düsseldorf. I am soooo longing for a 'normal' Dutch home-made sandwich :). Don't get me wrong - I really loved whatever it is I ate (cannot remember the names) - the food was really great. But sometimes it's just good old Home-Sweet-Home.

Friday, March 12, 2010

Chennai - the team

Salaam namaste,

It's Friday, almost the last working day in India, and today we met with the team again. We took a picture with as many of us as would fit in the small room we've been working in most of the time.

Part of our team in Chennai

This week we again had very productive sessions on projects, programs, the SOA reference architecture (the main focus of this visit), productivity, (continuous) improvement and lots of other topics. These are long days and very exhausting, but they are also very rewarding. We got a chance to finally thank the team and discuss face to face what's on their minds.
Team, just one message to you all: Shukria! (Thank you!)

Wednesday, March 10, 2010

Arriving in Chennai

Just arrived in Chennai and already had my first conference call - I guess work at home continues while we travel :)
Boy, is this area hot! It was 35 degrees in Bengaluru, but the air was quite dry so the heat was bearable. Here in Chennai it's 'only' 32, but as the city is located at the Indian Ocean the air is very humid. When I got out of the airport I was soaked in an instant!
I like the cabs! These I would usually see in movies only. Lacking air conditioning, seat-belts or any kind of comfort, they really have a certain charm.

Tuesday, March 9, 2010

Leaving for Chennai

Our engagement in the Bengaluru area is coming to an end. Thanks to Babu, Raj, Arun and Madhu for hosting us. These were great sessions and we had good progress. Tomorrow I'm travelling on to Chennai where I will be continuing the visit to the Indian offshore offices.

Sunday, March 7, 2010

Follow the process and thinking outside the current constraints

Today is my second day in India. The hotel is really great!

While here, I have encountered a few great examples of how people can become stuck in processes and how continuous process improvement as well as better exception/compensation handling could have helped.

A lady arrived early from the airport just as the staff had begun clearing up the breakfast buffet. Everything was still there, so there should not have been a problem. Unfortunately, because the cleanup staff had already entered the breakfast room, they apparently could not stop the process, and the lady could not get anything to eat; so much for hospitality in the hospitality business...
Later I asked the staff about the how and why, and the explanation was fairly simple: "this is how we have agreed to work, sir. We first warn everyone in the room that we are about to close the buffet, and when everyone has indicated they are fine with this, we can start the cleanup procedure". Apparently the procedure does not cope with people who arrive in the room after the staff have finished asking around....

Anyway, to cut a long story short, I found out that the improvement process typically works like this: when a customer complains and the number of complaints in a certain period becomes too high, the process can be changed to compensate for the increased number of incidents. Despite all good intentions, I believe there is just not enough room for common sense, or for improvement suggestions triggered by the people who are in the process themselves.

Extending this, an analogy can be found in (IT) processes. Very often I encounter that "we all" work according to procedures and can hardly accommodate anything that is not covered by these procedures. In IT too, common sense is often simply not used, as we have proven time after time that we cannot accommodate it in our procedures. Apparently, thinking outside the current constraints is still hard to do. We do everything by the book and still the customer is not happy :(

Bengaluru

Today it's Sunday and I'm working in the hotel room but I'd like to share a few photos I took yesterday on my first day in India.

Ganesha, remover of obstacles, aka God of Luck

Lord Shiva, aka the destroyer of evil
and follower of Ganesha

Unfortunately only a few hours were available for sightseeing, so not many pictures.

Saturday, March 6, 2010

Jetlag!

This morning at 9am local time I arrived safely in Bangalore, India. Funny to see yourself being filmed with an IR camera. My temperature was OK :). I cannot sleep in planes, but I saw a couple of nice movies, "Twilight" and "New Moon Saga"; both were OK but had very sudden endings - the plots were not properly wrapped up and ended on bad cliffhangers. By the time I arrived at the hotel around 10.30am, I had gotten over my tiredness and could start a normal day with breakfast. At the time of writing it's about 8pm in the evening and I guess it's more than time for another shower and something to eat; I'm starving... Keep you posted...

Thursday, March 4, 2010

End of my Egypt Visit

So unfortunately my visit to Egypt has come to an end.

Roger @ Sphinx

I met nice and friendly people and we had a very productive week. It was way too short for the agenda, and also for getting to know these fine colleagues. Now I will get a good night of sleep. Fortunately, some time was available after work to visit the pyramids and other fun stuff. Oh, and Egypt has nice food: Fool, Kosheri and Molockheya are great to eat. Tomorrow you should find me in the Egyptian Museum in Cairo, and in the evening I will leave for Dubai and Bangalore. Shokran likum (thank you all) for taking care of me this week, and hi to Ahmed, Ahmed, Hani, Heba, Islam, Wael and of course Hoda, Nadine and Mona!

- Tesba7 3ala 7'eer! (Good night!)

View from Al Azhar park on old Islamic Cairo - city of 1000 mosques

Sunday, February 28, 2010

Separation of concerns: Exception handling

This week (26-02-2010 ... 05-03-2010) I'm in Egypt for business.

While using a local ATM, I got an error message on screen similar to "Error xxx occurred while performing activity yyy". From the descriptive text it was clearly visible that activity yyy was actually a service on a very specific back-end; even the back-end name was mentioned. Also, from error xxx I could derive that this was a specific session-management issue with the back-end itself.

Why is this wrong?

Well, for a number of reasons, but here are the main ones:


Security


- Security-wise, this message exposed sensitive implementation details to the user, on-screen. Mentioning system names, the nature of exceptions etc. to a user is a risk: any mischievous user could easily use this information to their own benefit. It is these kinds of mistakes which can easily be avoided. The ATM user is simply not concerned with, or interested in, this kind of information; a simple "Service not available" or "Service temporarily unavailable" message should have sufficed. A system administrator, for example, should be concerned.
- Similarly, the activity names expose functional context (implementation details at the functional level) to the ATM user. Same issue: why would you give out such information to a user? Shield the "activity concerns" from the user by not telling them too much; "Service temporarily not available" would suffice.
- The back-end exception was delivered almost 1:1 to the front-end. The builder had apparently implemented the back-end exception handling as "pass-through", meaning that any system which needs to react to these exceptions has to be concerned with the actual implementation of the back-end exceptions. Note that this last one is a 'guesstimate', as obviously I cannot look into the system, but interpreting the error message gives me strong reason to believe so.

Separation of back-end exceptions from integration-level exceptions

What strikes me as odd is that there are still many applications which integrate with back-end systems one way or another and expose back-end details to consumers or even to user interfaces. Good practice is to shield domain-specific (back-end) exceptions from the consumer of a service. This is commonly referred to as "separation of concerns". The consumer of a SOA service does not need to know about the domain-specific service, because that would cause what Thomas Erl refers to as "contract-to-implementation" coupling, one of the four negative coupling types.
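As a minimal sketch of this shielding - the back-end client and exception names are hypothetical, not taken from the actual ATM system - a service facade can catch the domain-specific exception, log the detail for the administrator, and surface only a neutral message to the consumer:

public class AccountServiceFacade {

    // Hypothetical domain-specific back-end exception.
    static class BackendSessionException extends Exception {
        BackendSessionException(String msg) { super(msg); }
    }

    // Generic, implementation-neutral exception exposed to consumers.
    static class ServiceUnavailableException extends Exception {
        ServiceUnavailableException(String msg) { super(msg); }
    }

    String getBalance(String accountId) throws ServiceUnavailableException {
        try {
            return callBackend(accountId);
        } catch (BackendSessionException e) {
            // Log the full detail for the administrator, who IS concerned...
            System.err.println("Back-end session failure: " + e.getMessage());
            // ...but expose only a neutral message to the consumer.
            throw new ServiceUnavailableException("Service temporarily unavailable");
        }
    }

    // Simulated back-end call that fails with a session problem.
    private String callBackend(String accountId) throws BackendSessionException {
        throw new BackendSessionException("session expired on back-end XYZ");
    }
}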

While we are at it, think about this one. It is not directly related to the issue at hand, but it is good practice anyhow:

Separation of technical exceptions from functional exceptions

What I encounter many times in my profession is that, in the way exceptions are implemented, people often do not differentiate between technical and functional exceptions. This is related to the fact that technical issues (real exceptions) and functional concerns (not really exceptions) are easily intertwined; sometimes the conceptual difference between the two is ignored or not recognized, leading to complex implementations.

Technical exceptions are defined for situations which break the normal operation of the system (e.g. Service Not Available, Connection Timeout, Service Down For Maintenance). They typically happen when something is wrong with a system, the network, the database etc. These are significant failures, as they prevent the software system from working properly, that is, from executing the core service logic. Such problems typically require rigorous resolution and are considered disruptive to the execution of the core service capability. If you encounter one of these, a retry mechanism may make sense, but not always.

Functional exceptions are not really exceptions. They typically occur while executing the core logic of a service and are conditions which are meaningful to that core logic. Note that in these situations nothing is wrong with the system. Examples are Customer Not Found, Customer Status Is Inactive etc. Business rules are typically based on these kinds of exceptions. If you encounter one of these, the consumer can typically anticipate that another, successive call to the service will return the same result.
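Kept separate, the two kinds can be modelled as distinct exception types, so a consumer can decide per kind whether a retry makes sense. A minimal sketch, with hypothetical names:

public class ExceptionKinds {

    // Technical exception: the system itself is broken; the core logic could not run.
    // A retry MAY make sense, as the condition can clear up.
    static class TechnicalException extends Exception {
        TechnicalException(String msg) { super(msg); }
    }

    // Functional exception: nothing is wrong with the system; the core logic ran and
    // produced a condition that is meaningful to the business. A retry will
    // typically return the same result.
    static class FunctionalException extends Exception {
        FunctionalException(String msg) { super(msg); }
    }

    static String findCustomer(String id) throws TechnicalException, FunctionalException {
        if (!databaseReachable()) {
            throw new TechnicalException("Connection timeout");   // system failure
        }
        if (!customerExists(id)) {
            throw new FunctionalException("Customer not found");  // business condition
        }
        return "customer-" + id;
    }

    private static boolean databaseReachable() { return true; }         // stub
    private static boolean customerExists(String id) { return false; }  // stub
}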

A strategy I tend to follow is to map the technical status and the functional status onto two separate fields: TECH_STATUS and FUNC_STATUS.

Whenever a service capability is executed, the response to any consumer always contains a technical status. As the tech status indicates whether something went wrong technically, it is relevant to both request-response and fire-and-forget patterns.
The functional status, however, may not always be available, especially when a fire-and-forget pattern is executed.

Some examples are listed below.

request-response:

  • tech status = service down for maintenance; func status = don't care/not available
  • tech status = ok; func status = ok
  • tech status = ok; func status = no data found
fire-and-forget:
  • tech status = connection not established; func status = not available
  • tech status = ok; func status = not available

Obviously, a fire-and-forget pattern will not return a functional status, as it is executed decoupled from the consumer's logic execution. Potentially a functional status may be returned asynchronously, but this is not always necessary; sometimes it is sufficient to know that - eventually - the service's core logic gets executed.
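Mapped onto a response message, this strategy could look like the following minimal sketch (the field and enum names are hypothetical):

public class ServiceResponse {

    enum TechStatus { OK, SERVICE_DOWN_FOR_MAINTENANCE, CONNECTION_NOT_ESTABLISHED }
    enum FuncStatus { OK, NO_DATA_FOUND, NOT_AVAILABLE }

    final TechStatus techStatus; // always present: did the call technically succeed?
    final FuncStatus funcStatus; // NOT_AVAILABLE e.g. for fire-and-forget calls

    ServiceResponse(TechStatus techStatus, FuncStatus funcStatus) {
        this.techStatus = techStatus;
        this.funcStatus = funcStatus;
    }

    public static void main(String[] args) {
        // request-response: both statuses are meaningful
        ServiceResponse rr = new ServiceResponse(TechStatus.OK, FuncStatus.NO_DATA_FOUND);
        // fire-and-forget: only the technical status is meaningful
        ServiceResponse ff = new ServiceResponse(TechStatus.OK, FuncStatus.NOT_AVAILABLE);
        System.out.println("request-response: " + rr.techStatus + " / " + rr.funcStatus);
        System.out.println("fire-and-forget:  " + ff.techStatus + " / " + ff.funcStatus);
    }
}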

Hope you had fun reading; until next time...

Saturday, February 27, 2010

SOA Certified professional

Oops, I forgot to share this...
For some time now I'm officially a SOA Certified Professional :). For details on SOA certification, look at www.soaschool.com and www.soacp.com.
If you want decent, vendor-agnostic training ;) I suggest you try one of these programmes. I am now working on the SOA Certified Architect programme; I was in the Rotterdam November 2009 class. Currently I'm preparing for the module 3 examination; wish me luck!!!

Wednesday, February 24, 2010

Published a SOA pattern

I just published a SOA pattern for review at SOAPatterns.org.

Please review it and give comments and feedback on the Process Orchestration Recomposition pattern - including, by the way, suggestions for the pattern name.

Thank you very much;  I really appreciate your input!