Thursday, November 5, 2009

How does SOA fit in today's and tomorrow's distributed architecture?

This post briefly describes how SOA fits into current distributed computing architectures. It is intended as a primer, not as a fully detailed positioning.

silo - integration - soa - grid - cloud

Traditionally, companies would implement "silo" based applications, or application silos. By nature, these software stacks integrate poorly with other solutions because they are self-contained. Vendors selling these systems have little interest in opening them up, as doing so allows other vendors' products with similar functionality to eat into their installed (functional) base. This makes it hard to make these systems work together with other systems. In most cases, customers need to pay premium rates to have the systems opened up for their intended purposes, and the vendor will gladly charge the next customer for the same or a similar interface.

Making these systems work together is called "integration". Integrating systems is bespoke work by definition, and it is considered expensive because every new integration is custom work. An integration effort is considered a tactical approach. Integration is also typically point-to-point, which should be avoided to keep the IT burden from skyrocketing. The more interfaces are built between systems, the more chaotic changes become to implement and the more unpredictable the cost of change becomes.
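To give a feel for how quickly point-to-point interfaces pile up, here is a small Python sketch. It assumes the worst case, where every system needs a direct interface to every other system, and compares that with each system connecting once to some shared layer. Both function names are mine and purely illustrative.

```python
def point_to_point_interfaces(systems: int) -> int:
    """Interfaces needed when every system is wired directly to every other one."""
    return systems * (systems - 1) // 2


def shared_layer_interfaces(systems: int) -> int:
    """Interfaces needed when every system connects once to a shared layer."""
    return systems


# Print the two counts side by side for a growing landscape.
for n in (3, 5, 10, 20):
    print(n, point_to_point_interfaces(n), shared_layer_interfaces(n))
```

With 10 systems this comes to 45 direct interfaces against 10, and with 20 systems to 190 against 20. Real landscapes are rarely fully connected, so treat the numbers as an upper bound.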

This is where "SOA" comes into the picture. SOA tries to change this cluttered 'spaghetti architecture' into a more organized 'lasagna architecture', if you will. Why lasagna? Because a layered approach helps improve the predictability of your IT systems. Ever tried to take a single piece of spaghetti from your plate? You never know what comes with that small piece... With lasagna, you can take out a neat square (read: structured) piece and work from there. If you read these neat pieces as services or service capabilities, it becomes apparent that the predictability of the cost and impact of changes is better than with spaghetti architectures.

Yes, I hear you thinking, "I can do that with 'integration' as well", and you are probably right. The difference between integration and a service oriented architecture lies mainly in the approach SOA takes. SOA is based upon services with intrinsic interoperability that are built to fulfill a specific business purpose. Furthermore, they are intended to be reused and reconfigured, that is, (re)composed, whereas integration efforts are, because of their custom nature, hard or impossible to reuse. To improve recomposability, SOA services should preferably not hold state. Holding state limits a service to certain interaction patterns: its functionality depends on prior calls to the service or service composition, and without those calls the intended business purpose cannot be fulfilled. Because a SOA service can fairly easily (all in the eye of the beholder) be used for other purposes in other service compositions, the SOA approach is considered a strategic approach. For more SOA concepts I can recommend this book.
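The point about state and recomposability can be sketched in a few lines of Python. Everything here is hypothetical and made up for illustration: the `Order` type, the tax percentages, the shipping fee and the service names. The sketch shows two stateless services composed into a third capability, next to a stateful variant that only works after a prior call.

```python
from dataclasses import dataclass

# Hypothetical tax rates in percent, per made-up region code.
TAX_PERCENT = {"A": 10, "B": 20}


@dataclass(frozen=True)
class Order:
    net_cents: int
    region: str


def tax_service(order: Order) -> int:
    """Stateless: the result depends only on the request itself."""
    return order.net_cents * TAX_PERCENT[order.region] // 100


def shipping_service(order: Order) -> int:
    """Stateless: flat (made-up) fee below a threshold, free at or above it."""
    return 0 if order.net_cents >= 5000 else 495


def invoice_total(order: Order) -> int:
    """A composition that reuses both services for one business purpose."""
    return order.net_cents + tax_service(order) + shipping_service(order)


class StatefulTaxService:
    """Stateful counterpart: callers must call set_region() before tax()."""

    def __init__(self) -> None:
        self._region = None

    def set_region(self, region: str) -> None:
        self._region = region

    def tax(self, net_cents: int) -> int:
        if self._region is None:
            raise RuntimeError("set_region() must be called first")
        return net_cents * TAX_PERCENT[self._region] // 100
```

`invoice_total(Order(2000, "B"))` returns 2895 (2000 net, 400 tax, 495 shipping). The stateless services can be dropped into any other composition as they are, while every caller of `StatefulTaxService` has to know about, and respect, the required call order.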

A problem of traditional SOA architectures is that they are really good at processing large volumes of small messages but are not built for handling large messages. Many SOA architectures are based on a message-based infrastructure. This means that messages have to be read into memory and streamed over the network many times in order to execute the business goals. These operations are expensive by nature, as they eat away at network bandwidth, memory and potentially disk I/O. In a SOA "grid", services are allowed to have state, and that state is available (shared) across a certain infrastructure, which reduces the number of times a message is read into memory and flushed from memory to network and/or disk. Fundamentally, a SOA grid is a special kind of SOA architecture that works around some of these challenges. It makes the SOA architecture more efficient with respect to large messages. A grid architecture also has improved availability and fault tolerance because of its make-up.
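As a rough illustration of why shared state helps with large messages, the Python sketch below counts the bytes that have to move when a payload passes through a chain of services. `SharedStateGrid` is a hypothetical in-memory stand-in I made up for this post, not the API of any real grid product, and it ignores partitioning, replication and the network hops a real grid would still need.

```python
import uuid


def run_message_based(payload: bytes, hops: int) -> int:
    """Every hop receives the full message; return the total bytes moved."""
    moved = 0
    for _ in range(hops):
        moved += len(payload)  # the whole message travels to the next service
    return moved


class SharedStateGrid:
    """Hypothetical stand-in for a grid: one in-memory map visible to all services."""

    def __init__(self) -> None:
        self._entries = {}

    def put(self, payload: bytes) -> str:
        key = uuid.uuid4().hex  # 32-character key that identifies the payload
        self._entries[key] = payload
        return key

    def get(self, key: str) -> bytes:
        return self._entries[key]


def run_grid_based(payload: bytes, hops: int, grid: SharedStateGrid) -> int:
    """The payload is stored once; each hop only receives the key."""
    key = grid.put(payload)
    moved = len(payload)  # the single write into the shared state
    for _ in range(hops):
        moved += len(key.encode())  # only the small key travels between services
        grid.get(key)  # the service reads the shared state in place
    return moved
```

For a 1,000,000 byte payload and five hops, `run_message_based` counts 5,000,000 bytes moved, while `run_grid_based` counts 1,000,160: one write of the payload plus five 32-byte keys.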

A "cloud" architecture virtualizes resources such as storage, memory, CPU, complete machines, Java virtual machines, etc. Basically, a cloud consists of containers with specific characteristics to meet a certain goal. For example, you can have a database in the cloud without knowing where it is hosted, and you typically don't need to worry about it. The cloud has a different cost model (e.g. x cents per effective capacity unit per effective time unit) and you do not have to worry about hosting services. It also introduces other issues, such as where your customer data is stored. Many countries' legislation does not allow storing sensitive data outside the country or, in the case of the EU, outside the EU. This poses different challenges. The advantage, however, is that you don't have to invest in or plan a huge local SOA-based infrastructure; instead you use (multiple) shared but isolated virtual infrastructure(s) to host your environment. You typically pay for what you actually use. The environment is also easily scalable, as the resources are virtualized and allow configuration changes on the fly (e.g. assigning more CPU or disk capacity). Scaling both vertically and horizontally is simple. A step further in the cloud is software, platform or application as a service, where providers offer, for example, a complete application environment as a service, resulting in a cloud of infrastructure, resources, platforms, etc.
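The "x cents per capacity unit per time unit" cost model is easy to work through with made-up numbers. The Python sketch below assumes a hypothetical price of 5 cents per capacity unit per hour and a hypothetical workload; none of these figures come from a real provider. For simplicity it also prices a fixed, peak-sized environment at the same unit rate, which will not hold exactly in practice.

```python
def usage_cost_cents(cents_per_unit_hour: int, units: int, hours: int) -> int:
    """Pay per use: price x capacity units actually used x hours actually used."""
    return cents_per_unit_hour * units * hours


PRICE = 5  # hypothetical: cents per capacity unit per hour

# Hypothetical workload: 2 units for the whole month (720 hours),
# plus 6 extra units during a 40 hour peak.
baseline = usage_cost_cents(PRICE, units=2, hours=720)
peak = usage_cost_cents(PRICE, units=6, hours=40)
pay_per_use = baseline + peak

# A fixed environment sized for the peak runs 8 units for the whole month.
sized_for_peak = usage_cost_cents(PRICE, units=8, hours=720)

print(pay_per_use, sized_for_peak)
```

Under these assumptions, paying for actual use comes to 8400 cents for the month, against 28800 cents for an environment that is permanently sized for the peak.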

Another nice detail is that, if the concept of a SOA-based approach did not exist, the paradigms and services required for a SOA grid and a cloud would be far less flexible than they are at present. Potentially, the SOA grid and the cloud would not even exist in their current form.

------------------------------------------------------------------
Another (more traditional) application of grid computing / grid architecture (notice the difference in terminology: not "SOA grid") is where a giant task is split into smaller tasks that are spread across multiple systems, each of which takes a smaller bite out of the huge data load. Before the idea of grid computing came along, very big companies would use supercomputers for this. The issue was that these computers cost insane amounts of money and could only be put to use on one task at a time if that task needed the maximum system capacity. In a grid computing environment, the computing units run on standalone computers, which can be configured into groups of computing units that each specialize in a different task. Because commodity hardware can be used instead of supercomputers, the cost per calculation is significantly lower. Some initiatives have gone a step further and used public computers (individual users' computers) to increase the computing power as far as possible. Examples are the SETI initiative, or medical companies calculating the most effective treatment for a certain disease or virus. A funny detail is that complete public communities have formed that compete to process the most parts of a giant problem - they have turned this into a competition, much to the benefit of the companies putting the calculations up for work. If you would like to see such a competition, view this score board.
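The split-and-distribute idea can be sketched in Python as follows. This is only an illustration under loud assumptions: a local thread pool stands in for the standalone computers of a real grid, the "giant task" is a simple sum of squares, and all function names are invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, chunk_size):
    """Cut the big data load into smaller chunks of at most chunk_size items."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def work_unit(chunk):
    """The small task one computing unit runs on its own chunk."""
    return sum(x * x for x in chunk)


def run_on_grid(data, chunk_size, workers):
    """Hand the chunks to the workers and combine the partial results."""
    chunks = split(data, chunk_size)
    # Threads stand in for separate machines; a real grid would ship each
    # chunk over the network to a computing unit.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(work_unit, chunks))
    return sum(partial_results)


data = list(range(1, 1001))
total = run_on_grid(data, chunk_size=100, workers=4)
print(total)
```

The combined result is the same as computing `sum(x * x for x in data)` on a single machine. The work only becomes worth distributing when each chunk is expensive enough to outweigh the cost of handing it out and collecting the results.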
