Ilkay Benian



Ilkay Benian, a Principal Consultant and Technical Architect at Netsoft USA Inc., holds a BS degree in Electrical and Electronics Engineering and has over 14 years of industry experience in computer software systems.

Ilkay has worked on large-scale ICR (Intelligent Character Recognition) and forms processing projects such as the 1997 Turkish Census Project for the Statistics Institute of State and the Tax Forms Processing Project for the Ministry of Finance of Turkey. On these projects he undertook mission-critical analysis, development and integration tasks. He led the R&D team at Link Bilgisayar, a well-known ISV in Turkey, and is the author of the first Turkish text-to-speech synthesizer.

Ilkay joined Netsoft USA in 2003 and has held various positions, from senior software developer to technical architect. He architected the Netsoft USA Framework and developed its core components, creating a valuable asset that contributes to the competitive strength of the company. He also developed SOCF (Simple Object Collaboration Framework).

He is still working as a principal consultant, contributing to multiple simultaneous projects at Netsoft USA. He specializes in applying OO, CBD and SOA methodologies, frameworks, DSLs and code generators, primarily on the .NET platform. He also has experience with MQ and XMS messaging technologies for integrating with the IBM WebSphere platform.




Fluid Services

Published: July 11, 2010 • SOA Magazine Issue XLI


As organizations continue to build their software integration architecture on the SOA paradigm, more and more services are being developed and reused to build other services. OOD introduced code reuse within applications, and CBD introduced component reuse across applications. In the same way, SOA has enabled reuse across distributed applications and platforms, with flexibility and agility.

However, as systematic reuse of such services becomes more widespread, performance is becoming a real concern. The latency introduced by each back-end call accumulates, large units of work hinder the use of parallelism, and chained service calls waste large amounts of resources, degrading scalability. SOA has to address these problems to advance to the next level of maturity. This article analyzes some of the important bottlenecks. It then proposes a new approach: rethinking and redesigning existing services so that they use stream-oriented rather than message-oriented communication. More responsive services will in turn encourage more service reuse, increase composability and provide better development agility, without the performance concerns.

Revisiting the Goal of SOA (Service-oriented Architecture)

SOA emerged mainly to solve the agile software integration needs of growing organizations. It borrows the contract-based approach of component software and applies similar principles so that independently developed systems can communicate. Once contracts are established and agreed upon, systems can be developed in parallel, integrated easily and evolved over time without always moving in lock-step. Contracts also ensure that future consumers can keep using the services, and future providers can keep delivering them, through the same contract without breaking the other parties in the conversation. Thus, SOA manages the asynchrony between naturally independent systems and their emergent complexity. To achieve this goal, SOA lays down guidelines for establishing the necessary agreement, or contractual platform. This platform allows systems to co-evolve over time even when they are not in sync.

This strategy is just a scaled-up version of a software engineering principle that various methodologies have addressed: managing complexity. SOA acknowledges the inherent characteristics of distributed development, namely its asynchrony and independence, and this is why it addresses the issue so well. SOA's success is visible in the fact that more and more systems are written as services and built on top of other services. Large-scale distributed computing has thus become accessible, and the complexity that plagued old monolithic systems has been reduced to a manageable level.

Problems of SOA

On the flip side, chaining service calls to reuse existing services leads to performance problems that must be addressed. Unfortunately, SOA does not address them directly. If you think about it, none of the four tenets of SOA suggests a solution. So let's analyze the performance-related problems that organizations face, especially at the stage where SOA is just about to take off:

Scalability

Services are usually designed to handle a known or estimated number of simultaneous clients. However, SOA aims to make services reusable, so the number of clients will necessarily grow over time. Growth comes either from more end users or from the extra back-end calls that chaining creates. When new services are built on top of existing ones, the existing services quickly start failing to meet their originally intended and promised SLAs. Throughput drops, while latency and downtime increase.

Latency

Chaining services to other services, or placing message-processing intermediaries between them, only adds to the latency. For example, if a service-oriented architecture includes an orchestration layer, the responsiveness of services becomes a high priority. Latency is the worst kind of performance bottleneck because currently used methods struggle to address it. Each intermediary hop compounds the latency caused by large data transfers. Simply adding more servers does not solve the latency problem, because each call is an atomic unit of work that is usually not designed to be parallelizable. Parallel computing is being revived, yet SOA does not give future services the guidance they need to cope with increasing load and data size.

Composability / Reuse

A direct consequence of cumulative latency is that services exhibit poor composability and reuse characteristics. When new services are designed, teams avoid reusing existing services because of performance concerns. As a result, services are not as valuable to the organization as they should be. The issue is most apparent in organizations mature enough to start systematic service reuse. Reimplementing services each time a new integration need arises means there is essentially no reuse at the service level. It also leads to direct app-to-app integration, which somewhat defeats the purpose of ‘service-orientation’. In such a world applications take priority over services, the exact opposite of what the SOA methodology preaches.

Parallel Computing

SOA-style services tend to have coarse-grained contracts. More callers therefore mean more data transfer and more processing load, and that work cannot easily be parallelized. The end of Moore’s Law as we know it is shifting ever-increasing performance demands onto multi-core hardware architectures. New languages are emerging to simplify writing parallel code, but it is still not clear how this effort will address SOA-specific concerns.

Current Efforts

There are several approaches to solving these problems in today’s SOA-based systems. Currently used solutions include load balancing, caching, queuing, asynchronous execution, parallel service calls, dynamic scaling, Comet, HTML5 WebSockets and HTTP streaming. None of them is sufficient for every possible scenario, and they are usually used in combination.

There are also other innovative approaches to bringing parallelism to service development, such as the Software Pipelines methodology or the Software Pipelines Optimization Cycle (SPOC). The idea is to increase throughput by distributing embarrassingly parallel business logic across multiple threads or servers, while preserving the order of processing where necessary. This approach needs enough workload in the input to make use of the extra parallel power. It also does not specifically solve the latency problem mentioned above, especially where service reuse leads to chained service calls.
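The order-preserving parallelism described above can be sketched with Python’s standard `concurrent.futures` module. This is a minimal illustration, not the Software Pipelines framework itself. `enrich_order` is a hypothetical stand-in for independent per-item business logic, and a thread pool stands in for multiple threads or servers.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List


def enrich_order(order_id: int) -> str:
    # Hypothetical per-item business logic; items do not depend on each other.
    return f"order-{order_id}:enriched"


def process_in_parallel(order_ids: Iterable[int], workers: int = 4) -> List[str]:
    # Items are processed concurrently, but Executor.map returns results in
    # input order, preserving processing order where the business requires it.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(enrich_order, order_ids))


if __name__ == "__main__":
    print(process_in_parallel(range(5)))
```

The sketch only pays off when the input contains enough items to keep the workers busy, which is the workload requirement mentioned above. It does nothing for the latency of a single chained call.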

Another Possible Solution

In this article, I propose a solution that primarily addresses the latency and throughput issues in the context of growing service reuse, data size and workload.

It may be possible to address both the throughput and latency issues by rethinking the way services are designed in the following ways:

  • Use the inherent parallelism between clients and servers to overlap processing of messages.
  • Redesign the service contracts to be more explicit about the multiplicity of elements in the input or output, and allow consumption of partial data.
  • Use streamed transfer of service request/response data instead of buffered transfer.
  • Write services to produce data as it becomes available or as it is computed, rather than waiting until the end of processing.
  • Write clients to start consuming data as it is received from the server, rather than waiting for the whole message to complete.
  • Allow and encourage clients to cancel ongoing operations as soon as possible, thus releasing server resources earlier.
  • Allow and encourage chaining of such service calls to reuse existing assets without performance concerns.
  • Allow and encourage the service code to parallelize the processing of incoming data elements.

Use of Inherent Parallelism Between Clients and Servers

As obvious an opportunity as it sounds, the inherent parallelism between clients and servers is not taken into account when developing business services. A client has to wait for a service to complete its response before starting to use it. This may sound like a natural way to go for business applications, but in reality it is wasted time that adds to the latency at every node. Service reuse makes the problem even more dramatic, because each reused service means additional wait time for the end user. Such an obvious issue has not yet become the number one killer of SOA for one reason. Services have mostly been called by their direct consumers, or through at most two levels of call chaining, despite the latency penalty. Unfortunately, more and more people who go into SOA develop services hoping to reuse them in the future, and they end up hitting a brick wall. Every organization’s SOA infrastructure will inevitably have to deal with this problem sooner or later, and some already do.

Fortunately, SOA developers aren’t the only ones with a latency problem. Developers of media streaming, online gaming and other real-time systems already know the problem and have practically solved it. The trick is to exploit the fact that client and server can run in parallel and process partially transferred data. Simply let the client start using the results as soon as the server starts generating them. This technique makes it possible to start watching a YouTube video almost as soon as you hit the play button. If YouTube worked like today’s web services, you would have to wait minutes to download a video that you would probably not even watch to the end. Just think about what the implications would be:

  • You would waste your time just waiting for the download.
  • The part that is already downloaded on your computer would be idle and take space in the memory or on your hard disk.
  • You would waste your network bandwidth.
  • You would waste server resources.
  • People would be able to watch fewer videos while consuming more system resources.

Instead, we get all these benefits simply by changing the design to use an asynchronous mode of operation. The client, the server and every other device in between are already running in parallel. So why not make use of that before we start parallelizing our business code on the server side? It sounds like we are really missing something here, doesn’t it?

Reality is a bit more complicated than this picture. Our serial way of designing and coding business applications prevents services from behaving this way. A common pattern in SOA is the request/reply pattern. It makes designing and implementing services very simple and does not dictate what the request and reply should contain. The simplicity comes from two assumptions: the server only accepts fully formed request data, and the client only accepts fully formed response data. Building security, logging, auditing, routing and many other facilities on these assumptions is pretty straightforward, because it is just document processing. The idea of passing full documents around starts to crumble in three cases: when documents get big, when the work needed to generate them grows, or when the steps that process them are chained.

If you think about the nature of many business services, passing full documents around is actually not necessary at all. Many services just pass some query criteria and get back a list of elements. Suppose the server could push response elements as they are generated, and clients could consume them as they arrive. The perceived latency would then drop dramatically. Just as in media streaming, business data elements would behave like a stream flowing back to the client, and the client would not have to wait for the whole service data transfer to complete. This is different from a client-side asynchronous service call. Such a call merely runs on a background thread and notifies the client when the full response is available. Streaming lets you start consuming the results before the whole response is received, and perhaps even before server-side processing is complete. This idea has other implied benefits that are not obvious at first sight:

  • Clients get quick response and start consuming the results almost at the same time as service processing starts.
  • Clients can cancel an operation in the middle and free server resources earlier.
  • Server resources are used only as requested, rather than always requiring large atomic transactions at every single call.
  • If the back-end services being called behave the same way, latency increases by only the processing time of a single element, not of the whole data set. (Of course this is an idealization. In reality there is usually a transfer buffer that introduces some latency, but it is still much less than the full response time of a service.)
  • Multiple levels of service reuse become feasible, as latencies are affected minimally.
  • When back-end transactions are initiated based on a decision or on previous data, cancelling an operation saves a lot of back-end resources too.
  • Since response time is low, and cancellation is possible, it becomes feasible to run business logic queries that rely on back-end calls.
  • Since the processing time of many calls becomes shorter, throughput may increase proportionally.
  • If the same async design is used for request data (list of input elements), then the service could start processing as soon as it receives the first elements from the client.
  • An incoming stream of elements is much easier to parallelize using something like the software pipelines approach.
  • Even when there do not seem to be enough requests to justify a software pipelines approach, splitting data into elements or smaller units of work creates an opportunity for better parallelism, which feeds a pipeline better.
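A minimal in-process Python sketch shows the difference between waiting for a fully formed response and consuming elements as they arrive. A generator stands in for the server and a loop for the client. The function names are hypothetical, and a real service would add network transfer and buffering between the two. The event log records the order in which elements are produced and consumed.

```python
from typing import Iterator, List


def produce(log: List[str], count: int) -> Iterator[int]:
    # Server side: yields each element as soon as it is computed.
    for i in range(count):
        log.append(f"produced {i}")
        yield i


def consume_streamed(count: int) -> List[str]:
    # Fluid style: the client uses element i before element i+1 exists.
    log: List[str] = []
    for item in produce(log, count):
        log.append(f"consumed {item}")
    return log


def consume_buffered(count: int) -> List[str]:
    # Request/reply style: the whole response is materialized first.
    log: List[str] = []
    response = list(produce(log, count))
    for item in response:
        log.append(f"consumed {item}")
    return log


if __name__ == "__main__":
    print(consume_streamed(3))
    print(consume_buffered(3))
```

In the streamed version the client has already used element 0 before element 1 exists. In the buffered version nothing is used until everything has been produced, which is the request/reply behavior described above.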

This list can go on and on as we start thinking about the possibilities that are opened up. What about disadvantages?

  • A client’s consumption rate (not just its data reception rate) may be slower than the service’s production rate. In that case the service connection has to stay open until the client finishes or gives up. Conventional web services do not leave this kind of decision to clients, which makes them immune to the issue. For streaming services, however, it is a weakness that is easy to abuse. Reasonable timeouts and consumption rate requirements for the service could provide some defense against such abuse. Remember that buffered services can also be abused, in different ways.
  • If the service connects to a database and implements a similar asynchronous processing pattern, the database connection will also stay open until the client(s) finish processing. This may reduce database scalability. To address this issue, the service could run database access in the background, buffer the results in memory and push them to the client as they become available.
  • Today’s message-based web service security works on the full request/reply document. This is not possible with streaming services, because only partial data flows through each processing node at any given time, so a new approach is needed. One solution is to create chunks of data that are separately signed and/or encrypted. Another could be to sign just the header of a stream of elements and encrypt the rest using a session key passed in the header.
  • Intermediaries that do not account for the nature of streaming services could buffer the data and kill the advantage altogether. All intermediaries also have to be designed to let partial data flow through rather than relying on the full message content.
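The first security option above, separately signed chunks, might look like the following minimal Python sketch. It is an illustration under assumptions, not a WS-Security mechanism. It uses an HMAC with a shared key in place of a public-key signature, and the key is assumed to have been exchanged beforehand. The MAC also covers each chunk’s sequence number, so reordered or dropped chunks in the middle of the stream are detected. `sign_chunks` and `verify_chunks` are hypothetical names.

```python
import hashlib
import hmac
from typing import Iterable, Iterator, Tuple

SignedChunk = Tuple[int, bytes, bytes]  # (sequence number, payload, MAC tag)


def _tag(key: bytes, seq: int, payload: bytes) -> bytes:
    # The MAC covers the sequence number as well as the payload.
    return hmac.new(key, seq.to_bytes(8, "big") + payload, hashlib.sha256).digest()


def sign_chunks(chunks: Iterable[bytes], key: bytes) -> Iterator[SignedChunk]:
    # Each chunk is authenticated on its own, so it can be sent immediately.
    for seq, payload in enumerate(chunks):
        yield seq, payload, _tag(key, seq, payload)


def verify_chunks(signed: Iterable[SignedChunk], key: bytes) -> Iterator[bytes]:
    # Each chunk is checked and released as soon as it arrives.
    for expected, (seq, payload, tag) in enumerate(signed):
        if seq != expected:
            raise ValueError(f"chunk {expected} missing or out of order")
        if not hmac.compare_digest(_tag(key, seq, payload), tag):
            raise ValueError(f"chunk {seq} failed verification")
        yield payload


if __name__ == "__main__":
    key = b"shared-session-key"  # assumption: exchanged securely beforehand
    stream = sign_chunks([b'{"id": 1}', b'{"id": 2}'], key)
    for payload in verify_chunks(stream, key):
        print(payload)
```

Because each chunk is verified independently, the receiver can release verified elements immediately instead of waiting for the whole document. This sketch does not detect a truncated stream with missing final chunks. That would need a signed end-of-stream marker.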

The Role of Call Cancellation

Why is service call cancellation important? Truly reusable services do not know their ultimate consumers and cannot even make assumptions about their consumption patterns. For example, a consumer could request some data but use only a small portion of it. The obvious and efficient solution is for the service to filter its results and return only a restricted amount of data. The alternative is to return an unlimited number of results and let the client decide when to cancel the call. The question is how realistic it is to assume that we can modify services each time there is a new client. Is that even compatible with the SOA mindset? Another solution is to create more granular services and let the clients decide how to compose them. This is the OOD approach, but service calls will never be as cheap as object method calls, so finer granularity hurts scalability. In some sense, this is why we treat services specially rather than as just remote object calls, isn’t it?

Therefore, cancellation is a good thing to have for this kind of asynchronous streaming service. It gives clients at least some control over how much processing is done and how many results are pulled for consumption. Here is a list of reasons to support cancellation in streaming services:

  • Existing services may not have the filtering capability.
  • Filtering capability on the server side may not be feasible to implement.
  • Some filtering cannot be implemented on the server side due to client specific knowledge. Since the business logic tends to become more and more distributed, this problem becomes more common.
  • When new services are built on top of existing ones, it may not be safe to change the existing service behavior. Changing it would also mean that each time something new is built, the older services have to know about the new concerns. In small or closely working teams this might not be an issue, but in large organizations it is definitely a major concern.
  • As the level of reuse (call depth) increases, the above-mentioned issue becomes even more dramatic. You cannot write a service that knows and satisfies the concerns of every type of direct or indirect client.
  • End users who initiate calls to services may actually prefer to cancel or re-invoke the service. Imagine what it would be like if you could not stop a download that you had just started. This problem will become more common as different types of clients (mobile as well as desktop) become widespread. Responsiveness on mobile devices is a top priority, and no one will be patient enough to wait for a long service call to finish executing.
  • Rules-engine clients could invoke such services more aggressively, consume as much data as they need and disconnect. Rules are flexible and configurable at the rules engine level. It is therefore not feasible to expect back-end services to be optimized for new concerns each time rules are added or modified. Sometimes this might even require creating new services, because a single method call is no longer sufficient to implement all the rules at hand. It is much more straightforward and maintainable to implement rules that rely on existing services. If services provide a stream of data that is partially usable and also cancellable, such rules can be introduced on top of existing services with the least possible impact on performance.
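Cancellation can be sketched in-process with Python generators. Calling `close()` on a generator raises `GeneratorExit` inside the producer and runs its cleanup code. This is a minimal sketch under stated assumptions. `backend_query` is a hypothetical open-ended service stream, and `first_matching` is a client applying a filter the service knows nothing about. In a real distributed setting, `close()` would correspond to closing the connection or sending an explicit cancel message.

```python
from typing import Dict, Iterator, List


def backend_query(stats: Dict[str, int]) -> Iterator[int]:
    # Hypothetical back-end service producing an open-ended result stream.
    stats["open"] = 1
    try:
        n = 0
        while True:
            stats["produced"] = stats.get("produced", 0) + 1
            yield n
            n += 1
    finally:
        # Runs when the consumer cancels (closes) the stream.
        stats["open"] = 0


def first_matching(limit: int, stats: Dict[str, int]) -> List[int]:
    # Client-specific filter: take the first `limit` even numbers, then cancel.
    stream = backend_query(stats)
    results: List[int] = []
    try:
        for item in stream:
            if item % 2 == 0:
                results.append(item)
                if len(results) == limit:
                    break
    finally:
        stream.close()  # propagate cancellation to the producer
    return results


if __name__ == "__main__":
    stats: Dict[str, int] = {}
    print(first_matching(3, stats), stats)
```

The producer computes only the five elements the client actually pulled. Its `finally` block releases the simulated resource as soon as the client stops.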

What are Fluid Services?

The idea described above can be understood as a natural extension of the concept of iterators in OO programming languages. An iterator provides sequential access to a list of objects without exposing how the objects are generated or where they come from. An iterator could simply return the elements of an existing array or list object. But here is where it really shines: an iterator can also emit objects as they are computed, without any underlying storage. Then a whole new world of possibilities opens up. You can independently write code that produces data sets and code that consumes them, without storing any intermediate data, or even all of the data. You can chain functions that take iterators to build more complex processing that operates on a set rather than on single objects. What’s more, you can run them in parallel and overlap operations that work on different elements. This is similar to a multistage pipeline in hardware engineering: while function A (stage A) is operating on element N, the next function B (stage B) could be operating on element N-1 at the same time. Multiple such stages (functions) can be chained in this way without multiplying the response time. When new stages are added, the total latency grows only by the time each additional stage spends on a single element, not by the time all stages spend on all elements.
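Chaining functions that take and return iterators can be sketched with Python generators. This minimal example shows only the chaining aspect. Each element passes through every stage before the next element is produced, and no intermediate collection is stored. Plain generators run in a single thread, so they interleave the stages rather than run them concurrently. Overlapping the stages as in a hardware pipeline would require a thread, process or server per stage. The stage names are hypothetical.

```python
from typing import Iterable, Iterator, List


def source(n: int, log: List[str]) -> Iterator[int]:
    # Produces elements on demand; nothing is precomputed or stored.
    for i in range(n):
        log.append(f"source {i}")
        yield i


def stage_a(items: Iterable[int], log: List[str]) -> Iterator[int]:
    # Hypothetical first processing stage.
    for x in items:
        log.append(f"A {x}")
        yield x * 10


def stage_b(items: Iterable[int], log: List[str]) -> Iterator[int]:
    # Hypothetical second stage, chained onto the first.
    for x in items:
        log.append(f"B {x}")
        yield x + 1


if __name__ == "__main__":
    log: List[str] = []
    print(list(stage_b(stage_a(source(3, log), log), log)))
    print(log)
```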

Imagine a similar implementation for a web service using the asynchronous producer-consumer model described above. The response of a service could behave like a remote iterator over a computed result set. The client inherently runs in parallel with the server, so it just starts iterating over the asynchronously produced elements as they become available. The client could be smart enough to wait for the server when results are not available yet, and to tell the server when it no longer wants to continue. A more ambitious metaphorical analogy is the flow of fluid in a pipe. The request and/or response of such services is much like fluid flowing into and out of a pipe. A pipe could be split on the server side, joined again, connected to other pipes on other servers, and so on. The good thing about this architecture is that the client decides when to let data flow into the pipe and when to stop it. Whenever it does so, it sees a very quick response at the outflow. This architecture is also compatible with and supportive of the software pipelines approach, although it focuses more on latency. So I think the ‘fluid services’ metaphor is appropriate for positioning the idea and its intent. Another similar paradigm is ‘stream processing’. It was devised to exploit parallelism on a stream of data elements, usually primitive data types such as floating point numbers or integers. The idea is similar, but stream processing is mostly about low-level programming concerns within the restrictions of given target hardware (e.g. GPUs, DSPs). Applying those ideas to business services development requires a different point of view, although the approach is essentially similar. A new name seems appropriate to mark the difference, so I prefer the term ‘fluid services’.
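The client side of such a remote iterator can be sketched in Python under simplifying assumptions. A producer thread stands in for the server, and a bounded `queue.Queue` stands in for the transfer buffer. A `threading.Event` carries the client’s cancellation back to the producer. `StreamingCall` and `numbers` are hypothetical names, and a real implementation would sit on top of a streaming network transport. Exceptions raised in the producer are not propagated to the client in this sketch.

```python
import queue
import threading
from typing import Callable, Iterable, Iterator

_DONE = object()  # sentinel marking the end of the stream


class StreamingCall:
    """Client-side view of a hypothetical streaming service call."""

    def __init__(self, produce: Callable[[threading.Event], Iterable[object]],
                 buffer_size: int = 4) -> None:
        self._queue = queue.Queue(maxsize=buffer_size)  # stands in for the transfer buffer
        self._cancelled = threading.Event()
        self._thread = threading.Thread(target=self._run, args=(produce,), daemon=True)
        self._thread.start()

    def _run(self, produce) -> None:
        # "Server" side: push each element into the buffer as soon as it exists.
        try:
            for item in produce(self._cancelled):
                if self._cancelled.is_set():
                    break
                self._put(item)
        finally:
            self._put(_DONE)

    def _put(self, item: object) -> None:
        # Blocks while the buffer is full, but gives up once the client cancels.
        while not self._cancelled.is_set():
            try:
                self._queue.put(item, timeout=0.05)
                return
            except queue.Full:
                continue

    def __iter__(self) -> Iterator[object]:
        # "Client" side: waits for the next element if it is not available yet.
        while not self._cancelled.is_set():
            try:
                item = self._queue.get(timeout=0.05)
            except queue.Empty:
                continue
            if item is _DONE:
                return
            yield item

    def cancel(self) -> None:
        # Tell the producer that no more elements are wanted.
        self._cancelled.set()
        self._thread.join(timeout=1.0)

    def finished(self) -> bool:
        return not self._thread.is_alive()


def numbers(cancelled: threading.Event) -> Iterator[int]:
    # Hypothetical open-ended server-side computation that honors cancellation.
    n = 0
    while not cancelled.is_set():
        yield n
        n += 1


if __name__ == "__main__":
    call = StreamingCall(numbers)
    for value in call:
        print(value)
        if value == 4:
            break
    call.cancel()
```

The bounded buffer also provides natural back-pressure. When the client is slow, the producer blocks instead of computing results nobody has asked for yet.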

The evolution and extension of standards, technologies and tools have enabled this to happen seamlessly within the existing technology infrastructure. The promise and popularity of Web services have led to the definition and adoption of the associated standards in the commercial realm, and standards like WSDL, BPEL and UDDI are well established. Emerging needs, and the technology to satisfy them, call for a standards framework that is encompassing and expressive. Such a framework can provide more detailed specifications of a service and its relationships across the Web service lifecycle. Combined with the right set of tools and technologies, it can deliver a number of benefits.

How to Design Fluid Services?

Contract Design

A fluid service contract should be designed to allow the transfer of multiple atomic data elements rather than a single atomic data element.

A request/reply style service’s contract normally contains operations like:

  • DataResult GetData(DataRequest request)

A fluid service contract, on the other hand, should look something like:

  • IEnumerable GetData(DataRequest request)

This makes it explicit to clients and all other intermediaries that the service can generate an open-ended stream of data elements.

The same approach could also be used to design the request data contract:

  • void SubmitData(IEnumerable dataElements)

A complete fluid service operation contract would look something like:

  • IEnumerable ProcessData(IEnumerable dataElements)
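A Python analogue of the `ProcessData(IEnumerable dataElements)` contract can be sketched with generators on both sides. This is a minimal in-process illustration using hypothetical `process_data`, `client_input` and `run_client` functions. It shows the requests/replies pattern: the service reads one request element and emits its result before it reads the next one.

```python
from typing import Iterable, Iterator, List


def process_data(data_elements: Iterable[str], log: List[str]) -> Iterator[str]:
    # Service side: handle each input element as soon as it arrives and
    # emit its result before pulling the next input element.
    for element in data_elements:
        log.append(f"service got {element}")
        yield element.upper()


def client_input(log: List[str]) -> Iterator[str]:
    # Client side: request elements are produced lazily too.
    for element in ["a", "b"]:
        log.append(f"client sent {element}")
        yield element


def run_client() -> List[str]:
    # The client consumes results while it is still sending requests.
    log: List[str] = []
    for result in process_data(client_input(log), log):
        log.append(f"client got {result}")
    return log


if __name__ == "__main__":
    print(run_client())
```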

Just by changing the request/reply pattern into a requests/replies pattern, the production, processing, transfer and consumption of partial data can overlap. This reduces the perceived response time and makes it possible to terminate processing early to save precious resources.

Data elements must be granular enough to keep per-element latencies low. This ensures that adding intermediary processing stages does not increase the latency significantly.
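A rough latency model makes the granularity argument concrete. It rests on simplifying assumptions: a chain of k stages (services or intermediaries), n elements per call, and a fixed time t_i per element at stage i. Stages are assumed to run concurrently, and network and buffer overhead is ignored. With buffered request/reply, each stage must finish all n elements before the next stage starts. With fluid stages, each element moves on as soon as it is processed:

```latex
\begin{aligned}
T^{\text{first}}_{\text{buffered}} = T^{\text{all}}_{\text{buffered}} &= n \sum_{i=1}^{k} t_i \\
T^{\text{first}}_{\text{fluid}} &= \sum_{i=1}^{k} t_i \\
T^{\text{all}}_{\text{fluid}} &= \sum_{i=1}^{k} t_i + (n-1)\max_{1 \le i \le k} t_i
\end{aligned}
```

Take n = 1000 elements and k = 3 stages of 2 ms each. The buffered chain delivers its first result after 1000 × 6 ms = 6 s. The fluid chain delivers its first result after 6 ms and its last after 6 ms + 999 × 2 ms ≈ 2 s. Adding a fourth 2 ms stage costs the buffered chain another 2 s, but delays the fluid chain’s first result by only 2 ms. These figures are illustrative, not measurements.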

Streamed Transfer

A fluid service should use streamed rather than buffered transfer mode. In reality, even streamed mode uses a transfer buffer to send data over the wire, but that buffer holds only a portion of the message. Buffered transfer usually refers to a buffer that holds the full message document. Streamed transfer mode lets data start moving as soon as it is written out. The client should also be set to streamed transfer mode so that it starts consuming the response as it is received. The physical network and the transfer buffer will always add some latency. Even so, this latency is far shorter than waiting seconds for a large message to arrive in full. This lets a fluid service respond almost immediately, even when services are chained to call one another.
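Streamed transfer can be illustrated with plain Python sockets, without claiming to reproduce any particular web service stack’s streamed transfer mode. In this minimal sketch, the server writes one JSON document per line and flushes after each element. The client parses each line as soon as it arrives instead of waiting for the connection to close. Several details are assumptions made for illustration: the newline-delimited JSON framing, `socket.socketpair()` as a stand-in for a network connection, and the `serve`/`receive` names.

```python
import json
import socket
import threading
from typing import Iterable, Iterator


def serve(conn: socket.socket, elements: Iterable[dict]) -> None:
    # Server side: send each element as soon as it is ready, one JSON
    # document per line, instead of building one large response message.
    with conn, conn.makefile("w", encoding="utf-8") as out:
        for element in elements:
            out.write(json.dumps(element) + "\n")
            out.flush()  # push this element through the transfer buffer now


def receive(conn: socket.socket) -> Iterator[dict]:
    # Client side: yield each element as soon as its line has arrived.
    with conn, conn.makefile("r", encoding="utf-8") as inp:
        for line in inp:
            yield json.loads(line)


if __name__ == "__main__":
    server_end, client_end = socket.socketpair()
    rows = ({"id": i} for i in range(3))  # produced lazily on the server side
    worker = threading.Thread(target=serve, args=(server_end, rows))
    worker.start()
    for element in receive(client_end):
        print(element)  # usable before the server has finished sending
    worker.join()
```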

Applying Fluid Services in Real World SOA Scenario

Figure 1 shows a real-world service that aggregates data from multiple back-end sources: some are direct data sources, some are existing services and some are legacy systems. Such a system incurs high latency costs and sometimes becomes unacceptably unresponsive. Users of the aggregation service may not know which back-end services are involved, but they will certainly notice the slow experience.

Figure 1

This is already a problematic case, but what if we want to reuse the aggregation service in another service, such as another aggregation or orchestration service? The red box in Figure 1 will automatically become a hot node and may easily cause congestion. Even when there is not much traffic, the latency problem will still be there and will resist all attempts to fix it. Simply put, this service is no longer practically reusable, because any service built on top of it would be too slow for anyone to use.

The sequence diagram in Figure 2 shows the dramatic impact of accumulating latencies due to careless sequential design.

Figure 2

The server triggers all back-end transactions one by one, without using any kind of parallelism. The client has to wait for all of the serially executing steps to complete. It then has to wait for the full message to be transferred before starting to process it. This includes serialization, buffering, network transfer time and deserialization of the entire response.

Unfortunately, this is the design most people gravitate toward because of its simplicity. Consider all the effort and investment that goes into enabling better and more manageable parallel processing for multi-core and distributed systems. Isn’t it ironic that we keep designing systems bogged down by sequential logic, missing the simplest and most obvious parallelism opportunity?

One obvious solution is to start the back-end transactions at almost the same time so that they run simultaneously, as illustrated in Figure 3. The server waits for all of them to complete and then responds to the client. This design reduces the response time to a great extent and is probably already used by some systems, if not all.

Figure 3
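Python’s `concurrent.futures` can sketch the difference between the sequential design of Figure 2 and the parallel fan-out of Figure 3. In this minimal sketch, `call_backend` is a hypothetical back-end transaction whose latency is simulated with `time.sleep`. The parallel version starts every call at once and then waits for all of them. Its elapsed time therefore approaches that of the slowest call instead of the sum.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Call = Tuple[str, float]  # (back-end name, simulated latency in seconds)


def call_backend(name: str, delay: float) -> str:
    # Hypothetical back-end transaction; the sleep stands in for I/O latency.
    time.sleep(delay)
    return f"{name} done"


def aggregate_sequential(calls: List[Call]) -> List[str]:
    # Figure 2 style: each call starts only after the previous one finished.
    return [call_backend(name, delay) for name, delay in calls]


def aggregate_parallel(calls: List[Call]) -> List[str]:
    # Figure 3 style: start all calls at once, then wait for every result.
    with ThreadPoolExecutor(max_workers=max(1, len(calls))) as pool:
        futures = [pool.submit(call_backend, name, delay) for name, delay in calls]
        return [future.result() for future in futures]


if __name__ == "__main__":
    calls = [("database", 0.2), ("service", 0.3), ("legacy", 0.1)]
    for aggregate in (aggregate_sequential, aggregate_parallel):
        start = time.perf_counter()
        results = aggregate(calls)
        print(aggregate.__name__, results, f"{time.perf_counter() - start:.2f}s")
```

Both versions still return one complete list, so the client sees nothing until the slowest call has finished.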

However, this approach still has several deficiencies:

  • The server eagerly triggers all back-end transactions to give a quick response.
  • The client has to wait for the longest-running transaction plus the network transfer time.
  • The client has to wait for the entire message to be processed, and for the full response to be generated and received, before starting to use it.

Now let’s look at the proposed ‘fluid service’ design, shown in Figure 4, in which servers at all layers produce and consume data in parallel.

Figure 4

Our service first hits the database and simply returns the results as they are received from it, so the perceived response time from the client’s perspective is very short. The service keeps executing while the client processes the results. At some point the database transaction finishes and the back-end service is called. The client starts getting the back-end results almost as soon as the service calls the back-end service. The client can cancel at any time, possibly before some of the back-end calls are made, so unneeded calls are never actually made. Of course, the service could behave more eagerly and process the data before the client actually consumes it. This would improve the service’s response time even more and would avoid holding on to back-end resources for a long time. However, other incoming requests would then have to compete for resources with that eager processing. Triggering back-end transactions only when actual consumption occurs will therefore probably scale much better. On the other hand, once a database transaction is triggered, the service should consume all of the query results as soon as possible rather than wait for the client to catch up and request more.
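The consumption-driven behavior described above can be sketched with Python generators. In this minimal in-process sketch, `query_database` and `call_backend_service` are hypothetical stand-ins that only count how often they run. The database rows are read into memory in one go, a simplification of draining the query quickly so the connection can be released. The back-end service is called only if the client keeps consuming past the database rows.

```python
from typing import Dict, Iterator, List


def query_database(stats: Dict[str, int]) -> List[str]:
    # Hypothetical DB query: all rows are fetched right away so the
    # connection can be released instead of waiting for a slow client.
    stats["db_queries"] += 1
    return ["row-1", "row-2"]


def call_backend_service(stats: Dict[str, int]) -> Iterator[str]:
    # Hypothetical back-end service that itself streams its results.
    stats["backend_calls"] += 1
    yield "svc-1"
    yield "svc-2"


def aggregate(stats: Dict[str, int]) -> Iterator[str]:
    # Fluid aggregation: database rows flow to the client first, and the
    # back-end service is only called if the client keeps consuming.
    yield from query_database(stats)
    yield from call_backend_service(stats)


if __name__ == "__main__":
    stats = {"db_queries": 0, "backend_calls": 0}
    stream = aggregate(stats)
    print(next(stream), next(stream))
    stream.close()  # client cancels after the database rows
    print(stats)
```

A client that is satisfied after the database rows and closes the stream never triggers the back-end call at all.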

One of the most important benefits of this fluid service design is its excellent reuse and composability, thanks to the minimal impact on perceived response time when calls are chained. Immunity to the latency barrier creates new opportunities and shifts the mindset toward a culture of reuse, just as in OOD.


The second part of this article will be published in the August 2010 issue of The SOA Magazine.

Read more about Fluid Services in “An Experiment on Fluid Services for Highly Responsive, Scalable and Reusable SOA”.