Varun Sharma

Biography

Varun Sharma is an Enterprise Solutions Architect with several years of technology consulting experience in the enterprise solutions space. He is an IBM Certified SOA Solution Architect and an IBM Certified SOA Analyst. Varun is an Integration Evangelist with experience in Fusion MW, SAG webMethods, Sterling Commerce, Informatica, ODI, IBM DataStage, JMS, IBM MQ, EJB 3.0, Web Services (SOAP/REST), CORBA, etc.

Varun has particularly strong exposure to strategic consulting in the integration space. He is an expert in ESB deployment topologies and has executed global ESB integration initiatives spanning multiple continents. Varun has executed multiple business transformation initiatives and drafted EA roadmaps for multiple clients. He has built unified data tiers and implemented data virtualization initiatives, all on the principles of SOA and data services.

Varun has progressed in role and position: from tactician to strategist, from bricklayer to architect, from problem solver to agenda setter, from warrior to diplomat, from specialist to generalist and from supporting cast member to lead role.

Contributions

Big Data as a Service

Published: June 11, 2014 • Service Technology Magazine Issue LXXXIV

Abstract: Our times are deeply shaped by social, mobile and complementary technologies in multiple ways. This article showcases and highlights some potential use cases and user scenarios in which B2C and B2B marketers and pioneers of the IoT could use data from previously overlooked sources and pull information in great detail to influence buying and possibly many other user behaviors, touching upon the technical aspects of the business, data and information.

This article is a recommended source of guidance for building a framework for enterprises in the B2C and B2B space, helping them lay down a tiered Big Data reference architecture for capitalizing on the power and potential of Big Data while keeping a sharp focus on security and governance. It is a recommended read for architects and IT leaders who are on the verge of taking the Big Data plunge. Many things in IT can be redone, but you get limited chances, time and bandwidth to build good architecture; this article aims to enable IT decision makers to build their Big Data architecture "right the first time."

This article is NOT an endorsement of any brand, company, enterprise, analyst and/or methodology. It is simply the author's view of how one could reuse and apply simple concepts from past experience to build a robust, expandable and flexible Big Data reference architecture and transform business using Big Data-as-a-Service.

Introduction

The mutual reinforcement and convergence of disruptive forces from the underlying SMAC (social, mobile, analytics and cloud) platforms over the last couple of years have unmistakably given rise to more new business scenarios, through innovation and realignment, than ever before. The ever-growing number of connected devices and machines is changing the IT and business landscapes alike. Educational institutions, healthcare, travel, aerospace, government agencies and other businesses are all discovering new and far-reaching ways to capitalize on the power of the platforms underpinning the Internet of Things.

In the automotive industry, the year 2014 so far has been a runaway success for Tesla Motors. Elon Musk's electric car company has disrupted the traditional automotive industry and has spawned numerous new possibilities and avenues. After 20-30 minutes in the showroom admiring the car, I could easily visualize a car in the near future with a super-advanced onboard telematics device that can seamlessly send feeds to insurers about the owner's driving style, the car's acceleration and braking patterns, distance travelled, routes taken, parking habits, etc., enabling the insurer to build a custom pay-as-you-drive plan for each of its customers. This could have a huge influence on the top line for the insurers and, more importantly, encourage better driving habits among drivers. Insurers can then de-average pricing models, capture a greater share of the low-risk-driver market, cut the costs of managing claims and enhance the overall customer experience.

Figure 1 – Convergence of SMAC

Shifting into a higher gear with the same example and looking purely from the manufacturer's perspective, the same onboard telematics device could also send the car manufacturer a data feed on the performance of all the car's moving parts, alongside the driver data feed sent to insurers. This would help manufacturers build better, more efficient cars and, more importantly, offer more customizable car configurations to the buyers of the future.

The level of detail described here could also help dealerships make more attractive offers to prospective customers by establishing a detailed customer profile from a short 10-minute test drive, based purely on the analytics captured during the drive combined with other data as needed.

Finally, it is quite reasonable to envision these advanced telematics devices enabling cars to talk to other cars on the freeway, helping to reduce traffic, accidents and auto theft. All of this points toward a future of self-driving, autonomous cars.

Evidently, the common denominator in all of the above use-case scenarios is "the data." To put things into perspective, the data feeds potentially flowing in multiple directions in these scenarios alone include driver behavioral analytics data, car analytics data, service and parts data, third-party data, insurer data, manufacturer data, dealership data, etc. As with data sources and business scenarios, there is no end to the possible opportunities in this space.

Big Data is the Real Deal

By one widely cited estimate, the amount of information generated from the dawn of time until 2003, some five exabytes, is now created every two days. A large volume of unmanageable data is not a fad or a fear; it is indeed the real deal.
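Taking that estimate at face value, the implied rates are easy to work out. The short Python sketch below assumes decimal units (1 EB = 10^18 bytes, 1 TB = 10^12 bytes); the variable names are illustrative only.

```python
# Back-of-the-envelope conversion of the "five exabytes every two days" estimate.
# Assumption: decimal (SI) units, 1 EB = 10**18 bytes and 1 TB = 10**12 bytes.
EXABYTE = 10**18
TERABYTE = 10**12

EB_PER_PERIOD = 5   # exabytes, per the cited estimate
PERIOD_DAYS = 2     # the estimate's time window

eb_per_day = EB_PER_PERIOD / PERIOD_DAYS
eb_per_week = eb_per_day * 7
bytes_per_second = EB_PER_PERIOD * EXABYTE / (PERIOD_DAYS * 24 * 3600)

print(f"{eb_per_day} EB/day")                      # 2.5 EB/day
print(f"{eb_per_week} EB/week")                    # 17.5 EB/week
print(f"{bytes_per_second / TERABYTE:.1f} TB/s")   # 28.9 TB/s
```

At roughly 17.5 EB per week, the estimate is consistent with the "10+ exabytes of data every week" figure used later in this article.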

To put things into perspective, consider that 50 years ago everything ran on mainframes, and data resided on those monolithic mainframe servers. This model was challenged by multi-tier architecture, and data moved into business capability-specific enterprise applications from vendors such as SAP, Oracle and IBM, mediated by a plethora of middleware technologies and protocols. Multi-tier architecture paved the way for "n:n" application relationships, leading to new challenges with regard to reporting, analytics and the overall integrity of data. EDW, BI and in-memory solutions such as OBIEE, SAP HANA and Informatica (INFA) promised to solve this for multi-tier architecture, but could not live up to customers' high hopes and expectations.

We are now in the age of networking, where it no longer matters which underlying platform hosts the data or which middleware (MW) tools transfer it. It is an age in which data, not applications or platforms, is the true king. Players like Salesforce and SugarCRM, which have built entire ecosystems in the cloud, are pushing customers to shift their attention away from the complexities of platforms, applications and maintenance and toward data. We are already on the verge of moving from "application-based enterprises" to "data-driven enterprises." As long as customers know what data they need to draw conclusions and make decisions that grow the business, cloud solutions empower them to make informed and educated decisions.

The Big Data-as-a-Service reference architecture is a recommended framework for these Big Data use case scenarios. It makes information available to consumers via reports, discovery services, etc., through the reuse of data services, and promotes best practices in data management through data modeling and metadata services.

Peel the Onion

The convergence of SMAC is evidently best achieved by deploying service-oriented architecture (SOA), service federation, an ESB and, most importantly, data services.

SOA is an established and proven architectural approach for defining, linking and integrating reusable business services that have clear boundaries and are self-contained, each with its own functionality. Within this type of architecture, you can orchestrate the business services into business processes.

Figure 2 – SOA Reference Architecture by IBM

The operational layer hosts all the packaged and custom applications, along with software development kits and other infrastructure components. The service components layer represents the business component layer, which contains all the individual business components that are integrated to form a Web application. The services layer is the placeholder for all the services that encapsulate and wrap the service/business components for use in the business process layer. The services in this layer are atomic and autonomous and are often implemented as Web services (WSDL and XML are the backbone of Web services); when the same services are referenced in the business process layer, however, they represent a complete business function. The services in the business process layer are orchestrated, mostly using BPEL, to formulate the desired business function and achieve the targeted result. The formulated business process is then exposed to the end customer in the consumer layer. A consumer action triggers the business process, which is integrated on top of the other layers and thus leads to the use of business components through the Web services defined inside the business process [REF-1].

The SOA reference architecture here is built from the bottom up, starting with the operational components and moving up to the top tier of consumers, with QoS, security and governance layers spanning all of them.
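To make the layering concrete, here is a minimal, in-process Python sketch of the Figure 2 layers applied to the insurer example. Every class, function and value in it is hypothetical; in a real deployment the services would be Web services (WSDL/SOAP or REST), the business process would typically be orchestrated by a BPEL engine rather than a plain function, and the cross-cutting QoS, security and governance layers are not modeled.

```python
from dataclasses import dataclass

# Operational layer: stand-in for a packaged/custom application's data store (hypothetical data).
POLICY_DB = {"D-100": {"base_premium": 1200.0}}


# Service components layer: an individual business component.
class PolicyComponent:
    def lookup(self, driver_id: str) -> dict:
        return POLICY_DB[driver_id]


# Services layer: atomic, autonomous services behind explicit contracts.
@dataclass
class Quote:
    driver_id: str
    premium: float


class PolicyService:
    def __init__(self, component: PolicyComponent) -> None:
        self._component = component

    def base_quote(self, driver_id: str) -> Quote:
        return Quote(driver_id, self._component.lookup(driver_id)["base_premium"])


class DrivingScoreService:
    def discount(self, driver_id: str) -> float:
        # Hypothetical fixed 10% discount in place of real telematics analytics.
        return 0.10


# Business process layer: orchestrates atomic services into one complete business function.
def pay_as_you_drive_quote(driver_id: str, policy: PolicyService,
                           scoring: DrivingScoreService) -> Quote:
    base = policy.base_quote(driver_id)
    return Quote(driver_id, round(base.premium * (1 - scoring.discount(driver_id)), 2))


# Consumer layer: a consumer action triggers the business process.
print(pay_as_you_drive_quote("D-100", PolicyService(PolicyComponent()), DrivingScoreService()))
# Quote(driver_id='D-100', premium=1080.0)
```

The business process only depends on service contracts, so the component or application underneath can change without affecting consumers.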

In the same vein, SOA data services are enterprise data management endpoints that expose highly optimized engines for working on all types of data. Data service as a concept predates SOA, dating back to when B2B ecommerce and EDI were gaining momentum in the business space in the early 1970s.

Technically, a data service is a service that exhibits two or more of the following attributes:

  • contract-based routing
  • declarative API
  • data encapsulation
  • data abstraction
  • service metadata

A close look at these five attributes shows the direct relation between data services and the basic pillars of SOA. However, we should never confuse SOA with the unified data tier [REF-2]. Furthermore, the information architecture tier in the view above can actually be expanded and logically laid down into the BI reference architecture.
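As a rough illustration, the following Python sketch shows a data service exhibiting several of the five attributes listed above. The class, registry and method names are assumptions made for this example, not a product API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataService:
    name: str
    version: str
    # Data encapsulation: the underlying rows are private to the service.
    _rows: list = field(default_factory=list, repr=False)

    # Service metadata: consumers can discover what the service exposes.
    def metadata(self) -> dict:
        fields = sorted({key for row in self._rows for key in row})
        return {"name": self.name, "version": self.version, "fields": fields}

    # Declarative API: callers state *what* they want (filters, columns), not *how* to fetch it.
    def query(self, where: Optional[dict] = None, select: Optional[list] = None) -> list:
        where = where or {}
        matches = [r for r in self._rows if all(r.get(k) == v for k, v in where.items())]
        if select:
            # Data abstraction: return only the requested logical view of each record.
            return [{k: r.get(k) for k in select} for r in matches]
        return [dict(r) for r in matches]


# Contract-based routing: requests are routed by the contract (name, version) they name.
REGISTRY: dict = {}


def register(service: DataService) -> None:
    REGISTRY[(service.name, service.version)] = service


def route(contract: tuple, **request) -> list:
    return REGISTRY[contract].query(**request)


register(DataService("vehicles", "v1", [
    {"vin": "V1", "make": "Tesla", "year": 2014},
    {"vin": "V2", "make": "Ford", "year": 2012},
]))
print(REGISTRY[("vehicles", "v1")].metadata())
# {'name': 'vehicles', 'version': 'v1', 'fields': ['make', 'vin', 'year']}
print(route(("vehicles", "v1"), where={"make": "Tesla"}, select=["vin"]))
# [{'vin': 'V1'}]
```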

Data Acquisition Tier

This is the true operational tier, encompassing the enterprise's disparate data sources: custom and packaged, on-premise and cloud-based. This tier does not necessarily represent the "sources of truth," but it is the tier that holds the actual physical data. The data residing in this tier can be divided into three main categories:

  • unstructured data, e.g. HDFS files, Word documents, PowerPoint presentations, Excel spreadsheets, graphics/images
  • semi-structured data, e.g. Exchange server, SharePoint, e-mails, TCP/IP packets, images
  • structured data, e.g. RDBMS

The data acquisition tier [Figure-3] is connected with data sources and the upper tiers via data services, primarily data transfer services and data processing services in combination with the data lifecycle management services.
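A minimal sketch of the acquisition tier's entry point, assuming a hypothetical catalogue that maps source types to the three categories above: a data transfer service admits payloads only from registered sources and stamps them for lifecycle management, and a data processing service picks a strategy by category. The catalogue entries and processing strategies are placeholders.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical catalogue mapping source types to the three categories above.
SOURCE_CATEGORIES = {
    "hdfs_file": "unstructured",
    "word_document": "unstructured",
    "image": "unstructured",
    "email": "semi-structured",
    "sharepoint_item": "semi-structured",
    "rdbms_table": "structured",
}


@dataclass
class Envelope:
    source_type: str
    category: str
    payload: object
    # Timestamp recorded for downstream data lifecycle management (retention, archiving).
    acquired_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def transfer(source_type: str, payload: object) -> Envelope:
    """Data transfer service: accept a payload only from a registered source type."""
    category = SOURCE_CATEGORIES.get(source_type)
    if category is None:
        raise ValueError(f"unregistered source type: {source_type}")
    return Envelope(source_type, category, payload)


def process(env: Envelope) -> dict:
    """Data processing service: choose a placeholder strategy by data category."""
    if env.category == "structured":
        return {"rows": len(env.payload)}              # e.g. bulk-load rows as-is
    if env.category == "semi-structured":
        return {"keys": sorted(env.payload)}           # e.g. extract tagged fields
    return {"words": len(str(env.payload).split())}    # e.g. index free text


print(process(transfer("email", {"subject": "Claim", "from": "a@example.com"})))
# {'keys': ['from', 'subject']}
```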

Data Organization Tier

The data organization tier provides the means of data aggregation and data enrichment for the underlying data tier, exposing service data objects only as standards-based contractual services. This tier also encompasses the modeling services that ensure the integrity of enterprise data when the custom or packaged data sources change. Modeling and data encapsulation are made possible by the metadata services, which cover the definitions of, and relationships between, the various data objects.
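The following Python sketch illustrates the idea under simple assumptions: a hypothetical metadata service defines two data objects and their relationship, and an aggregation step uses only that metadata to join, project and enrich records into service data objects. If a source renames a column, only the metadata needs to change.

```python
from collections import defaultdict

# Hypothetical metadata service: object definitions and relationships.
METADATA = {
    "objects": {
        "Driver": {"key": "driver_id", "fields": ["driver_id", "name"]},
        "Vehicle": {"key": "vin", "fields": ["vin", "model", "owner"]},
    },
    # (child object, child field) -> (parent object, parent field)
    "relationships": {("Vehicle", "owner"): ("Driver", "driver_id")},
}


def project(obj_name: str, record: dict) -> dict:
    """Keep only the fields the metadata defines for this object (encapsulation)."""
    return {f: record[f] for f in METADATA["objects"][obj_name]["fields"]}


def build_driver_sdos(drivers: list, vehicles: list) -> list:
    """Aggregate drivers with their vehicles into service data objects via metadata only."""
    (child, child_field), (parent, parent_field) = next(iter(METADATA["relationships"].items()))
    by_parent = defaultdict(list)
    for v in vehicles:
        by_parent[v[child_field]].append(project(child, v))
    sdos = []
    for d in drivers:
        sdo = project(parent, d)
        sdo["vehicles"] = by_parent.get(d[parent_field], [])
        sdo["vehicle_count"] = len(sdo["vehicles"])  # enrichment: derived attribute
        sdos.append(sdo)
    return sdos


drivers = [{"driver_id": "D1", "name": "Asha", "ssn": "not-exposed"}]
vehicles = [{"vin": "V1", "model": "Model S", "owner": "D1", "color": "red"}]
print(build_driver_sdos(drivers, vehicles))
```

Note that fields not declared in the metadata (here `ssn` and `color`) never reach the service data object.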

The data organization tier [Figure-3]:

  • enhances IT flexibility by decoupling the data sources, while keeping them logically connected using metadata and modeling techniques
  • enables business flexibility by supporting the functional implementation of IT-enabled service data object services
  • provides a reference point of service realization and enforcement point for SLA and KPI management

Figure 3 – Indicative "Big Data-as-a-Service" Operational Reference Architecture

Data Federation and Data Virtualization Services

Data federation is an umbrella term for all the data-related policies and rules that pertain to data in any way within the federated organization and/or lines of business. The data federation tier exposes all the disparate data sources as a single source of data via XML, Web services and/or other supporting technologies. Federation, in essence, is the abstraction of the physical data sources from the consumers, plus a layer to accommodate any composites and joins in terms of service data objects.
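A minimal federation sketch, assuming two hypothetical sources: an in-memory SQLite database standing in for an on-premise relational store, and a JSON string standing in for a cloud feed. The function exposes one composite view, so consumers never deal with SQL or JSON directly.

```python
import json
import sqlite3

# Source 1: on-premise relational store (in-memory SQLite stands in for it).
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (driver_id TEXT PRIMARY KEY, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [("D1", "Asha"), ("D2", "Ben")])

# Source 2: cloud telematics feed (a JSON string stands in for an HTTP response).
TELEMATICS_JSON = (
    '[{"driver_id": "D1", "km": 120.5}, {"driver_id": "D1", "km": 30.0},'
    ' {"driver_id": "D2", "km": 55.0}]'
)


def federated_driver_view(driver_id: str) -> dict:
    """Single logical view over both sources: a join expressed as one service data object."""
    row = crm.execute(
        "SELECT driver_id, name FROM customers WHERE driver_id = ?", (driver_id,)
    ).fetchone()
    if row is None:
        raise KeyError(driver_id)
    trips = [t for t in json.loads(TELEMATICS_JSON) if t["driver_id"] == driver_id]
    return {
        "driver_id": row[0],
        "name": row[1],
        "trips": len(trips),
        "total_km": sum(t["km"] for t in trips),
    }


print(federated_driver_view("D1"))
# {'driver_id': 'D1', 'name': 'Asha', 'trips': 2, 'total_km': 150.5}
```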

Data virtualization [Figure-3] is a tier that complements and supports data federation in terms of data abstraction, allowing consumers and consuming applications to retrieve data without needing to know its physical location, storage technology, API, format or other technical details. Data virtualization is a highly recommended tier for transforming and translating data pulled from data sources, promoting overall reuse and quality of data within the enterprise or line of business.
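Virtualization can be sketched as a set of adapters that translate each source's format and units into one canonical view. In this Python example the two feeds, their field names and the canonical schema are all hypothetical; only the conversion factor (1 mile = 1.609344 km) is a fixed fact.

```python
import csv
import io
import json

# Two hypothetical feeds with different formats, field names and units.
LEGACY_CSV = "vin,speed_mph\nV1,60\nV2,30\n"
CLOUD_JSON = '[{"vehicle": "V3", "speedKmh": 100}]'

MPH_TO_KMH = 1.609344


def _from_csv(text: str):
    # Translate a CSV feed in mph into the canonical schema (km/h).
    for row in csv.DictReader(io.StringIO(text)):
        yield {"vin": row["vin"], "speed_kmh": round(float(row["speed_mph"]) * MPH_TO_KMH, 1)}


def _from_json(text: str):
    # Translate a JSON feed with different field names into the canonical schema.
    for item in json.loads(text):
        yield {"vin": item["vehicle"], "speed_kmh": float(item["speedKmh"])}


ADAPTERS = [(_from_csv, LEGACY_CSV), (_from_json, CLOUD_JSON)]


def vehicle_speeds() -> list:
    """Virtual view: one schema and one unit, regardless of source format."""
    return [rec for adapter, raw in ADAPTERS for rec in adapter(raw)]


print(vehicle_speeds())
# [{'vin': 'V1', 'speed_kmh': 96.6}, {'vin': 'V2', 'speed_kmh': 48.3}, {'vin': 'V3', 'speed_kmh': 100.0}]
```

Consumers call `vehicle_speeds()` only; adding a new source means adding one adapter, not changing any consumer.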

Combining these two layers forms a "single source of data" layer for consuming applications; the same layer can also become a "single point of failure" for the supporting organization.

Data Analysis Tier

The data analysis tier in this model combines business analytics and business intelligence components. Analytics and BI is probably the most sought-after field these days, and the ongoing focus on SMAC will only widen its circle of influence by pushing it into more and more mainstream mediums.

In my opinion, the data analysis tier [Figure-3] is probably the densest and most convoluted of all the tiers in the Big Data-as-a-Service reference architecture, predominantly owing to the multiple micro data services running throughout it to help consumers gain greater intelligence and a time-sensitive advantage over the competition. The data mining services at the forefront of this tier, sitting between the data consumption tier and the data virtualization services, allow for pattern matching, data association, data clustering and data anomaly detection in a fraction of a second. The complementary science of predictive analytics, which combines data mining with statistics, neural networks and use-case modeling, empowers the consumer tier with risk evaluation, decision-making and A/B evaluation capabilities, along with statistical data to support each. To put things into perspective: your FICO score is a prime example of predictive analytics and data mining applied to data gathered from many sources to build and maintain accurate credit scores, which are returned to consumers on a need-to-know basis.
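As one concrete instance of the anomaly detection mentioned above, here is a simple z-score test in Python. This is a basic statistical technique swapped in for illustration, not a description of any particular data mining product, and the braking readings and threshold are made up.

```python
from statistics import mean, pstdev


def zscore_anomalies(values: list, threshold: float = 2.0) -> list:
    """Return indices of values more than `threshold` population std devs from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical braking decelerations (m/s^2) from one trip; the last is a hard stop.
braking = [2.1, 2.4, 1.9, 2.2, 2.0, 2.3, 2.1, 8.5]
print(zscore_anomalies(braking))  # [7]
```

For a telematics feed, flags like this could feed the risk evaluation capabilities described above, though a production system would likely use more robust methods.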

Data Consumption Tier

The data consumption tier is perhaps the most delicate tier, since this is where data emerges as information for the end user, and there is no single, fixed end user. This tier [Figure-3] is strongly supported by discovery and query services that run on standard integration protocols and mechanisms such as JDBC/ODBC, REST, SOAP, adapters, APIs, file transfer and MQ/AQ. These data discovery services may or may not be realtime, depending on the consumer of the information/data.
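A discovery service can be as simple as a query handler over a metadata catalogue. The sketch below assumes a hypothetical REST-style endpoint such as `/discover?latency=batch`; it parses the query string with Python's standard `urllib.parse` and returns JSON, leaving out the HTTP server itself. The catalogue entries are invented.

```python
import json
from urllib.parse import parse_qs, urlparse

# Hypothetical catalogue of datasets the consumption tier can discover.
CATALOG = [
    {"dataset": "trips", "domain": "telematics", "latency": "realtime"},
    {"dataset": "claims", "domain": "insurance", "latency": "batch"},
    {"dataset": "parts", "domain": "manufacturing", "latency": "batch"},
]


def discover(url: str) -> str:
    """Handle e.g. GET /discover?latency=batch and return matching datasets as JSON."""
    params = {k: v[0] for k, v in parse_qs(urlparse(url).query).items()}
    hits = [d for d in CATALOG if all(d.get(k) == v for k, v in params.items())]
    return json.dumps({"count": len(hits), "results": hits})


print(discover("/discover?latency=batch"))
```

With no query parameters the handler returns the whole catalogue, which suits a browse-style discovery call.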

The most critical component of this tier is the visualization service. Data visualization is part art and part science, making complex data from disparate systems easier to understand and process; hence there is no single right or wrong way to report or visualize data. The visualization component encapsulates the services that help end users consume data in the form of dashboards, interactive bar/pie charts, and line or scatter charts, with realtime or near-realtime capabilities built on in-memory data, or synchronous reporting services based on alerts and triggers. There is no dearth of possibilities when it comes to building data visualization services.

Big Data-as-a-Service Reference Architecture Model

Quantifying data by its volume, velocity, complexity and variety yields the most commonly discussed dimensions of Big Data in information management. However, as we go deeper into Big Data, we are likely to discover that the conversion process "from data to information" traverses multiple complexities and dimensions.

Figure 4 – Dimensions of Big Data [REF-3]

Other critical and consequential aspects of Big Data that we often overlook are the qualification of data and data access control [Figure-3]. These two dimensions address fundamental properties of the data and ensure well-rounded information management. Getting data is just one part of the process; it is followed by other equally, if not more, significant parts of the Big Data life cycle:

  • linking data (associating data from various social and mobile sources in the cloud and on-premise, and drawing contextual analytics from the results)
  • sharing data (publishing the information to the right audiences in the right-sized packets and controlling the sharing and publishing via federated access control mechanisms)
  • archiving data (removing stale, orphaned and invalid data from realtime data for smarter intelligence)
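The archiving step can be sketched as a partition of records into active and archived sets. The record fields and the 90-day staleness window below are hypothetical choices for illustration; real retention rules would come from governance policy.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(days=90)  # hypothetical staleness window


def split_for_archive(records: list, known_driver_ids: set, now: datetime) -> tuple:
    """Partition records into (active, archive): stale, orphaned or invalid ones are archived."""
    active, archive = [], []
    for rec in records:
        stale = now - rec["ts"] > MAX_AGE
        orphaned = rec.get("driver_id") not in known_driver_ids  # no matching master record
        invalid = rec.get("km") is None or rec["km"] < 0
        (archive if (stale or orphaned or invalid) else active).append(rec)
    return active, archive


now = datetime(2014, 6, 11, tzinfo=timezone.utc)
records = [
    {"driver_id": "D1", "km": 12.0, "ts": now - timedelta(days=1)},    # active
    {"driver_id": "D1", "km": 8.0, "ts": now - timedelta(days=200)},   # stale
    {"driver_id": "D9", "km": 5.0, "ts": now - timedelta(days=2)},     # orphaned
    {"driver_id": "D1", "km": -3.0, "ts": now - timedelta(days=3)},    # invalid
]
active, archive = split_for_archive(records, {"D1", "D2"}, now)
print(len(active), len(archive))  # 1 3
```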

To the untrained eye, 10+ exabytes of data every week surely fits what one might call a "data problem," and that is where the Big Data reference architecture comes to the rescue. The convergence of SMAC focuses deeply on "the data." However, we should not fail to consider the underlying principles that drive anything and everything data-related, namely security and governance. Encapsulating the Big Data-as-a-Service operational tiers within a security services tier, and finally encapsulating it all within a governance services tier, is pivotal. It ensures not only that the data is preserved and value is drawn from it, but also that access to the data is granted on a "need-to-know" basis and that QoS parameters are set around all the data and data services, so that everything can run on autopilot [Figure-5].

Figure 5 – Consolidated "Big Data-as-a-Service" Reference Architecture

The security tier should serve as the placeholder for the AuthN and AuthZ services, defining and executing data encryption and laying out various pre-defined solutions for possible user scenarios and function points. Data access services can enforce business rules based on RBAC/ABAC, ensuring access control around the data.
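A minimal Python sketch of how RBAC and ABAC might be layered: roles decide whether an action on a resource type is permitted at all, and attributes (here, a matching region) enforce need-to-know. The roles, permission strings and region attribute are assumptions, not part of the reference architecture.

```python
# Hypothetical role-to-permission mapping ("<action>:<resource type>").
ROLE_PERMISSIONS = {
    "underwriter": {"read:driving_score"},
    "claims_agent": {"read:claims"},
    "data_steward": {"read:driving_score", "read:claims", "write:claims"},
}


def is_allowed(user: dict, action: str, resource: dict) -> bool:
    # RBAC: at least one of the user's roles must grant "<action>:<resource type>".
    permission = f"{action}:{resource['type']}"
    if not any(permission in ROLE_PERMISSIONS.get(role, set()) for role in user["roles"]):
        return False
    # ABAC: need-to-know, the user's region attribute must match the data's region.
    return user["region"] == resource["region"]


alice = {"roles": ["underwriter"], "region": "EU"}
print(is_allowed(alice, "read", {"type": "driving_score", "region": "EU"}))  # True
print(is_allowed(alice, "read", {"type": "driving_score", "region": "US"}))  # False (ABAC)
print(is_allowed(alice, "read", {"type": "claims", "region": "EU"}))         # False (RBAC)
```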

The quantifiable properties of data volume, velocity and variety can sabotage the enterprise's data strategy if data governance is not considered up front. Remember that there is little to no value in "bad data." In fact, bad data is nothing more than a liability that keeps multiplying on its own when ignored. Data quality services enable a data steward to maintain the quality of enterprise data at a pre-defined level suitable for business use. Data cleansing and matching services are critical for ensuring that data de-duplication and enrichment are performed before business users or consumers see the reports based on this data. Data standards services are tedious to build, but once deployed they offer quick returns in data quality, extremely high reuse and consistency. Data standards services can also be used in conjunction with data security standards services for federation and consistency.
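To illustrate standards, matching and enrichment together, the following sketch standardizes a few fields, matches duplicates on the standardized e-mail address, and fills gaps in the surviving record. The matching key and formatting rules are hypothetical; production matching usually relies on fuzzier techniques.

```python
import re


def standardize(record: dict) -> dict:
    """Data standards: canonical formats for name, e-mail and phone (hypothetical rules)."""
    return {
        "name": " ".join(record["name"].split()).title(),
        "email": record["email"].strip().lower(),
        "phone": re.sub(r"\D", "", record["phone"])[-10:],  # digits only, last 10
    }


def deduplicate(records: list) -> list:
    """Matching and de-duplication on standardized e-mail, with gap-filling enrichment."""
    merged = {}
    for rec in map(standardize, records):
        key = rec["email"]
        if key not in merged:
            merged[key] = rec
            continue
        # Enrichment: fill empty fields in the surviving record from the duplicate.
        for name, value in rec.items():
            if not merged[key][name] and value:
                merged[key][name] = value
    return list(merged.values())


raw = [
    {"name": "  asha  rao", "email": "Asha@Example.com ", "phone": ""},
    {"name": "Asha Rao", "email": "asha@example.com", "phone": "+1 (555) 123-4567"},
]
print(deduplicate(raw))
# [{'name': 'Asha Rao', 'email': 'asha@example.com', 'phone': '5551234567'}]
```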

Summary and Key Takeaways

To summarize the whole value proposition of Big Data-as-a-Service, let's revisit the original premise we began with for this reference architecture model/approach:

"In the automotive industry, the year 2014 so far has been a runaway success for Tesla Motors. Elon Musk's electric car company has disrupted the traditional automotive industry and has spawned numerous new possibilities and avenues. After 20-30 minutes in the showroom admiring the car, I could easily visualize a car in the near future with a super-advanced onboard telematics device that can seamlessly send feeds to insurers about the owner's driving style, the car's acceleration and braking patterns, distance travelled, routes taken, parking habits, etc., enabling the insurer to build a custom pay-as-you-drive plan for each of its customers. This could have a huge influence on the top line for the insurers and, more importantly, encourage better driving habits among drivers. Insurers can then de-average pricing models, capture a greater share of the low-risk-driver market, cut the costs of managing claims and enhance the overall customer experience."

As billions of drivers drive cars with millions of moving parts around the world, the volume, variety and complexity of the data cannot be captured effectively in a tightly coupled environment that focuses on only a single aspect of the data. Even in the simple use case of a "car-driver-insurer" relationship, outdated or bad data could cost the manufacturer, the insurance company and/or even the driver money and reputation. Loss or breach of data could also lead to bigger, more daunting issues for the stakeholders involved.

The key takeaways for a successful implementation of Big Data-as-a-Service are:

  • Data governance is a must-have, and no longer merely a good-to-have.
  • Ignoring data security, data quality and data access can cost organizations millions of dollars, hurting the enterprise agility, efficiency and reputation.
  • Break the operational tiers for data flow into logical groups, i.e. consumption tier, analysis tier, organization tier and acquisition tier, to allow agility via loose coupling and abstraction.
  • Don't focus solely on the volume, variety and complexity of data. Consider the whole cycle from the acquisition of data to the extraction of information, and consider the hygiene factors along this path.

A successful Big Data-as-a-Service implementation requires close collaboration among Enterprise Architects, Data Architects, Database Administrators, BI and DW SMEs, SOA experts, InfoSec representatives and business strategists.

Finally, think globally and act locally.

References and Appendix

[REF-1] – SOA for Dummies: Introduction to SOA http://theobservinganalyst.blogspot.com/2010/12/soa-for-dummies-introduction-to-soa.html

[REF-2] – Enterprise Data Architecture, SOA and Data Services http://thequintessentialinquisitor.blogspot.com/2012/08/enterprise-data-architecture-soa-and.html

[REF-3] – Gartner Research, April 2011: 'Big Data' Is Only the Beginning of Extreme Information Management

Other References

http://en.wikipedia.org/wiki/Unstructured_data

http://en.wikipedia.org/wiki/Data_mining

http://en.wikipedia.org/wiki/Data_virtualization