In the past 25 years, I have seen four things that made me step back and say, “This changes everything.” The first was the browser (before that, we got data from the Internet using newsgroups and anonymous FTP). The second was open source distribution (we could get whole architectures up in hours, not weeks or months). The third was app stores (Amazon and Apple allowed us to distribute software at zero marginal cost). The most recent is the Lambda Architecture.
Yep, it is that big.
If you are a business owner or product manager who is into Big Data, data-driven decision-making, iterative A/B testing, machine-learning-driven recommendations, or any similar analytics application, you have probably heard a passing reference to the Lambda Architecture. However, anyone who digs deeper finds a menagerie of arcane terms that could only appeal to developers: Kafka, Storm, Spark, Cassandra, ElephantDB, Impala, Speed Layer, Batch Layer, Immutable Data Store, etc. Unfortunately, this jargon can obscure the disruptive change the Lambda Architecture brings. As a result, many people with the authority to fund technology changes are missing out on its impact on analytics.
Life in the traditional architecture world
Traditional architectures are based on transactions. They force data to be collected in the formats required to complete a given transaction (e.g., I need to collect N fields of information to process the sale of an item). Traditional architectures also allow data to be changed: I can update my profile, my shopping cart, my order status, etc. If your objective is to complete a transaction, this works well.
However, what if you want to understand more about who buys what, who is doing what, or, often more importantly, what leads something to happen (or not happen)? You cannot get this from the transaction data. Instead, you have to perform “data archaeology,” stitching multiple sources of data together to reconstruct what happened just before and after the transaction. If you are lucky, you have all this data. More often than not, however, you need to start development efforts to collect more data at the time of a transaction, log more information, pull it into a data warehouse, change your reports, and then dig in to see if you can figure things out. All these steps take a lot of time and effort, and each one introduces opportunities for error.
Lambda flips how we view data on its head
The Lambda Architecture starts with an entirely different premise: that it is impossible to know right NOW all the FUTURE business uses and interpretations we will need from our data.
This is not just a platitude. It is an underlying philosophy: the value of data comes from being able to ask it every question you might ever want to ask. This philosophy drives entirely different approaches to how data is captured, stored, interpreted and, most importantly of all, continuously reinterpreted as you learn and discover more about your company, customers, and operations:
- First, data is preserved in its original form and never changed or destroyed. This lets you look at any piece of data as of any point in time and account for changes over time. For example, you could re-segment your customers every year, quarter, or even day as you learn new patterns.
- Second, data is not forced into predefined formats (i.e., schemas) but is preserved raw, because you may want to go back later and extract different elements. For example, you could later realize that a variable such as the source IP address of a customer’s visit to your site may entirely change how you measure, interpret, and react to customers from that address.
- Third, data is engineered so that it can be easily reinterpreted as you learn more. This is not just about making reinterpretation fast; it also makes reinterpretation fault-tolerant (i.e., easy to correct in the event of a bug, without any loss of information).
- Finally, the architecture delivers all of this in real time through two points of view: a just-in-time view and a deep cross-sectional view (both of which are always current). This lets you make decisions quickly without sacrificing the 100% lossless accuracy needed in critical business areas (such as finance, medicine, or mission-critical operations).
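For readers who want to see the moving parts, the four properties above roughly correspond to what Lambda Architecture write-ups call the master dataset (the immutable data store), the batch layer, the speed layer, and the serving layer. The following is a deliberately tiny, in-memory Python sketch under those assumptions. It is not production code and not how Kafka, Storm, Spark, or Cassandra would actually be wired together. Every name in it (`MasterDataset`, `batch_view`, `SpeedLayer`, `query`, `by_product`, `by_subnet`) is hypothetical, and a `Counter` stands in for a real precomputed view.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]          # a raw event, kept exactly as received (no fixed schema)
KeyFn = Callable[[Event], str]  # one "interpretation": maps a raw event to a reporting key


@dataclass
class MasterDataset:
    """Hypothetical immutable data store: events are appended, never updated or deleted."""
    _events: List[Event] = field(default_factory=list)

    def append(self, event: Event) -> None:
        self._events.append(dict(event))  # store a copy so callers cannot rewrite history

    def all_events(self) -> List[Event]:
        return [dict(e) for e in self._events]  # hand out copies for the same reason


def batch_view(events: List[Event], key_fn: KeyFn) -> Counter:
    """Batch layer: recompute the whole view from scratch over every raw event."""
    return Counter(key_fn(e) for e in events)


class SpeedLayer:
    """Speed layer: incrementally counts events that arrived since the last batch run."""

    def __init__(self, key_fn: KeyFn) -> None:
        self.key_fn = key_fn
        self.recent: Counter = Counter()

    def add(self, event: Event) -> None:
        self.recent[self.key_fn(event)] += 1

    def reset(self) -> None:
        self.recent.clear()  # safe once a fresh batch view has absorbed these events


def query(batch: Counter, speed: SpeedLayer, key: str) -> int:
    """Serving side: merge the complete-but-stale batch view with the fresh speed view."""
    return batch[key] + speed.recent[key]


def by_product(e: Event) -> str:
    return e.get("product", "unknown")  # tolerate raw events that lack the field


def by_subnet(e: Event) -> str:
    return ".".join(e.get("ip", "unknown").split(".")[:2])  # "10.0.0.1" -> "10.0"


master = MasterDataset()
for e in ({"product": "duck", "ip": "10.0.0.1"},
          {"product": "duck", "ip": "10.0.0.2"},
          {"product": "potatoes", "ip": "192.168.1.5"}):
    master.append(e)

batch = batch_view(master.all_events(), by_product)  # e.g. last night's batch run
speed = SpeedLayer(by_product)

new_event = {"product": "duck", "ip": "10.0.0.3"}  # arrives after the batch run
master.append(new_event)  # every event lands in the master dataset...
speed.add(new_event)      # ...and in the speed layer at the same time
print(query(batch, speed, "duck"))  # 2 from the batch view + 1 from the speed layer = 3

batch = batch_view(master.all_events(), by_product)  # next batch run catches up
speed.reset()
print(query(batch, speed, "duck"))  # still 3, now served entirely by the batch view

# A question nobody planned for, answered from the same untouched raw data
print(batch_view(master.all_events(), by_subnet))
```

The design choice worth noticing is that nothing is ever updated in place. Fixing a bug in `by_product`, or asking a new question such as `by_subnet`, simply means recomputing a view from the raw events, which is the fault-tolerant reinterpretation described above. Real deployments typically replace the Python list with a distributed, append-only store and the two counters with batch and stream processing jobs.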
Once you have these capabilities, the things you can do with data, quickly and at scale, are pretty amazing. I will share some of these in future posts, as I want to keep this one short.
However, I will close this post out with a simple analogy…
“Think Like a Chef” vs. the Fast Food Menu
Traditional architectures are like fast food menus. You have these options. If you want to change the menu, we can do some market research, see what works and roll out a new menu. If you want to change again (or explore “what if we had done this?”), we can repeat this process.
Lambda architecture is like the pantry of a great chef. You have all these ingredients. If you feel like duck à l’orange, we can make it. If you want a duck confit salad, we can repurpose the ingredients. If you want rich potatoes, we can render the fat and cook the potatoes in it. And if you want something vegan, we can pull other items from the pantry and make that instead. There are endless options.
Mapping This Back to Things Business People Care About
So what does this mean for your business? Do you remember the last time you heard comments like these?
- You’ll see that report. It will be in our data warehouse tomorrow, around 10 am.
- Oh, that’s in our warehouse. We can build a program to convert and load the data into production. It will only take three weeks. Can you submit your TPS form to the Steering Committee so we can prioritize this?
- Unfortunately, we did not capture that data, but we can start to capture it now. In a few months, we can start analyzing it.
With Lambda, all of these comments (and many more) disappear. Data is never thrown away. It is always in production, ready to be used for analysis or real-time transactions. There is no delay between transactional and analytical use; data flows down both paths at once.
Just imagine what problems you can solve without these limitations!
However, the Lambda Architecture is not ideal for everything. That is why we invented the Hybrid Lambda Architecture (patent-pending). Look for more on why we developed it in a later post.