December 15, 2004
In The Fallacy of the Data Layer, Rocky Lhotka makes a case for something I've come to believe as absolute truth:
It is commonly held as a truth that applications have a UI layer, a business layer and a data layer. In most of my presentations and writing I use a four layer model: UI, business, data access and data storage. In this case the "data storage" layer is really the same as the traditional data layer in a 3-layer model.
But I want to challenge this idea of a data layer. Over the past few months, in discussing service-orientation (SOA) as well as distributed object-oriented architecture, I have become increasingly convinced that the idea of a data tier, data layer or data storage layer is fundamentally flawed.
Note that in this article I use the word "layer" to describe logical separation between concepts regardless of physical configuration, while "tier" means a physical separation. That said, the reality is that a typical data layer really is also a data tier, because most data layers exist in the form of SQL Server, Oracle or some other server-based database engine.
Service-oriented design leads us to the idea that any software component used by more than one other software component (used by more than one "client") should be a service. A service is a powerful unit of reuse, providing a contractual interface by which clients can interact with the service.
More importantly, a service defines a trust boundary. This means that a service protects itself against invalid usage by clients. This is one of the defining elements of service-orientation and is a key benefit. The benefit is that a service can safely service many disparate clients because it trusts none of them. The service ensures that all clients get the same behaviors, and that any data sent to the service by any client is checked for validity based on the rules of the service.
My contention is that the traditional "data layer" is the ideal candidate to be a service.
In other words, your data layer should be a web service.* Based on the applications I've worked on, I agree completely. I touched on this briefly in my Who needs Stored Procs, Anyway? entry-- what I was really trying to say is that stored procedures are a terribly inadequate place to build a data API. Web Services, on the other hand, are ideal for building APIs:
- A web service is the only level of abstraction that really buys you anything. It's as close to the holy grail as I've ever been: complete platform and technology independence. You really could have Macintosh or DOS clients using your API. Behind the scenes, you could decide to replace database technologies entirely (or move to the all stored procs approach), or migrate to an entirely different operating system. What other API interface offers anything even remotely close to this?
- HTTP and SOAP are the cockroaches of platform technology. They'll probably outlive us both. If you're bothering to create a formal API, why not create one using a technology that has a chance of surviving more than five years? Yes, COM+, I'm looking at you.
- A web service forces you to keep your API methods simple and abstract. A good API is difficult to get right, and easy to overcomplicate. A web service interface minimizes this risk. KISS!
- A web service API can be developed and debugged independently of the UI and other layers. Other API technologies (stored procedures, binary DLLs, remoting, etc) make it a lot easier for undesired, accidental coupling to creep in, along with inappropriate opportunities to "optimize" for your particular implementation. They can also trap you in that interface: good luck passing objects to stored procedures, or getting remoting to work once you install .NET 2.0 on the server.
- Database performance is almost always the bottleneck anyway. Adding a web service to the mix doesn't cost you anything. An additional 20ms of latency is just going to be lost in the noise of the 200ms it takes the database to process your query. Behind the facade of a web service, the optimization choices are almost infinite, so this choice will likely make you more performant, not less!
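To make the trust-boundary idea concrete, here is a minimal sketch in Python for brevity; the class, the get_customer operation, and the customers table are all invented for illustration, not taken from any project described above. The point is that callers depend only on the service contract, while the storage engine, schema, and SQL stay hidden behind it:

```python
import sqlite3

class CustomerDataService:
    """A tiny data-layer-as-a-service facade. Clients see only the
    contract (get_customer); they never see SQL, connection strings,
    or the storage engine, so all of those can change freely."""

    def __init__(self):
        # Storage is an implementation detail hidden behind the contract;
        # an in-memory SQLite database stands in for SQL Server/Oracle.
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
        self._db.execute("INSERT INTO customers VALUES (1, 'Fabrikam')")

    def get_customer(self, customer_id):
        # Trust boundary: the service validates every request because
        # it trusts none of its clients.
        if not isinstance(customer_id, int) or customer_id <= 0:
            raise ValueError("customer_id must be a positive integer")
        row = self._db.execute(
            "SELECT id, name FROM customers WHERE id = ?",
            (customer_id,)).fetchone()
        return {"id": row[0], "name": row[1]} if row else None
```

In a real SOA deployment this contract would be exposed over HTTP/SOAP rather than as an in-process class; what matters is that every caller gets the same validation and the same behavior, regardless of platform.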
I've already been burned by this on one large application I worked on. Over my protests, we implemented a binary .NET remoting protocol-- instead of a web service-- for communication between the smart client and the server. All this in the name of performance. The remoting works fine, but the fallout from this decision was painful:
- It's a giant pain to get developers set up on this project due to all the crazy server API dependencies they need on their machines. Fewer developers working on the project means less gets done; the setup cost ends up being a barrier.
- Our API is far more complicated than it needs to be, and heavily tied to the client application. There's less visibility into the nooks and crannies of a remotable DLL than there is into a simple web page with a list of methods. On a recent code review, I found three methods that all did the same thing. All of them had completely different names, of course.
- It's difficult enough with our own developers; selling this API to other internal groups is an uphill battle. Just getting to "hello world" is far too much effort. I wish I could email them a link to a basic method to inspire confidence.
- We ended up writing a minimal web service to mediate some of the difficulty. Now we've committed the ultimate sin: we're repeating ourselves. Why not have a single well designed API rather than one crazy hard-to-deploy one, and one that's little more than a toy?
- Due to the inadequacy of remoting as a proper API abstraction layer, we ended up with a mish-mash of stored procedures, triggers, client-side rule enforcement, and raw SQL. And it will be incredibly painful to change any of it. Just writing about it is causing my arm to twitch uncontrollably.
With the inclusion of .NET runtime code support in upcoming versions of SQL Server, and even Oracle, you may be tempted to move more of your API into the database. But don't fall into that trap, either:
How does the presence of managed code execution in the database improve matters? Frankly, I don't see it helping much at all unless you're doing some of the things I mention above as a possible reason to consider porting from T-SQL. On the other side of the fence, tossing your entire business logic layer into the database makes many of the positive improvements you could make to scalability all the more challenging:
- You're going to the DB box more often.
- You're chewing up more threads for greater periods of time.
- You're running more code that'll consume more resources. Just think about all the temporary objects and associated GC pressure that box is going to have to endure.
- You've got additional locking scenarios to worry about. A deadlock on a single app server is real trouble, while in a distributed environment you're just taking one of N boxes down with the deadlock.
Give yourself options by choosing the right architecture early on. For most common business apps, I feel very strongly that a web service data API is the right architecture. And I have the scars to prove it.
* Yes, yes, SOA doesn't technically mean web service. But in practical terms, it does.
Posted by Jeff Atwood
Nicely put. I think the line between "tier" and "service" is collapsing in favor of services. I don't know if the "web" service (i.e. XML/HTTP/SOAP) is of primary importance, but I do agree it's not something architects should reject outright.
Well, there are a number of ways to deal with that:
1) Don't return so many rows in the first place (chatty vs. chunky). LDAP, for example, has a "governor" where it refuses to return more than (x) results and basically forces the developer to page the results rather than grabbing a zillion directory entries.
2) Explore HTTP compression. This works fine with web services, if the client is smart enough to enable it.
3) Use a custom, tighter datatype or alternate method, e.g., .GetRowsCompressed()
Lots of options.
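For a rough sense of option 2 above, here's a small Python sketch (the row format is invented for the demo, not any particular toolkit's serialization): highly repetitive XML like a big rowset compresses dramatically with gzip, which is essentially what HTTP compression does on the wire:

```python
import gzip

# Build a deliberately repetitive XML payload: 5000 rows that differ
# only in their id/name values, much like a serialized DataSet.
rows = "".join(
    f"<row><id>{i}</id><name>customer-{i}</name></row>"
    for i in range(5000)
)
payload = f"<rows>{rows}</rows>".encode("ascii")

# gzip is the same algorithm HTTP compression negotiates via
# Accept-Encoding/Content-Encoding headers.
compressed = gzip.compress(payload)

print(len(payload), len(compressed))  # compressed is a small fraction of the original
```

The bloat of XML is mostly redundant markup, which is exactly what compression eats for breakfast; the tradeoff is a little CPU on both ends.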
I don't understand. I found binary remoting to be pretty much as easy as WS.
I personally would use WS when you want to talk to more than one platform.. or when your data requirements are [and will be in the foreseeable future] pretty tame. (Although you might reconsider using .NET for the service tier in this case... because MS.NET has a wonderful way of not working well with other SOAP toolkits (what's the point of SOAP again?))
There's just no getting around the fact that XML is a bloated exchange language.. and is just not feasible for a lot of applications. There's just no need for all that translative markup if you're talking to yourself [same platform].
That said... neither of our two favorite applications, IMO, warranted the need for remoting. WS would have worked just fine in those.
Good post, but I must disagree with the statement "web service data API _is_ the right architecture...". Statements like these are what get people in trouble in the first place -- there are no silver bullets in software development.
Web services, remoting, COM+, stored procedures and so on are fantastic solutions to certain problems and yet absolutely awful solutions to others. I think it's safe to say that most people developing software these days don't do a ton of research. They read things here and there, get some training and do their job. When they're told something is the way to go, they generally believe it and act on it. They're given a hammer and everything looks like a nail.
Web service data access is a nice concept and a wonderful solution to some problems, but certainly not all. It's a great tool to add to the toolbox, but not quite good enough to warrant emptying it. :)
In the absence of specifics, it's always safer to err on the side of simplicity. And Web Services are both simple and flexible.
So what I'm saying is, you need a VERY good reason not to go this route. You could do a heck of a lot more damage by defaulting to something else.
I like the idea a lot. However, I'm still cringing at the idea of bringing down thousands of rows formatted as XML. This is going to add a lot more than 20ms. I still haven't found a good solution to this issue...
One clarification: the situation I'm describing is only for *apps that define a formal API.*
For straightforward "app talking to database" apps, with no chance anyone else other than that single app is going to need to get to that data, I wouldn't do this.
I believe that any DataLayer should be a simple code block that allows operations against the DB.
That code block shouldn't have to know anything about the Business Entities; its single specialization is to execute operations (stored procedures and SQL statements) against whatever database engine (SQL Server, Oracle, DB2, etc.) is configured.
Finally, I invite you to download the DataLayer.Primitives Public Version.
This is a very cool Data Layer :)
DataLayer.Primitives - Readme!
Exactly what advantages does a WS offer over a Data Access Layer developed as a simple Class Library?
I was about to mention the book "Expert C# Business Objects", which I have been reading; it presents an architecture that's a good implementation of what's being discussed here. Then I looked for a link to the book and realized Rocky (the author of the article) wrote it. So I guess I'll just recommend the book!