April 16, 2007
My previous entry, Twitter: Service vs. Platform, was widely misunderstood. I suppose I only have myself to blame, so I'll try to clarify with another example.
Consider Reddit. The Reddit development team switched from Lisp to Python late in 2005:
If Lisp is so great, why did we stop using it? One of the biggest issues was the lack of widely used and tested libraries. Sure, there is a CL library for basically any task, but there is rarely more than one, and often the libraries are not widely used or well documented. Since we're building a site largely by standing on the shoulders of others, this made things a little tougher. There just aren't as many shoulders on which to stand.
On that note, if you have been considering writing a web application in Lisp, go for it. It will be tough if you're not already a Lisper, but you will learn a lot along the way, and it will be worth it I am sure. Lisp is especially great for projects where the end goal is unknown because it's so easy to steer in different directions. Lisp will never get in your way, although sometimes the environment will.
Language performance is a red herring. That's especially true when we're comparing dynamic languages like Ruby, Lisp, and Python that will never be known for their high octane, nitro burnin' performance levels. I assumed Alex Payne knew that when he chose to specifically call out Ruby language performance, but maybe I assumed wrong.
When you choose a language, like it or not, you've chosen a platform. And as Steve so patiently and calmly explained to all the Lisp enthusiasts, the platform around the language, more than the language itself, sets the tone for your development experience. The availability of common, popular libraries and the maturity of the development environment end up trumping any particular significance the language holds.
That's why the Reddit switch makes good business sense: they didn't change languages; they changed platforms. At the point at which your choice of platform starts to jeopardize your service, you switch platforms, exactly as Reddit did. Your users don't give a damn what framework and language you're using. The only people who care about that stuff are other software developers. And God help you if your users are software developers; then you're really in trouble.
But things aren't all roses in Python-land either. The Reddit developers initially used a Rails-like web application framework, with decidedly mixed results:
The framework that seems most promising is Django and indeed the authors of reddit initially attempted to rewrite their site in it. I was curious about their experience, so I carefully followed them along, trying to help them out.
Django seemed great from the outside: a nice-looking website, intelligent and talented developers, and a seeming surplus of nice features. The developers and community are extremely helpful and responsive to patches and suggestions. And all the right goals are espoused in their philosophy documents and FAQs. Unfortunately, however, they seem completely incapable of living up to them.
While Django claims that it's "loosely coupled", using it pretty much requires fitting your code into Django's worldview. Django insists on executing your code itself, either through its command-line utility or a specialized server handler called with the appropriate environment variables and Python path. When you start a project, by default Django creates folders nested four levels deep for your code and while you can move around some files, I had trouble figuring out which ones and how.
Django's philosophy says "Explicit is better than implicit", but Django has all sorts of magic. Database models you create in one file magically appear someplace else deep inside the Django module with a different name. When your model function is called, new things have been added to its variable-space and old ones removed. (I'm told they're currently working on fixing both of these, though.)
Note that any analogies I'm drawing between Rails and Django here are purely intentional.
Not that there's anything wrong with adopting a web application framework. But at least in Python you have a choice of web application frameworks. Instead of investing in the Django worldview, the Reddit team decided that the lighter weight web.py better suited their needs. Similarly, some ASP.NET developers reject the entire page lifecycle model, preferring to write their own HttpHandlers and HttpModules for finer-grained control over what's happening on their website. And that's fine; the ASP.NET platform accommodates both camps of developers.
It's true that Twitter represents an extreme case, but it sure looks like the Twitter developers could benefit from a choice of web application frameworks, too. In the end, it's about choice and flexibility. Not just in the language, but in the platform that inevitably comes along with any language.
Posted by Jeff Atwood
Interestingly in ruby you do have a choice of frameworks. Personally, I use camping for fun projects, merb for fast responders, and there is talk of a few new microframeworks like Rack (which works inside other frameworks).
A camping app I just wrote was http://twitter.caboo.se which grabs json from the twitter API and parses it to show you shared and unshared friends from a group of users.
Merb is a fast mongrel responder that acts like rails and can use many of the rails components but with the mutexes much tighter. Typically we use it for grabbing stats or handling uploads.
Rack is some serious meta-framework magic.
[Disclaimer: I'm one of the lead authors of Django.]
Generally I try to stay out of discussions of the relative merits of web frameworks. I'm too obviously biased to be objective, and there's already far too much brouhaha flying around.
However, I really feel the need to point out two things about Aaron's comments:
* First, much of what he is complaining about has since been fixed. Working well with others has been one of our top priorities in the year+ since Aaron wrote those words, and that effort has paid off big-time.
* Second, I suspect that Aaron's comments about Django were driven at the time by some personality conflicts he had with the Django community. He posted a large number of bug reports in a short period of time and reacted negatively when some of them were marked "wontfix" because we felt they didn't fit well with Django's philosophy. The tone he took through the whole interaction left me feeling very uncomfortable; I had a hard time finding the "constructive" part of his criticism.
When he then posted the article you quote from above, it certainly seemed as much an attack on Django as an article about Reddit. I think Reddit's choice of web.py was the right one given their goals and constraints, but I'd encourage you to take Aaron's words about Django with a grain of salt. I suspect he's as unable to be objective about it as I am.
This language-framework-library-platform issue is one reason that in my project (http://interreality.org), our goal for the next version is to create a library that can be accessed by C, C++, as well as various other languages, through wrapper APIs and also by making it easy to generate multiple language implementations for certain kinds of things from an IDL. We'll see how this goes.
I once thought it would be easy to bring existing C++ code into the .NET runtime with a few tweaks and a recompile. Haha.
Will we someday have a magic utopia in which language and platform/framework/library are truly decoupled? Maybe something like that, I certainly hope so.
I'd encourage you to take Aaron's words about Django with a grain of salt
Thanks for clarifying, Jacob.
However, I think it's fair to say both Rails and Django require you to do some "fitting your code into [the framework's] worldview", as Aaron pointed out.
That's why it's helpful to have a few alternatives at the ready if you find a particular framework's worldview to be inconsistent with your own.
As someone who used Django quite a bit and is now moving off it using a more piecemeal fashion (CherryPy, SQLAlchemy, Preppy), I can vouch for Aaron's (and Jeff's) comments about "fitting your code into [the framework's] worldview".
I don't think the real issue is that frameworks are good or bad, they are just an attempt to streamline the common parts of building some sort of application so you focus on the interesting stuff and not the infrastructure. Both Django and Rails are good at that, but you have to "stay within the lines" to really continue getting their productivity benefits. When you need to do something outside of the sweet spot, you will often end up fighting the very framework that was once helping you.
If you only do that occasionally, then it's not a big deal. But if you run into those limitations frequently, then sometimes less is more ... like web.py
The one thing that I really like about RoR is that it's a framework, whereas ASP.NET is a platform. The difference is that in a framework environment people can build plug and play components, whereas on a platform, people can build components and you have to make them work for you.
For example, there are a ton of plugins available for rails that allow you to extend your models at the drop of a hat.
With RoR you start building a website... with ASP.NET you start by building a framework for your web site... I've been developing for ASP.NET since first betas, and now I'm learning RoR.
The Twitter article made me concerned in terms of performance... I knew about the issues before, but never to that degree. I've always been concerned about flaky internationalization support, but this article is really worrying. I'm thinking about going back to ASP.NET, but the thought of having to do anal probes with reflector or doing SQL sit ups makes me want to cry in frustration.
I really wish there was an application framework for ASP.NET.
You should check out the Castle Project, http://castleproject.org/, specifically their MonoRail project.
From their site:
Castle is an open source project for .net that aspires to simplify the development of enterprise and web applications. Offering a set of tools (working together or independently) and integration with others open source projects, Castle helps you get more done with less code.
I've been using their ActiveRecord pattern for a few months with no complaints; their forum and community support is excellent and prompt, as well.
I guess for me it comes down to the needs of the project. Platforms and frameworks are simply different levels of abstraction. The level of control you wish to have over your code comes at the cost of increased development time to develop extra code. In my mind more code implicitly leaves more room for error.
To me, the appeal of a framework is less about the initial speed of development and more about the abstraction it gives. This contract programming methodology decouples the server and the service, allowing me as a developer to focus my development and testing on the business logic.
I do not think it is fair to blame a framework for not supporting a feature not specified in the contract. More often, and in the case of Reddit, the project's needs change and the contract between the framework and developers is no longer appropriate. I think at this point, too many people start complaining about the framework instead of focusing on re-evaluating their needs and matching those to a more appropriate technology.
I am of the "tools in a toolbox" camp, and you could sed -e "s/framework/language/g" to the above paragraph. Jeff, I think we agree on the philosophy; we are just choosing different ways to describe it. Thanks for clarifying your position, though here is to hoping a third post isn't needed :-)
I really enjoyed this post and your previous one about Ruby on Rails.
Thanks for the interesting insight, as usual.
I would take slight issue with Jeff's assertion that "Rails and Django require you to do some 'fitting your code into [the framework's] worldview', as Aaron pointed out." It's more that things are almost unbelievably easy to write in Rails if you do it "the Rails way", but you actually have to write a little more code to do something different (or wait for an active community to write a plugin for you).
Maybe my deeper concern with the raw performance comparison is that it minimizes the fact that some frameworks and languages can actually encourage you to write more efficient code, use better algorithms and caching approaches, etc. (although the reverse is also true) Simply because I code in c does not mean my web application will run faster.
These Twitter dudes were pretty far from determining that the time it takes to execute one line of Ruby was a bottleneck on their performance. In the three cases in my career where I was able to get a bottleneck down to the speed of the high level language, doing calculations in signal propagation over terrain, for example, I was able to drop down into C++ to utilize some high performance libraries. Ruby makes it pretty easy to drop down into C for that purpose. Somehow I don't think that's the problem here...
Jeez, what a great blog post. After a few reddit hits, I finally just went ahead and subscribed to your site, and it's already clear that it was a good decision.
No one really has your mix of straight talk and experience and it really shows. Sorry to kiss ass (I guess I'm apologizing more to other commenters than you) but sometimes a little needs to be done, and I don't really need or want anything from you so I think it's permissible.
There's so many people out there who are sort of end-user/developers (meaning, end users of development tools) who are not genius developers or sysadmins in their own right, who just want to use these easy tools to get the job done. Having straight talking people who can help sort out the hype from the reality and help people make good business and technology choices... that just really helps the web, and especially that class of people.
Uber-developers who could write their own framework if they had to, but just consider one a bit of a timesaver don't understand developing at a more user-like level and how much one depends on people who can sort out all the nitty gritty implementation details and give a recommendation, or at least tell what's important.
The last link in your article gives me a page not found error.
BTW, love the blog. Keep up the hard work.
Much better post now :)
The RoR framework is built, or should we say is popular, because it provides some magic tricks for developers to get tedious tasks done fast. But it is beyond me why people can’t imagine that the magic will come with a cost.
Not sure I fully agree that ASP.Net supports both types of developers; when you stop using viewstate, the lifecycle model, etc., you cut out quite a lot of the asp.net framework. That must be the same as RoR without AJS, ActiveRecord, etc.? I don't know the RoR framework, but it should still be possible to access the HTTP requests without Rails?
ASP.Net has its own magic features, too, which should make every Web2.0 developer's dream come true. “Create a website in 30 seconds”, just use our “bla bla Control”. But the magic that got you from A to B does not necessarily help you get to C.
It should be pointed out that the Reddit comments predate the aptly-named "Magic Removal" revision of Django: http://code.djangoproject.com/wiki/RemovingTheMagic
You shouldn't actually read that page, except to get a sense of how much magic was removed.
It's a lot better now. In particular, Django entirely relies on the bog-standard Python module loading mechanisms now, so figuring out where you can move things is a no-brainer once you understand Python's module loading system.
Disclaimer: I know no Rails, and I'm not a web developer.
In transactional operations, it's all about the database. Rails, by my understanding, has a very object/relational-mapping view of databases, and this has the result of generating database schema that does not actually work very well. Typically this is because multiple requests are made to the database in order to follow the object hierarchy, rather than making a single request to get the one piece of data you need to fulfil this request. Similarly, if you make changes to multiple objects, multiple UPDATE commands result, whereas if you formed your schema based on what was updated together, you might manage fewer requests in total.
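The difference in round trips described above can be sketched without Rails or a real database. The following is a minimal Ruby illustration, not Rails code: `FakeDb`, its methods, and the sample data are all hypothetical stand-ins that simply count how many requests each access pattern makes.

```ruby
# Hypothetical in-memory stand-in for a database; counts every "query" it serves.
class FakeDb
  attr_reader :query_count

  def initialize(users, events)
    @users = users
    @events = events
    @query_count = 0
  end

  # One round trip: fetch all users.
  def all_users
    @query_count += 1
    @users
  end

  # One round trip: fetch the events belonging to a single user.
  def events_for(user_id)
    @query_count += 1
    @events.select { |e| e[:user_id] == user_id }
  end

  # One round trip: fetch the events for many users at once, grouped by user.
  def events_for_users(user_ids)
    @query_count += 1
    @events.select { |e| user_ids.include?(e[:user_id]) }
           .group_by { |e| e[:user_id] }
  end
end

USERS = [{ id: 1, name: "ann" }, { id: 2, name: "bob" }, { id: 3, name: "cy" }]
EVENTS = [
  { user_id: 1, name: "signup" },
  { user_id: 1, name: "login" },
  { user_id: 3, name: "signup" }
]

# Following the object hierarchy: one query for the users, then one more per user.
def lazy_load(db)
  db.all_users.map { |u| [u[:name], db.events_for(u[:id]).map { |e| e[:name] }] }.to_h
end

# Fetching related rows up front: two queries, however many users there are.
def eager_load(db)
  users = db.all_users
  by_user = db.events_for_users(users.map { |u| u[:id] })
  users.map { |u| [u[:name], (by_user[u[:id]] || []).map { |e| e[:name] }] }.to_h
end

lazy_db = FakeDb.new(USERS, EVENTS)
eager_db = FakeDb.new(USERS, EVENTS)
lazy_result = lazy_load(lazy_db)
eager_result = eager_load(eager_db)

puts "lazy:  #{lazy_db.query_count} queries"
puts "eager: #{eager_db.query_count} queries"
```

Both functions return the same data; only the number of requests differs, and in the lazy version that number grows with the number of parent rows.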
Your database's write speed is going to be directly governed by how fast it can write to the transaction log. SQL Server, at least, writes the logical transaction information to the disk using uncached writes, and will almost 100% be writing sequentially to the end of the log file. It then updates its internal cached copy of the data files, and only writes those out lazily. For this reason, the best thing you can do for your database server's performance is to ensure that each database has its own independent disk (or RAID 1 array) for its transaction log. Developers don't generally think in terms of the physical, but disk access time in I/Os per second is usually governed by the head seek time; keeping the transaction log disk dedicated means that the disk head is almost always in the right position to write the next log record. There's no point using any striping - for sequential accesses, that just means each disk gets used in turn, never in parallel. (RAID 5 is even worse due to needing to read the disk being written to and the parity disk for the stripe in order to compute the new parity, then write to the destination disk and parity disk, for each write that occurs - disastrous for transaction log performance).
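As a rough worked example of the RAID 5 point, taking the description above at face value (small writes that do not fill a whole stripe), the physical I/O count per logical write can be tallied as follows. The per-write constants and the figure of 1,000 log records are illustrative assumptions, not measurements.

```ruby
# Physical disk I/Os per logical log write, following the description above.
# These constants are illustrative assumptions, not measurements.
IOS_PER_WRITE = {
  raid1: 2, # write the record to each of the two mirrored disks
  raid5: 4  # read old data + read old parity, write new data + write new parity
}.freeze

# Total physical I/Os needed for a given number of logical writes.
def physical_ios(logical_writes, layout)
  logical_writes * IOS_PER_WRITE.fetch(layout)
end

log_writes = 1_000 # hypothetical number of transaction log records

IOS_PER_WRITE.each_key do |layout|
  puts "#{layout}: #{physical_ios(log_writes, layout)} physical I/Os"
end
```

The raw count understates the gap: the two RAID 1 writes go to different disks in parallel, while each RAID 5 write has to wait for its reads to finish before it can write.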
In my work, for our own mobile-device application server, we generally don't bother producing an object model. There just isn't any point manipulating it when you're simply providing a single response for one client then getting a request for a completely different client. We've solved the object/relational mismatch by not using data objects.
i heart lisp
i heart reddit
Look at a few frameworks and pick a well supported one which :
a) Provides a simple toolkit of single-function tools you can use to Get Things Done.
b) Does more of what you want 'out of the box' than the others.
Then get about doing whatever it takes to finish the project. Scaling will push any framework. You will need to work around bottlenecks regardless of the platform.
Does this have anything to do with php?
I usually see php used in forums and stuff. Though I never knew 'Python' and 'Lisp' were web application languages
At the end of the day much (most?) of it is about reducing network traffic and disk activity. Twitter looks like it demands a lot of each, and therefore the architecture must be chosen and designed to support this.
As an aside, I clicked on a few links in Twitter, but had no desire to return. Am I missing something obvious? Why is this site so busy? Maybe I'm too old to appreciate it?
This is why I'm very excited about tools like JRuby. Java, for better or worse, has a much larger "platform" than any other language in common use today. There are libraries and frameworks available to do just about anything imaginable. Integrating a "better" language with the Java platform automagically gives that language all the power of that platform.
Well written post, Jeff. To be honest, I thought the previous one was a bit trollish in light of all the hullabaloo over Twitter vs DHH vs Python vs ad nauseam.
In any case, Joel Spolsky also lies in the "it's the platform, stupid" camp. But I propose that it's properly called the environment, not "just" the platform. The environment encompasses all that a project lives in, and what makes a developer's life heaven or hell. The IDEs, the OS, the databases, the framework(s), the online community, the forums, the documentation, the language, the sugar, etc, etc. Suffice to say, there is no perfect environment, just trade offs.
So pick your battles, write your software, and go make money!
We like money.
Having recently spec'd and designed a project to be built upon Rails, I did much research into the framework. It's nice, but as I dug deeper I realized that there would be scalability issues as well as some not so "out of the box" functionality that would need to be implemented. I recommended ASP.NET. (As for all you MS haters out there, of which I count myself one too, .NET is really nice. I just love Linux and the mono-project.com guys rock.) To this day they are still struggling to implement the project, since the developers they decided to hire wowed them with "look how fast we can develop in ROR". Don't get me wrong, though: I got pretty excited about ROR and was able to slap together a nice test site, and I still like it. The point is ASP.NET made sense for what they wanted to accomplish, but they got swept up in the shininess and are now paying (a lot) for it and they're not even to the third phase of the project.
Also if you want another ASP.NET (cross platform) "framework"/CMS see mojoportal.com.
nitpick: "dynamic languages like Ruby, Lisp, and Python that will never be known for their high octane, nitro burnin' performance levels"
Some Lisps are known for extremely good performance: there's fifty years of compiler technology hidden behind all those parentheses. ;-) Other dynamic languages also sport tremendously efficient VMs -- Smalltalk's Strongtalk VM comes to mind as a great example.
I see that language performance wasn't the point of your post, but we shouldn't perpetuate the fallacy that "dynamic" must equate to "slow".
You should continue to write. IMHO you make sense sometimes, and the 'sense' is not necessarily all in the same post all the time.
The comments here are fascinating (to me). People seem to have some odd conceptions about Rails as they attempt to invent performance problems for it. "Typically this is because multiple requests are made to the database in order to follow the object hierarchy, rather than making a single request to get the one piece of data you need to fulfil this request." Rails is all about taking common problems like this and making them easy. If I want to join data from another table in a query result, aka "eager loading", it's pretty simple. You can even nest them, etc., it's a beautiful syntax (and saves tons of queries along with keystrokes).
#given a users table that "has_many" events
#join events based on a defined fk user_id in a table called events
the_user = User.find(id, :include => :events)
for event in the_user.events
  puts event.name #requires no additional trip to database
end
I also find statements like this odd: "While in C# for example I can do a string.split and basically create an entire array with 1 line of code. For ease and productivity you lose out somewhat on performance." Is this making the assumption that the programmer can write a more efficient split function than the authors of C#? Could be true, but in my case, it most likely is not true at all. Plus, mine would have at least one hidden bug in it... I'll use the split function, thanks.
Same guy writes- "This is where a DBA can come in really handy to help write the most optimized and sometimes ugly SQL. It is quicker to write the SQL statement in the "code layer" but really it belongs in the database layer" Really? First, why can't the DBA help write the SQL in the code? Second, how are you going to call your SQL that is in the database without SQL (or SQL generation) in the code? I'm not saying you can't go the route of lots of database code and get good performance, but it's certainly not the only way.
Big fan of the blog. You do an excellent job.
A friend of mine and I were talking about your last two posts and it occurred to us that no one has really taken the time to analyze the bigger web2.0 companies with a write-up of their platforms and languages.
Maybe this is something you'd be interested in doing. I know I and probably others would be interested in reading.
Keep up the good work!
Love to have discussions in comments on someone's blog! Fun. I see your point now, in terms of database server utilization versus raw query performance. I've just had the opposite experience with stored procedures for crud operations on one huge system I worked on, but I have come to understand that I was working with a really bad DBA that didn't understand she was killing the overall application performance by writing stored procedures that returned stuff in a format that was inconvenient for the application, just so her code would run faster. Local optimization problems...but what made me wonder is that I usually ended up wrapping the stored procs in a select so that I wouldn't have to pull back 2 blobs in multiple rows on each query. Would that kill the gains you mention? Or is it things like joins that you have found to run faster in stored procs versus sql queries?
I can't comment on why specifically Twitter is having performance problems, but in general you can make this assumption: The closer the underlying code is to the machine layer, the faster it will run. There is no doubt about that. Now if the code is not optimized all bets are off. So why use languages that are not close to the machine? Supposedly they will be easier to use and implement. For example, in assembly each operation is a very small amount of work being performed so it will take a lot of code that is hard to understand (read). While in C# for example I can do a string.split and basically create an entire array with 1 line of code. For ease and productivity you lose out somewhat on performance.
Whenever the word abstraction is used, performance will suffer. I believe someone posted that some of the tools (Rails for instance) abstract out the database. Now if you are writing an application with database operations and there will be a lot of users, be prepared to write some tedious data layer code. Even the small stuff (Select * versus Select column1, column2) etc. helps out. This is where a DBA can come in really handy to help write the most optimized and sometimes ugly SQL. It is quicker to write the SQL statement in the "code layer" but really it belongs in the database layer. I've seen plenty of applications that dynamically build SQL and send it in to the database and this will work for small amounts of users, but this is not the way to go for any large scale application with very specific performance needs.
There is always this discussion about how to tie the code and the data layer together and frankly it is nonsense. Let the database do its work and communicate back the result sets to the code layer. As long as the communication layer is nice and fast, let's not try and combine the two.
For large systems, stored procedures are the only way to go. Stored Procedures versus in line SQL, performance will be about the same with a small amount of users. But where it really pays off is large amounts of database operations. This is because with stored procedures, the database engine doesn't have to work as hard to deliver the results. Less IO, memory, etc. means more scalability. So instead of handling 1,000 users, the DBMS can now handle 10,000 users with the same CPU and memory profile.
Every application will hit a wall, but if you can support your users with 50 servers instead of 500, then you have earned your paycheck as a developer.
To put a variation on Box's famous quote - All frameworks suck, some are useful.
Hey Matt M-
Thanks for the comments. I used the C# string.split statement to show that with frameworks such as .Net, you can do more with less code. What I was trying to convey is that you don't have to know the details or how the string is being manipulated and put into the array, you just have to use the method and get back an array of string parts. If split is implemented badly, then the framework is giving your application a performance problem. I guess I gave the wrong impression that native string.split versus your own implementation would give a performance boost. My point was that the framework will give you increased productivity but it may impact performance. You have to trust that all the implementations are good.
Second, there's nothing wrong with putting all database related activities in the code.
However, if one implementation uses all stored procedures and the other does its SQL calls from the code layer, the one with stored procedures will not put as much work on the DBMS system (memory, CPU, etc) as the one with the SQL in the code layer. The system with stored procedures will scale much better than the one with ad hoc queries. On one application I did maintenance against, all SQL was written in the code and the DB Server (Mid Tier Unix Box) was working really hard to keep up. So we analyzed some of the ad hoc queries that were going in versus stored procedures and the execution time was about the same (very small differences in hundredths of a second to return results); however, the memory and CPU usage that the DBMS system was putting on the Unix box was very different, with the stored procedure utilizing much less resources than the ad hoc query.
That was my point, stored procedures will work the database server less.
Whether or not you choose to dynamically generate SQL is up to you, but any query, update, or delete statement can be parameterized into a stored procedure.
If one has implemented some sort of dynamic database schema, then stored procedures may not be possible. For example, say you want the users to be able to add a data field to the system automatically without actually adding the column to a table in the database, you can do that with a generic data model and query that model with ad hoc dynamic queries, but the price you pay for that flexibility is a loss in overall performance.
If the DBA isn't on the same page as the development team, then difficulties will arise. There isn't a lot you can do if the results that are being returned from the query are not what you expect. For example, if I want columns 1 and 2 from table 1 and columns 3 and 4 from table 2, then I expect the query to return those results so I can put them in a data set, data table, etc. with minimal additional coding. If I have to cycle through the result set and process each row again to get the data I am looking for, then basically you're doubling the amount of processing.
Also, if your stored procedure is written badly it can also cause a performance problem. Using stored procedures in and of itself isn't going to magically make the application faster.
Sometimes it all comes back to application design. If you have a query that is joining many tables in a complex query (multiple joins) it might be better to redesign or create a view that flattens out the data. Every query can be examined with tools (SQL Server - Query Analyzer, Oracle - Toad) to determine its execution plan. Execution plans show what the database will do to get the data. A good DBA should be able to analyze the execution plan and determine if the query can be optimized and rewritten to achieve better performance. On a large system, this is part of the DBA and maintenance developer ongoing maintenance.
In your situation, it sounds like there is a disconnect between the coder(s) and the DBA. My general advice is: Write the query, analyze its execution plan with the DBA and determine courses of action to make it better. In my experience this doesn't happen too often, which means that maintenance on such systems becomes a headache.
It has been several years since this article was posted. Looking back on it now, one section is ironic for stackoverflow.com:
Your users don't give a damn what framework and language you're using. The only people who care about that stuff are other software developers. And God help you if your users are software developers; then you're really in trouble.