Twitter is a victim of its own success. The site has massive scaling problems, to the tune of 11,000 pageviews per second. According to this interview with a Twitter developer, a lot of the scaling problems are attributable to Twitter's choice of platform:
By various metrics Twitter is the biggest Rails site on the net right now. Running on Rails has forced us to deal with scaling issues - issues that any growing site eventually contends with - far sooner than I think we would on another framework.
The common wisdom in the Rails community at this time is that scaling Rails is a matter of cost: just throw more CPUs at it. The problem is that more instances of Rails (running as part of a Mongrel cluster, in our case) means more requests to your database. At this point in time there's no facility in Rails to talk to more than one database at a time. The solutions to this are caching the hell out of everything and setting up multiple read-only slave databases, neither of which are quick fixes to implement. So it's not just cost, it's time, and time is that much more precious when people can['t] reach your site.
None of these scaling approaches are as fun and easy as developing for Rails. All the convenience methods and syntactical sugar that makes Rails such a pleasure for coders ends up being absolutely punishing, performance-wise. Once you hit a certain threshold of traffic, either you need to strip out all the costly neat stuff that Rails does for you (RJS, ActiveRecord, ActiveSupport, etc.) or move the slow parts of your application out of Rails, or both.
It's also worth mentioning that there shouldn't be doubt in anybody's mind at this point that Ruby itself is slow. It's great that people are hard at work on faster implementations of the language, but right now, it's tough. If you're looking to deploy a big web application and you're language-agnostic, realize that the same operation in Ruby will take less time in Python. All of us working on Twitter are big Ruby fans, but I think it's worth being frank that this isn't one of those relativistic language issues. Ruby is slow.
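To make the "multiple read-only slave databases" idea from the interview concrete, here is a minimal sketch in Python of routing writes to one primary and spreading reads across replicas. It is not how Twitter or Rails does it; `FakeConnection`, `ReadWriteRouter`, and the `updates` table are hypothetical names invented for illustration, and the "connections" only record the statements they receive.

```python
import itertools


class FakeConnection:
    """Hypothetical stand-in for a real database connection; it only records SQL."""

    def __init__(self, name):
        self.name = name
        self.statements = []

    def execute(self, sql):
        self.statements.append(sql)
        return self.name  # report which server handled the statement


class ReadWriteRouter:
    """Send writes to the primary and rotate reads across read-only replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replica_cycle = itertools.cycle(replicas)

    def execute(self, sql):
        # Naive classification (an assumption): anything starting with SELECT is a read.
        is_read = sql.lstrip().upper().startswith("SELECT")
        target = next(self._replica_cycle) if is_read else self.primary
        return target.execute(sql)


primary = FakeConnection("primary")
replicas = [FakeConnection("replica-1"), FakeConnection("replica-2")]
router = ReadWriteRouter(primary, replicas)

router.execute("INSERT INTO updates (user_id, body) VALUES (1, 'hello')")
first = router.execute("SELECT body FROM updates WHERE user_id = 1")
second = router.execute("SELECT body FROM updates WHERE user_id = 1")
print(first, second)
```

The sketch ignores replication lag: a read sent to a replica immediately after a write may not see that write yet, which is part of why the interview calls this no quick fix.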
I've often said that performance doesn't always matter. But if, like Twitter, your business model is predicated on how fast your users can press the Refresh button in their browser, you could be in serious trouble if your service becomes popular.
What I find particularly amusing is the performance comparison with Python. It's hard to believe that Python is that much faster than Ruby. Python, like Ruby, is an interpreted language, and interpreted languages are so slow that if you have to ask how much performance you're giving up, you can't afford it. Consider this chart from Code Complete 2.0:
| Language | Type of Language | Execution Time Relative to C++ |
| --- | --- | --- |
| C++ | Compiled | 1:1 |
| Visual Basic | Compiled | 1:1 |
| C# | Compiled | 1:1 |
| Java | Byte code | 1.5:1 |
| PHP | Interpreted | > 100:1 |
| Python | Interpreted | > 100:1 |
I realize that Web 2.0 is built on the back of the cheap "whatever box" server. Twitter is probably the perfect storm of refresh-heavy design coupled with exponential growth. Most websites wish they were so lucky.
To be fair, it sounds like most of Twitter's problems are database problems, so maybe it doesn't matter what language they use. But it does make you wonder: what's more important-- the service, or the platform you deliver that service on?
In the case where the latter is jeopardizing the former, I think it's pretty clear where your allegiances should lie. Your users don't care how cool the Rails platform is-- but they sure do care about consistent availability of your service.
Update: This entry isn't as clear as it could be. See my followup to this post for a better explanation of my position.
Maybe Twitter is already taking this advice to heart:
http://twitter.com/blog/2007/03/now-hiring-senior-engineer.html
Jeff Atwood on April 13, 2007 7:30 PM
C# is byte code, just like Java is. And it's not as fast as C++, either. Like Java, it's very close, but it's not as fast overall as C++.
jeremiah johnson on April 13, 2007 7:50 PM
I'm betting people on the Java side of the Java vs Ruby debate will jump all over this and a name-calling-palooza will erupt over at places like The Server Side.
It'll be interesting to follow this. I love the concepts and ease of programming that Ruby and Rails give you, but when you look at the total cost of ownership over the life of an application, programmer time isn't always the most expensive part of that. Ask somebody trying to run an enterprise application on .NET right about now about how much downtime they have to schedule in to deal with the endless security patching. That adds up no matter how cool Microsoft makes the IDE.
Same deal here, essentially. My sys/admins and upper management types at the Fortune 10 I work for are scared to pieces of change. They think of their JVM as reliable, tested over a number of years, and scalable. No way Ruby gets past an architecture review in its current form. I'm really hoping that either Grails or JRuby gets some more traction so we can have our JVM cake and eat our programmer productivity tools too.
jeremiah,
You notice that assembly is not on the list, right?
The reason is that C++ compilers are better than humans in understanding the CPU.
In the same manner, the JIT is good enough to produce code that competes head to head with C++ code.
Python is interpreted, but it's faster than Ruby because it's compiled to bytecode first. Ruby doesn't yet have a bytecode compiler, but it's coming - I think within a year.
trousercuit on April 13, 2007 8:43 PM
This article isn't up to your usual standards of care on the details. In fact it's really quite horrendous. I respect Code Complete as much as the next person, but I shudder to think of the uninformed person learning from this or the informed person quoting this disingenuously.
First I must say, I lately mostly code in Java, spend a lot of time in Perl, and 2 years ago rejected Ruby for a performance intensive project due to its failure at the metrics we wanted to achieve. I try my hardest not to be biased, but you know how that goes. But hey, I try.
1. The obvious: "interpreted" code typically goes through multiple compilation processes into the pseudo-VM that will eventually run the pseudo-byte code. It isn't all that freaky different from C# and Java, presuming that it's a well-honed interpreter.
2. Also obvious: Java and C# only perform as fast as C++ in certain metrics. Any metric that broadly states that they are 1:1 is clearly (a) flawed, (b) biased, or (c) misquoted. Interestingly, they can both beat even C in some tests due to the very simple fact that using a very well written library which is commonly available in those languages, as opposed to rolling your own, is a great way to optimize '-). Surprisingly enough there are some tests where Perl will come close.
And that's the real point here. And the point that this article misses. I know it's annoying to keep stating the obvious, but performance tests are about the most flawed and difficult thing that can be done with languages, and it's just our lot in life to keep admitting that fact every time we make them or quote them.
3. Everyone who isn't a fanboy knows that the Ruby interpreter doesn't win prizes for speed. The bottom line is that generally, yes, Python is significantly faster. It's not a big deal. I dislike Python syntax. I rarely code in Python. I am annoyed by Ubuntu, which I otherwise love, due to its Python bent. I like Ruby syntax and ideas a whole lot more. But objectively, I know that Python is faster, because of direct experience with (a) scripts I've made, (b) apps I've run, and (c) others' trusted statements.
I admit that I personally don't particularly like the Rails tidal wave. But it's mostly not for language reasons and more the effect of the way too many emotionally charged decisions being made due to popularity and initial coding ease as opposed to fully objective analysis for each target project. But you can't argue with success even if it is Flickr dumbing down its features supposedly because of performance (a lot of that's on PHP), or Twitter's performance sucking as they realize that script-generated database code isn't cutting it for them.
But hey, this isn't new. MS came out with the VB years ago with its own comparable set of trade-offs (donning flame-retardant suit), and I'm sure none of us enjoy the remembrance of all those horrid yet supposedly useful little shareware apps that made you download the VB .dlls. (That's the real reason people originally complained so much about the .NET libs, those repressed memories of VB DLLs.)
And the objective conclusion is, they have been working on Ruby speed for years, and it still ain't there yet. One day, I'm sure (not being sarcastic here).
In any case, I love your blog, but yeah, c# a 1:1 with c++? heh.
Marcus on April 13, 2007 8:43 PM
I don't think this is a "Rails problem". I'd like to have seen an ASP.NET VB.NET app running Twitter. I bet you'd have similar (if not the same) problems to the ones they are having. The language speed comparison you've shown has minimal impact, so I don't know what the point of having it there is. You've forgotten to talk about the web server, the database server, the hardware and network setups. That's obviously where the bottlenecks lie. But making it a language dispute doesn't make sense. So what if PHP is 100 times slower than others? It is happily used by very many web sites that handle very large traffic and loads.
Diego on April 13, 2007 9:38 PM
> What I find particularly amusing is the performance comparison with Python. It's hard to believe that Python is that much faster than Ruby. Python, like Ruby, is an interpreted language, and interpreted languages are so slow that if you have to ask how much performance you're giving up, you can't afford it.
Actually, Python is byte-compiled, just like Java or C#. The reason it's slower than C#/Java is because it's dynamically typed. Dynamically typed languages will always be slower than statically typed.
Python has about the same performance advantage over Ruby as C++ does over C#/Java - 5x.
Python vs. Ruby:
http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=python&lang2=ruby
C++ vs. C#:
http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=gpp&lang2=csharp
"C# is byte code, just like Java is. And its not as fast as C++, either. Like Java, its very close, but its not as fast overall as C++."
C# is actually FASTER than C++ in many benchmarks I've run, and if it is slower, it's only by very small amounts. The explanation is that C# is bytecode on the disk, but it IS compiled to machine code on-the-fly, and the compilation only happens once (unless there is a change, obviously). With this on-the-fly compilation, the code is actually optimized for the SPECIFIC processor that the system is running, as opposed to the C++ method of general 386/Pentium optimization, as is the most common.
Allied on April 13, 2007 10:10 PM
Alex expands on his comments in a discussion on DHH's blog here: http://www.loudthinking.com/arc/000608.html.
@diego - You know that MySpace runs on ASP.NET, right? They're probably around 2 billion page hits a day by now.
As others have pointed out, though, this has got to be all about the database. This is a simple app with very little processing. The majority of the load is in the database. Language speed probably has little to do with it.
Jon Galloway on April 13, 2007 11:51 PM
Ugh, the clueless language fanboys are at it again. That's what you get for posting performance metrics. :(
Jeremiah is of course correct, C# is JIT-compiled byte-code just like Java, so it's close to C++ but not quite as fast. If someone claims to have C++ code that's actually slower than the equivalent C# code then that someone doesn't know how to write and/or compile decent C++ code, end of story.
Also, standard Python is NOT compiled to machine code. The textual code is compiled to byte code, but that byte code is then interpreted rather than JIT-compiled. There are tools like Psyco that add true compilation but Psyco is still slower by at least an order of magnitude than Java or C#. Don't know if someone has come up with a better independent Python compiler yet. IronPython for .NET is said to be pretty fast, though.
Chris Nahr on April 14, 2007 12:00 AM
Could you please cite a source for those performance numbers?
Jason Creighton on April 14, 2007 12:36 AM
On c++ vs c# - http://blogs.msdn.com/ricom/archive/2005/05/10/performance-quiz-6-chinese-english-dictionary-reader.aspx
chrisb on April 14, 2007 12:54 AM
For those arguing about C# vs C++, again, would you PLEASE stop pretending it's so clear-cut. There's no "end of story".
See http://www.codinghorror.com/blog/archives/000299.html for some REAL discussion.
Random Reader on April 14, 2007 1:04 AM
I find the interesting bottleneck to be the inability to "talk to more than one database at a time". You can optimise the hell out of your code, trade some of your Ruby wizardry and clean code for performance-optimised magic, but you will still keep falling at this hurdle.
The "Random Reader" beat me to the link. No one pretends that that example covers all cases, but it certainly shows one case where C# outperforms C++ without a lot of hard work that few are willing to do.
[ICR] on April 14, 2007 1:24 AM
"@diego - You know that MySpace runs on ASP.NET, right? They're probably around 2 billion page hits a day by now."
@jon, it wasn't my intention to suggest that ASP.NET couldn't handle large loads. What I meant was that it's about other factors, like database, web server, server farm setups etc. So whether you have your app in Ruby or ASP.NET then you don't have an easy ride with either when it comes to handling large loads. It's something that takes work and has to be planned. I wasn't knocking ASP.NET :) That also goes in the other suggestion when people outright suggest that Rails can't handle large amounts of traffic.
Diego on April 14, 2007 1:25 AM
How hard would the specific features he mentions be to port to another platform? I've had performance issues with Python in the past, but one nice thing about Python is that it can interface with compiled code fairly painlessly.
Amidst this "my language is faster" flame, it'd be good to remember that performance issues are best left until after the first iteration of development. Once you've gotten that far then it usually becomes pretty clear where the performance bottlenecks are, that old adage "___ % of your execution time is in ___ % of your code", the 80/20 rule, blah blah blah. Who's to say whether Twitter would have ever materialized had they started out with a platform as cumbersome to develop as C++.
Also, there's no way I believe that Java is only 50% slower than C++. That's gotta be a typo.
Dan on April 14, 2007 1:34 AM
this is why i don't believe in 'one does everything' monster framework. Too much abstraction is bad programming in the end because now you have a slow pig website. there has to be a somewhat personal, low-level approach to certain tasks. and a lot of them suck to do. but that's the nature of this work. suck it up and code. -1 for RoR.
max on April 14, 2007 2:43 AM
As somebody who began life on 8 Bit CPU . . .
ITS THE ALGORITHM STUPID !
This can be the hardest thing to admit, that the fundamental design is flawed . . .
> Also, there's no way I believe that Java is only 50% slower than C++. That's gotta be a typo.
Java is fast, Swing is slooooooooooooooooooooow...
niklas on April 14, 2007 4:33 AM
"this is why i don't believe in 'one does everything' monster framework. "
You don't believe in frameworks because in the case of one where the site became monstrously popular it started to stress the framework? Unfortunately for Twitter, their design is all about constantly updated data and a zillion writes with far fewer reads (I imagine), so you can't save yourself with caching like usual. A little short-sighted about not being able to talk to multiple databases, but it <a href="http://tomayko.com/weblog/2007/04/13/rails-multiple-connections">doesn't seem to be accurate anyway.</a>
Twitter's been a hit. All this means is they may have to spend some time rolling changes into Rails or re-imagining their data layer(s). Not the kind of thing you want to do in version 0.2, but there are worse problems to have.
Tom Clancy on April 14, 2007 5:16 AM
You ate my link, comment box: http://tomayko.com/weblog/2007/04/13/rails-multiple-connections
Tom Clancy on April 14, 2007 5:17 AM
Here’s someone who *has* scaled Rails, enough to satisfy clients that collectively “process something like 60% of the US population's health claims every year.”
http://tomayko.com/weblog/2007/04/13/rails-multiple-connections
Ryan writes:
> When I consider what contributed to the unraveling of J2EE,
> one thing that stands out is that it tried to do too much.
> The promise was that of infinite scalability based on tooling,
> which assumes that designing scalable systems is a general case problem.
> I now firmly believe that this is flawed reasoning.
And the money quote:
> Frameworks don't solve scalability problems, design solves scalability problems.
Aristotle Pagaltzis on April 14, 2007 6:37 AM
I think a number of commenters here have made the right points:
1) The speed of Twitter is entirely about the database. I'm a little surprised the developers have even bothered to talk about Ruby's performance.
2) The choice of underlying technology can't be made entirely on considerations of what would be best for when you're as popular as Myspace. You also have to consider the effort to get the service off the ground. Somewhere out there could be a dedicated team building a Twitter-clone on an enterprise-class foundation, but we haven't heard of it, because they are still working toward 1.0. Start-ups are all about making intelligent compromises. Twitter has hit it big: hard to argue that they made a wrong decision someplace..
Ned Batchelder on April 14, 2007 6:38 AM
Python's byte-compiled, Ruby is actually interpreted. In any case, the slow bit here seems to be the database, so blaming Ruby is a little odd (though blaming Rails may make sense).
I'd be a little dubious about that chart of programming language speeds given; saying that C# is compiled (but to bytecode) while Java is 'byte code' (thus also compiled) seems odd, and Python is sort-of compiled too.
Clearly, it should be written in Common Lisp. :)
Robert Synnott on April 14, 2007 7:10 AM
In my experience, the speed of Python depends a lot on how much time you are spending in C implementations of core functionality and how much in the interpreter. It's a little frustrating sometimes - at best it ties patches of C together with a thin layer of dynamicness and approximates other VM'd languages in speed, at worst you're trying to do number crunching in it which can end up crazy amounts slower.
I imagine the same applies to Ruby.
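A small sketch of that point, assuming CPython: the same sum is computed once with an explicit loop that runs in the interpreter and once with the built-in `sum`, which is implemented in C. The function names are made up for illustration, and no timings are quoted because they depend on the machine and interpreter version.

```python
import timeit


def total_in_python(values):
    """Add the numbers one at a time; each step runs through the interpreter loop."""
    total = 0
    for v in values:
        total += v
    return total


def total_in_c(values):
    """Delegate to the built-in sum, whose loop is implemented in C in CPython."""
    return sum(values)


values = list(range(100_000))
same_answer = total_in_python(values) == total_in_c(values)

# Time 20 runs of each version; only the relative difference is interesting.
loop_seconds = timeit.timeit(lambda: total_in_python(values), number=20)
builtin_seconds = timeit.timeit(lambda: total_in_c(values), number=20)
print(f"python loop: {loop_seconds:.4f}s, built-in sum: {builtin_seconds:.4f}s")
```

Both functions return the same value; what changes is how much of the work happens inside the interpreter.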
This post is below your usual standards Jeff. It's been quite clearly stated that the issues the Twitter folks are facing are database related. I'm also stunned that anyone would see a data driven service that has to handle 11,000 requests a second and think that the bottleneck will be in the computation instead of I/O.
Dare Obasanjo on April 14, 2007 7:45 AM
Side note: I have zero experience with Rails and little with Python. I am a DBA so here is what I am seeing.
Without the mention of which database and the hardware that it is on, one can only speculate on what the biggest hurdle is. Databases handle inserts very well, about as fast as the underlying disks. This might be an issue with:
1) Doing updates/deletes along with inserts. Updates and deletes incur 2 or 3 times as much disk I/O. This would be a design flaw, either in their app or in Rails (again, no expert here - just speculating).
2) Their hardware setup is incorrect for their size/# of hits. Solid state disks are VERY expensive, so let's assume that they do not use them. The next best set-up would be RAID 0+1 on many small drives. This spreads the disk I/O across many drives. This is also expensive because the drives require more disk controllers, more space in the rack, more drives will fail - so more hands-on replacement of said drives, etc... RAID 0+1 on large drives. This is probably what they are using. It provides a decent I/O-to-cost ratio. This (or just RAID 1) is used by most businesses that have dedicated admins (data warehouses aside). Another setup could be RAID 5. This is not ideal for databases, especially if they have a lot of writes (so data warehouses typically use RAID 5 for better read performance).
3) Too many indexes/triggers on the tables getting written to. Each index at least doubles the disk I/O.
These were just a few of what could be amiss. Of course, 11,000 requests a second is quite a bit. This could very well be the database software itself not being able to handle that load.
Glenn on April 14, 2007 8:32 AM
I'm afraid I have to agree with Marcus and Dare, Jeff: this was not your best work. And I say that as someone who has a Coding Horror sticker on his Windows laptop.
Andrew Shebanow on April 14, 2007 9:16 AM
The database 'problems' might be because of RoR
If you never write the SQL how do you know it's optimal? Over time fragmentation of the table/indexes occurs (this is especially true when you update a lot and you get page splits). If you had stored procs it is so much easier to tune the DB. Even if the DB is the bottleneck this can still be because of the SQL generated by RoR. Are the indexes optimal for example? Too many and inserts/updates suffer. Too few and retrieval suffers.
Denis
Denis The SQL Menace on April 14, 2007 9:18 AM
I'm surprised at the number of people saying Twitter is only facing database problems. The quoted article was pretty clear about the problems with RoR:
"The problem is that more instances of Rails (running as part of a Mongrel cluster, in our case) means more requests to your database. At this point in time there’s no facility in Rails to talk to more than one database at a time."
It's Rails itself that doesn't scale, not the database. A lot of other web platforms have features to deal with this sort of mess.
Eam on April 14, 2007 9:19 AM
"Ask somebody trying to run an enterprise application on .NET right about now about how much downtime they have to schedule in to deal with the endless security patching."
WTF? Did you pull this one out of your ass? Can you give me a example of this "endless security patching" or did you make that up?
Jack on April 14, 2007 9:37 AM
>>"Ask somebody trying to run an enterprise application on .NET right about now about how much downtime they have to schedule in to deal with the endless security patching."
You mean MySpace? Which ran on ColdFusion and couldn't handle the load
Denis The SQL Menace on April 14, 2007 10:07 AM
OK, I'm sorry, but if any of my sites hit 40 million requests per hour I'm certain I'm going to have performance problems. This is almost a non-issue. We have had similar problems on a much smaller site using Hibernate. In addition, on a minuscule site with 20 users we have monstrous performance problems using EJBs.
The common problem in all of these was that we were trying to do more than the database was capable of handling. True, a properly tuned database might help with our problems, but most often we didn't tune anything until we had problems. In fact, we did that by design: why tune it before you need to?
In all cases there were fixes available.
mike on April 14, 2007 11:41 AM
>>In fact, we did that by design, why tune it before you need to?
I don't agree with you.
Because it will be too late then, you start by writing proper code to begin with
If you write proper database code and have a proper data model then in all likelihood you won't have to tune that much
However if you write
SELECT * instead of listing the columns when you don't need more than 2 columns your IO will be greater
Or if you have a WHERE clause like this: WHERE LEFT(Col1,3) = 'ABC' instead of
WHERE col1 LIKE 'ABC%' you will get a fat-ass table/index scan instead of an index seek
This is not optimizing this is knowing how to write SQL
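The seek-versus-scan difference can be checked with a query planner. The sketch below uses Python's built-in `sqlite3` module as a stand-in for whatever database is actually in use, which is an assumption: SQLite has no `LEFT()`, so `substr(col1, 1, 3)` plays that role, and the column is declared `COLLATE NOCASE` because SQLite's default case-insensitive `LIKE` can only use an index with that collation. The table `t` and index `idx_col1` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, col1 TEXT COLLATE NOCASE, col2 TEXT)"
)
conn.execute("CREATE INDEX idx_col1 ON t (col1)")
conn.executemany(
    "INSERT INTO t (col1, col2) VALUES (?, ?)",
    [("ABC%d" % i, "x") for i in range(100)],
)


def plan(sql):
    """Return SQLite's query plan text for a statement (last column of each row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# Wrapping the column in a function hides it from the index: every row is visited.
function_plan = plan("SELECT col1 FROM t WHERE substr(col1, 1, 3) = 'ABC'")

# A prefix LIKE can be rewritten by the planner into an index range search.
prefix_plan = plan("SELECT col1 FROM t WHERE col1 LIKE 'ABC%'")

print("function on column:", function_plan)
print("prefix LIKE:       ", prefix_plan)
```

In SQLite's plan output, SCAN means every row (or every index entry) is visited, while SEARCH means the index is used to jump to the matching range. Other databases word their plans differently, but the principle in the comment above is the same.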
In theory you should not have to change your SQL that much if you have a performance problem. You could have federated database servers, partitioned tables, log, data, nonclustered indexes and data on separate drives. Reindexing/defragmenting indexes etc etc etc
Denis The SQL Menace on April 14, 2007 11:54 AM
This is DHH's "opinionated software" come home to roost. Many have made the point on the RoR blog. The issue is that DHH, and RoR fanboy acolytes, insist that all logic goes in the code; the database is just dumb files (think: here's the 60's again), fronted by a SQL parser (and that's *all* base MySql is). Some on RoR have tried to get the thinking going the other way, and Migrations might be a step that way.
I first saw this on Artima, and the real article says Twitter uses Postgres, which while nice, doesn't do the necessary structural things you get with DB2, Oracle, and SQL Server. They're not free, but sometimes you get what you pay for.
Nothing scales better than an industrial strength RDBMS, *if* you know how to use one. Remember: their developers have been doing parallel processing programming for decades. Like it or not, they've learned more about how to do that than you have. Take advantage of it: put the data and smarts in the database, and use the application for the pixel dust. Then you can scale to the moon, Alice.
buggyfunbunny on April 14, 2007 12:15 PM
Interesting debate but a little flawed in some aspects.
Firstly, this is about web applications, which Twitter is (if it's not, please explain why not!).
Comparing C++ into the equation is a little short sighted and those people who are defending C++, you really have no argument here as when was the last time someone created an entire web application in C++?
If they did then god help them as it's not an area that C++ is designed for. C++ in certain environments, such as technical or scientific areas, is the king and there is no doubt about it, but it comes down to business requirements.
End of the day as a business you will want to bring an application to market as quickly as possible, hence the need for middle to high level languages, which is where PHP, C#, Java, Ruby and to some extent the more esoteric brands like Perl and Python fall. The argument about performance is a bit too vague as again, you have so many factors to consider, e.g. is the language in question being tested against the development of a Web Application or a Desktop/Embedded application?
Raw compiling charts don't mean anything except to the neurotically charged individuals who care only about these facts.
What has been highlighted is the state of play of the Web Application domain. There are still many areas where Web Application design and development is limited by the technology available, however this also offers a unique challenge where the designers have to architect a solution based on the limited resources (including scalability). Developing, say, a desktop application is not a big deal anymore; end of the day if it's far quicker and easier to develop a C#/.NET based application than a C++ equivalent then I am afraid I would hire a C# developer - the average computer is incredibly powerful and as such most coders don't have to worry about things like performance as they know the system will offer enough bang to cover the limitations.
What is important is how web development proceeds in the future. It's obvious RoR has limitations that Twitter has encountered and it would be good to see the devs taking this opportunity to expand out of the box and develop the design further to handle other database systems and improve the language overall.
But as someone mentioned, most performance issues in a web environment are to do with networking (internet speeds, hardware), server platforms and the client system that is being served.
We are essentially running systems that are incredibly powerful down a network that is barely capable of handling the demand, until this area improves we will always have issues that are outside of the control of the coding language of choice.
finalzero on April 14, 2007 2:17 PM
The only thing that I can think of for why Code Complete says that Java is Byte Code where C# is compiled might have something to do with the fact that Java was originally compiled to byte code and then that byte code was interpreted during execution. Things have changed and that byte code is now JIT compiled just like C#'s MSIL is JIT compiled at program startup.
Steve on April 14, 2007 2:27 PM
While I'm not super familiar with the specifics behind Twitter, I did write most of the code behind another very large Rails site (www.penny-arcade.com) and I am fairly familiar with the performance of the Rails stack. I realize that we have vastly different data access patterns, but in my experience, scaling issues with Ruby can be broken down and identified fairly quickly by simply benchmarking different parts of the application (usually through unit tests).
It sounds like their bottleneck is at the DB level. While it isn't a magic bullet, I failed to see the word memcache mentioned anywhere in the articles I've seen on twitter. Looking at their access patterns memcache seems like it would make a lot of sense, I'd venture to guess > 80% of their traffic hits only the most recent data (5%-10% of their data). In my experience memcached can really help improve scaling of a site that is DB bound. It probably isn't feasible to implement page caching, or even action caching for most of their pages, but I think using a memcached cluster for fragment caching will save them a lot of db ops.
Twitter is not the first social networking site to run into this problem. Facebook runs PHP (comparable to Ruby) and they switched long ago to using memcached to keep their hot data ready to serve to the client. Livejournal, Slashdot, Wikipedia, and even SourceForge use memcached to prevent their database servers from getting overwhelmed. I would have preferred to see a post about the importance of caching in web applications, rather than restating that compiled languages will outperform scripting languages.
Erik Karulf on April 14, 2007 3:18 PM
So performance doesn't matter, except for when it does. That sounds about right. The good news about this is that there are lots of ways to optimize Ruby (and Rails). I've seen scripts where you can embed C code into your Ruby code and a parser will compile the C code and create a loadable module and modify the Ruby code to call that C function. Optimizations like this though are by nature quite esoteric.
The real question to ask is was the work to develop the site in RoR plus the time to optimize RoR for the massive increase in traffic greater than the work would have been to develop the site in C++ (or similar) plus the time to optimize the C++ site for the massive increase in traffic.
Also, I'm surprised they can't put the database on a cluster where incoming database connections get routed to one of a cluster of database machines. There may be some obscure technical reason why that is difficult, or maybe that is the route they are planning on taking.
Whoops, my grammar on that middle part was horrendous. What I meant to say was:
Is work_to_develop(ROR site) + work_to_optimize(ROR site) > work_to_develop(C++ site) + work_to_optimize(C++ site)?
I think we can safely say that work to develop is less for RoR, but is the work to optimize that much more for RoR? Also by "C++", I mean "some fast framework, perhaps written in C++", not absolutely C++ and not from scratch. All this is complicated stuff. YMMV.
Brendan Dowling on April 14, 2007 3:57 PM
>>> "Comparing C++ into the equation is a little short sighted and those people who are defending C++, you really have no argument here as when was the last time someone created an entire web application in C++?"
Well, for example, I have. Web check-in for a major airline. And it's one serious butt-kicker in performance compared to the alternatives. The company I work for does C#/ASP.NET (and a couple of JSPs) as well, and we get decent performance out of them, but they're much tougher to maintain than the C++...Believe It, Or Not!
Brook Monroe on April 14, 2007 6:10 PM
MySpace runs on BlueDragon (a ColdFusion clone), not ASP.NET...
http://www.myspace.com/codinghorror.cfm
Simon Wright on April 15, 2007 12:47 AM
Simon,
This link sure looks like ASP.NET; it even has an aspx extension:
http://browseusers.myspace.com/browse/browse.aspx?&MyToken=88edcb37-903d-4f00-b741-acf2f58d550c
Yes, parts of it run BlueDragon, but that is still running on .NET; it is just CFML compiled to .NET.
Denis The SQL Menace on April 15, 2007 3:42 AM
Re: MySpace, well I think Scott Guthrie knows best:
Handling 1.5 Billion Page Views Per Day Using ASP.NET 2.0
http://weblogs.asp.net/scottgu/archive/2006/03/25/441074.aspx
Andrew on April 15, 2007 6:14 AM
See Erik's post earlier in this thread - memcache (or something similar) is 90% of the answer and should be priority 1.
Optimizing queries and db schema should be priority 2.
Priority 3 - look into mcluster or another clustered solution if you have lots of concurrent writes and the 2 steps above aren't quite getting you where you need to be. (down side === $$$)
Profiling and optimizing code execution is always good, but won't help much with the db access problems. The execution speed of the run time or interpreter is usually not gonna be a problem - esp when throwing more front ends at it will take care of that issue.
OldWebDevGuy on April 15, 2007 10:08 AM
I think your overall point ("What's more important, the service or the implementation?") may have gotten lost when you gave so much space to a side note on language performance.
If RoR allowed them to seize the business opportunity, it's hard to argue with it as an initial choice. The question is whether Twitter will become a Friendster or MySpace. Neither started with a scalable architecture, but one of them found a path to it.
I'd say the issue with Ruby is less about language speed, and more about the architectural choices it influenced which now must be revisited to handle scale.
Steve Steiner on April 15, 2007 11:03 AM
How can people opine about performance without load testing and isolating the bottleneck first?
Why is Craigslist (PHP and MySQL I think) pretty dang fast?
Many developers have little know-how when it comes to big database design. Are hardware load balancers being used? Why focus on the development language?
Once wrote a VB5 front end to a DB2 app -- every request took 1 second -- small, medium, or large. It was the middle-ware. Not the P-code VB.
In very specific instances, having a garbage collector can be faster than C++.
For example, the fact that you don't have to delete little itty bits individually.
But usually straight C++ is much faster.
GregMagarshak on April 15, 2007 11:57 AM
Hi, I have a question about the chart from Code Complete 2.0. What is the Python version considered? It's a very important factor since Python's average speed dramatically improved with latest releases:
Python 2.3 (2003): 30% faster than 2.2
Python 2.4 (2004): 5% faster
Python 2.5 (2006): 10% faster
Source: Alex Martelli http://www.aleax.it/Python/py25.pdf
Moreover Python 2.6 and Python 3000 are in the making...
Alfred on April 15, 2007 2:10 PM
> I think your overall point: What's more important the service or the implementation? May have gotten lost when you gave so much space to a side note on language performance.
I think you're right. :)
I am as mystified as anyone else why Alex P thinks the language matters. It just seems irrelevant, not just because the bottleneck is probably elsewhere, but because we're comparing two interpreted / dynamic languages anyway, which aren't known for their speed.
This comparison only makes sense if you think of language as part of the *platform*.
For example, consider why the Reddit folks switched from Lisp to Python. Not because the *language* was better (Paul Graham would probably have a coronary, and he funded Reddit), but because the *platform* around the language was better.
http://blog.reddit.com/2005/12/on-lisp.html
http://www.aaronsw.com/weblog/rewritingreddit
Similarly, the Rails platform seems to make assumptions about the way the database works that can make it hard for the Twitter service to scale.
Jeff Atwood on April 16, 2007 9:35 AM
It pains me to say this, especially since Twitter has been so successful. The Twitter case seems to be one of choosing the wrong technology for the job. Sure, initial development time may have been shorter by using RoR, but now (hindsight of course :) it seems that choosing another framework may have been better in the end.
Bill on April 16, 2007 10:11 AM
"Similarly, the Rails platform seems to make assumptions about the way the database works that can make it hard for the Twitter service to scale."
Hmm, the default configuration assumes you have one database connection per model object. That connection could be to a load balanced cluster or to sqlite. I don't think the assumptions made by ActiveRecord are that different than those made with any other Object Relational Mapping engine.
I have to agree with those asking about the lack of data caching via memcached or MySQL clustering/load balancing, it seems like a serious oversight. Even more amazing to me is that, reading up on this, they appear to be using a hosting provider versus running their own systems.
Probably the most amazing thing about Rails is that it allows developers who don't appear to know much about scaling to write apps that scale to 10000 tps.
For the guy who said that more stuff needs to be done in the database, you are just wrong. Look up how eBay has scaled J2EE (throw away everything except servlets and jdbc connection pooling, do all sorting, fk constraints, etc. outside of the partitioned databases) for proof of that. In any case, this is about transaction volume, not complex queries.
I think when you are talking about huge transaction loads, most frameworks have to adapt in some way. It looks like a guy has written a plugin to support the approach Twitter wanted in 75 lines of Ruby, to be followed up with two more plugins that let you add read only slave DBs by adding two lines to your database config file. In other words, the moaning by one developer has just resulted in ActiveRecord getting a capability very few existing ORM frameworks have. I've got to try that (moaning) next time I have a hard scaling problem at work. Easier than solving the problem myself! ;-)
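For readers who have not seen read/write splitting, here is a rough Python sketch of the idea: writes go to a primary database and reads are spread across read-only replicas. This is not the Ruby plugin mentioned above. `ReadWriteRouter` and the connection names are hypothetical, and plain strings stand in for real database connections.

```python
import itertools


class ReadWriteRouter:
    """Hypothetical router: SELECTs go to replicas, everything else to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Round-robin over the replicas; with none configured, reads use the primary.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def connection_for(self, sql):
        is_read = sql.lstrip().lower().startswith("select")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary


# Strings stand in for real connection objects in this sketch.
router = ReadWriteRouter("primary", ["replica-1", "replica-2"])
read_target = router.connection_for("SELECT * FROM statuses WHERE user_id = 1")
write_target = router.connection_for("INSERT INTO statuses (user_id, text) VALUES (1, 'hi')")
```

The sketch ignores replication lag: a read issued right after a write may hit a replica that has not yet received the new row, so real setups often pin such reads to the primary.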
Blah, blah, blah.
Jesus f'ing Christ. We're talking like we're experts when everybody seems to have missed the real problem, including the dorks at Twitter.
Here's the damned problem:
The damned site is synchronous. That's the problem. Threads are waiting for the database, and those threads are tied up until the database can respond back.
I have no idea if Ruby has an asynchronous programming model. If it does, then it's Twitter's own fault for totally missing the boat. If it's not in Ruby, then Ruby sucks. Either way, somebody blew it.
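The difference between blocking and non-blocking waits can be sketched in Python (used here only as a stand-in, since the point above is about Ruby). The "database" is simulated with `asyncio.sleep`, and all names and the latency figure are assumptions for illustration.

```python
import asyncio
import time


async def fake_db_query(user_id, latency=0.1):
    """Simulated database call: the wait yields control instead of blocking a thread."""
    await asyncio.sleep(latency)
    return {"user_id": user_id, "status": "ok"}


async def handle_requests(user_ids):
    """Start all queries at once so their waits overlap."""
    return await asyncio.gather(*(fake_db_query(u) for u in user_ids))


def timed_run(user_ids):
    """Run the batch and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(handle_requests(user_ids))
    return results, time.perf_counter() - start
```

With ten simulated requests of 0.1 s each, handling them one after another would take about a second, while `timed_run(range(10))` should finish in roughly the time of a single query because the waits overlap. This only helps when the time is spent waiting; it does nothing if the database itself is saturated.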
foobar on April 16, 2007 11:05 AM
Things like this make me sometimes wonder if a relational database is best for data storage. Are they really using all the features of a relational database or is it 99% "SELECT foo FROM bar WHERE userid = 12345"? If so then there must be a more efficient way for web servers in a server farm to share that data such that lookups are fast and local.
Reed on April 16, 2007 12:17 PM
There has to be a drawback of interpreted languages. I really wonder if the designers of twitter would have used another platform if they knew that the service would get that success. Did they just use Ruby because it was the quickest way to get that thing up and runnin'?
tOMPSON on April 17, 2007 1:11 PM
Hmm. 11k page views / sec = 28,512,000,000 views / mo. That's 28.5 billion. I call bullsh*t.
Courtney Payne on April 19, 2007 9:30 AM
"Hmm. 11k page views / sec = 28,512,000,000 views / mo. That's 28.5 billion. I call bullsh*t."
11k/sec meant the peak(s), not an average.
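For what it's worth, the quoted figure checks out arithmetically if you assume a 30-day month and a constant rate. A quick Python check (the 30-day month is an assumption inferred from the number itself):

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_MONTH = 30  # assumption: the quoted total matches a 30-day month


def monthly_views(views_per_second):
    """Total views in a month if the rate were sustained the whole time."""
    return views_per_second * SECONDS_PER_DAY * DAYS_PER_MONTH


def average_rate(monthly_total):
    """Average views per second implied by a monthly total."""
    return monthly_total / (SECONDS_PER_DAY * DAYS_PER_MONTH)


sustained = monthly_views(11_000)  # 28,512,000,000
```

So 28.5 billion is what 11k/sec would mean as an average; as a peak, the monthly total is lower by however much the average falls below the peak.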
PekkaR on April 19, 2007 7:16 PM
Java is faster provided it's JIT compiled (i.e. the code is optimized every time you compile it). In fact, it can be faster than C++! Yes, I can see people's eyebrows rising, but it's a fact.
Ranganath.S on April 20, 2007 4:04 AM
Anyone tried Monorail?
It's a rails framework for .NET that we have had great luck with.
It includes an ActiveRecord implementation and a nice IoC container.
Agile Jedi on April 20, 2007 8:04 AM
Thank you:
> The damned site is synchronous. That's the problem. Threads are waiting for the database,
> and those threads are tied up until the database can respond back.
>
> I have no idea if Ruby has an asynchronous programming model. If it does, then
> it's Twitter's own fault for totally missing the boat. If it's not in Ruby, then Ruby
> sucks. Either way, somebody blew it.
I just came from a startup attempting to use rails for high amounts of traffic (with apache reverse proxy and mongrel). Ruby has primitive threading at best and RoR is NOT threadsafe. Rails has a big lock around all the db access code and no connection pooling. The 'default' xml parser for ruby has been described as a 'new kind of slow'. Dunno.. maybe they've fixed it by now.
Compared to Tomcat, jdbc pools and plain old jsp RoR is pretty bad from a performance perspective. It makes up for it with a lot of pretty nifty cache behavior given the right type of workload (low complexity db queries). But it's nothing that a decent java framework doesn't give you... Nothing similar to webflow, spring-mvc or the like... erb (embedded ruby) is better than jsp... but that's not saying much is it?
RoR IS faster to throw together a two tier web app with. But the java based frameworks have a much better scalability story. And maintainability IMHO.
I think we're going to be seeing a lot of startups hit the RoR scalability issues and attempt to work around it with db tuning, schema tuning, and various hacks to RoR... All until it becomes essentially just as complex as a java based framework.
After all... the core of the scaling issue is language independent...
> There has to be a drawback of interpreted languages. I really wonder if the designers
> of twitter would have used another platform if they knew that the service would get
> that success. Did they just use Ruby because it was the quickest way to get that
> thing up and runnin'?
Interpreted v. compiled has nothing to do with it (mostly). It's about architecture. RoR is currently not scalable for general purpose use. For particular uses it's just fine... even really good for low complexity - low frequency db access (Basecamp and the like). Complex/heavy queries and/or large update patterns will kill your RoR based system.
Of course this can all be fixed... java used to have the same problems. Only took a dedicated multi-billion dollar company 5 years to fix it.
I think developers have a tendency to run toward whatever shiny silver bullet of the week shows up. RoR has the hype meter at full stop right now and people never learn.
Had they built twitter around established frameworks they'd have been much better off right now... but they wouldn't have had as much 'fun'... or have been as 'cool'.
dave on April 23, 2007 1:25 PM
Ruby on Rails scalability is indeed a hot topic. First off and being utterly blunt, Alex is talking out of his ass, he clearly has no experience with building scalable web sites.
The way I see it, scalability is really more a platform and architecture challenge than a language specific problem. MySpace was originally written in ColdFusion, then upgraded to ASP, then .NET.
Check out this article on history of how MySpace scaled http://www.baselinemag.com/article2/0,1397,2082921,00.asp
The focus is to identify the bottlenecks (not just blindly blame a whole language) and then adapt the architecture to deal with them. Developers and management alike tend to think that there is only 1 right architecture to scale. That's so not true; every high volume website has its own unique set of challenges.
Here's another article from someone who seems to have a better understanding of scaling RoR
http://poocs.net/2006/3/13/the-adventures-of-scaling-stage-1
By the way, there is no business using Mongrel in a site with massive traffic, go to lighttpd or Apache with fastcgi, etc.
My account was suddenly canceled. I went through the procedure to restore it and it is still canceled in spite of the promise to restore it in a few moments. The 2 desperate messages to Twitter have not been answered.
nahumg on January 5, 2008 12:33 PM
Nice post! Optimization does count!
Speed on March 1, 2008 6:02 AM
Jeff, I was about to email you this link, but figured it would be better to "make it public":
Twitter's Growing Pains
http://www.technologyreview.com/Infotech/21103/?a=f
Not sure if there's any new info here, but it was posted today.
Kyle
Kyle Estes on July 21, 2008 6:47 AM
This could be a solution for them: http://www.espace.com.eg/neverblock
This library would solve many of the concurrency issues held by Ruby. Regretfully, the implementation done so far is for Ruby 1.9 only, as it builds on the fibers implementation in 1.9.
| Content (c) 2009 Jeff Atwood. Logo image used with permission of the author. (c) 1993 Steven C. McConnell. All Rights Reserved. |