I have decidedly mixed feelings about the book Beautiful Code, but one of the better chapters is Tim Bray's "Finding Things". In it, he outlines the creation of a small Ruby program:
```ruby
counts = {}
counts.default = 0
ARGF.each_line do |line|
  if line =~ %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }
    counts[$1] += 1
  end
end
keys_by_count = counts.keys.sort { |a, b| counts[b] <=> counts[a] }
keys_by_count[0 .. 9].each do |key|
  puts "#{counts[key]}: #{key}"
end
```
Tim calls Ruby "the most readable of languages"; I think that's a bit of a stretch, but I'm probably the wrong person to ask, because I've learned to distrust beauty:
It seems that infatuation with a design inevitably leads to heartbreak, as overlooked ugly realities intrude. Love is blind, but computers aren't. A long term relationship -- maintaining a system for years -- teaches one to appreciate more domestic virtues, such as straightforwardness and conventionality. Beauty is an idealistic fantasy: what really matters is the quality of the never ending conversation between programmer and code, as each learns from and adapts to the other. Beauty is not a sufficient basis for a happy marriage.
But I digress. Even if you have no idea what Ruby is, this simple little program isn't too difficult to decipher. It helps if you know that Tim Bray's blog URLs look like this:
http://www.tbray.org/ongoing/When/200x/2007/09/20/Wide-Finder
This is a program to count the most common HTTP GET URL entries in a webserver log file. We loop through the entire log file, building up a hash of these URLs in which the key is the unique part of the URL and the value is the number of times that URL was retrieved. Finally, we sort the keys by count and print the top ten.
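To make the pattern concrete, here is a small sketch that runs it against two log lines. The lines and their Apache-style format are made up for illustration (Tim's actual log data isn't shown here), and `article_key` is a hypothetical helper. The pattern itself is copied unchanged from the program above. The second line shows why `[^ .]+` followed by a space matters: a request for a file whose name contains a dot, such as an image, is not counted at all.

```ruby
# The pattern from the program above, unchanged.
ARTICLE = %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }

# Returns the captured key for one log line, or nil when the line doesn't match.
# As in the original, =~ sets $1 to the text captured by the parentheses.
def article_key(line)
  line =~ ARTICLE ? $1 : nil
end

# Made-up log lines in an Apache-style access-log format (an assumption).
page  = '10.0.0.1 - - [20/Sep/2007:10:00:00 -0700] "GET /ongoing/When/200x/2007/09/20/Wide-Finder HTTP/1.1" 200 5120'
image = '10.0.0.1 - - [20/Sep/2007:10:00:01 -0700] "GET /ongoing/When/200x/2007/09/20/Wide-Finder.png HTTP/1.1" 200 900'

p article_key(page)   # => "2007/09/20/Wide-Finder"
p article_key(image)  # => nil: the "." stops [^ .]+ before the required space
```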
Maybe it's just the Windows developer in me, but one might wonder why you'd bother writing this code at all in the face of umpteen zillion free and commercial web logfile statistics software packages. After all, the best code is no code at all.
Well, perhaps there is a reason for this code to exist after all. Tim eventually turned this snippet of code into a benchmarking exercise -- The Wide Finder Project.
It's a classic example of the culture, born in Awk, perfected in Perl, of getting useful work done by combining regular expressions and hash tables. I want to figure out how to write an equivalent program that runs fast on modern CPUs with low clock rates but many cores; this is the Wide Finder project.
A noble experiment, indeed. The benchmarks were performed on the following hardware:
The input data is as follows:
The results are sort of... well, all over the map. I'll summarize with the worst and best scores for each language:
| Language | Slowest | Fastest |
|----------|---------|---------|
| Perl | 44.29 | 1.51 |
| Erlang | 37.58 | 3.54 |
| Python | 41.04 | 4.38 |
| OCaml | 49.69 | 14.64 |
| Ruby | 1:43.71 | 50.16 |
I'm simplifying quite a bit here, and omitting languages with only one submission, so do head over to the actual results page for more detail.
While you're there, I also suggest reading Tim's analysis of the results, wherein he argues that some of the code optimizations that "won" the benchmarks should be automatic and nearly transparent to the programmer. He proposes that, in a perfect world, a one-character change to the original Ruby program would be all it takes to enable all the necessary multicore optimizations:
ARGF.each_line* do |line|
I heartily agree. Personally, I think that's the most important result from the Wide Finder Experiment. When it comes to multicore performance, choice of language is no silver bullet. How else can we explain the massive disparity between the fastest and slowest versions of the code in each language?
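The disparity is easy to put a number on. This sketch divides each language's slowest time by its fastest, using the figures from the table. It assumes the plain figures are seconds and that Ruby's 1:43.71 means one minute and 43.71 seconds.

```ruby
# Slowest and fastest times per language, copied from the table above.
# Assumes plain numbers are seconds and Ruby's 1:43.71 is minutes:seconds.
TIMES = {
  "Perl"   => [44.29, 1.51],
  "Erlang" => [37.58, 3.54],
  "Python" => [41.04, 4.38],
  "OCaml"  => [49.69, 14.64],
  "Ruby"   => [60 + 43.71, 50.16],
}.freeze

# How many times faster each language's best entry was than its worst.
SPREAD = TIMES.transform_values { |slowest, fastest| (slowest / fastest).round(1) }

SPREAD.each { |lang, ratio| puts format("%-7s %5.1fx", lang, ratio) }
```

On these figures, Perl's fastest entry is roughly 29 times faster than its slowest. Erlang's and Python's are roughly 10 and 9 times faster, and Ruby's is only about twice as fast.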
As experiments go, Wide Finder was a reasonably successful one, if somewhat incomplete and perhaps too small. Tim has addressed both of those criticisms and rebooted with The Wide Finder 2 Project. It's bigger, badder, and brawnier, but the goal remains the same:
The problem is that lots of simple basic data-processing operations, in my case a simple Ruby script, run like crap on modern many-core processors. Since the whole world is heading in the slower/many-core direction, this is an unsatisfactory situation.
If you look at the results from last time, it's obvious that there are solutions, but the ones we've seen so far impose an awful complexity cost on the programmer. The holy grail would be something that maximizes ratio of performance increase per core over programmer effort. My view: Anything that requires more than twice as much source code to take advantage of many-core is highly suspect.
Check the Wide Finder 2 Project Wiki for all the key details. The naive Ruby implementation currently takes 25 hours -- yes, hours -- to complete. Some clever programmers have already beaten this result by almost two orders of magnitude, per the results wiki page.
Wide Finder isn't a perfect experiment, but it is a relatively simple, easily understandable summary of the problems facing all of tomorrow's software developers in the coming massively multicore world. Can you do better on the time without exploding either the code size, the code complexity, or the average programmer's head?
First of all, to even consider code size makes me wonder about the goals of the original author. Complexity and readability are critical, but they have virtually no relationship to code size. The only time you would want to consider code size without other variables would be to promote a language.
In fact, if you count all the utilities that the Ruby solution uses, it's probably thousands of lines of code long. If I write and use a utility, I can make my code much smaller than that Ruby solution! Or is there some arbitrary rule that nothing counts if it ships with the language?
I haven't seen anyone approach the real problem--the fact that all the solutions I've seen move text, and most read text in a line at a time.
If you really had to speed it up, there are some definite areas to attack:
1) Read the whole thing into a buffer at once. Use a call that makes use of DMA if you have control of that.
2) Run through the buffer without copying data or creating objects for each line.
3) Avoid regular expressions.
You probably jumped straight to "Oh, that's complicated. Lots more code". If you write your code correctly, it will be longer (as it should be), but it should also be more readable than the Ruby code above. The main function would probably be cleaner because the sub-functions you use will be tailored to your requirements, not generic in-language solutions.
4) If you still want to go multi-threaded, calculate the center of the buffer and search for the first line break (newline) after it; mark that spot. Pass a pointer to the buffer and an index/length of each half to different threads.
If you want to use more than two threads, just recurse.
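Step 4's splitting rule can be sketched in a few lines of Ruby. This illustrates the idea under stated assumptions and is not the commenter's code. `split_chunks` is a hypothetical name, and the buffer is assumed to be a binary string, so character offsets equal byte offsets. The `depth` argument controls the recursion and gives at most 2**depth chunks.

```ruby
# Splits buf into newline-aligned [start, length] chunks, recursing `depth`
# times (so at most 2**depth chunks). Each cut is made just after the first
# "\n" at or past a chunk's midpoint, so no line is split across two chunks.
# Assumes buf is binary (e.g. from File.binread), so character offsets
# are byte offsets.
def split_chunks(buf, start = 0, len = buf.size, depth = 2)
  return [[start, len]] if depth.zero?

  finish = start + len
  newline = buf.index("\n", start + len / 2)
  # Keep the chunk whole if there is no newline after the midpoint,
  # or if the only one is the chunk's final byte.
  return [[start, len]] if newline.nil? || newline + 1 >= finish

  cut = newline + 1 # first byte of the next line
  split_chunks(buf, start, cut - start, depth - 1) +
    split_chunks(buf, cut, finish - cut, depth - 1)
end

buf = "line one\nline two\nline three\nline four\nline five\n"
chunks = split_chunks(buf)
chunks.each { |start, len| p buf[start, len] }
```

In a real run, `buf` would come from something like `File.binread(path)`, which reads the whole file in one call as step 1 suggests. Each `[start, length]` pair would then be handed to its own worker; that part is left out here.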
See, the real problem here is the language. Ruby and all these other "Elegant" languages give you these tricks that make you think you are implementing a great solution because your Ruby code looks small, but actually these tricks often lure you into a pretty poor solution that no good programmer would use if he could actually see what he was doing.
I'm not saying optimizing is a good idea--it's a last resort; and if Ruby is fast enough for you and you feel it makes you code faster--great...
But I also really agree with the callout in the article saying that it's an illusion. Personally I worked with Ruby for a year and never got the hang of it--I can't read the example given because I forget all the tricks.
I prefer a language with as few surprises, syntactical exceptions and tricks as it can possibly have. Explicit, clear, long (if necessary) code is fine. I can type a lot faster than I can troubleshoot someone else's "Short" code, and if the code is done right, there is no reason any given method should be significantly longer.
Bill on June 10, 2008 2:31 AM
I know this is kind of off-topic, but Jeff, I just read your response to the offer to contribute to Beautiful Code. I really enjoyed it. Especially where you explain how even highly intelligent, gifted programmers can't hope to keep pace with the complexity of modern projects. And if that's true then a programmer like me doesn't have a snowflake's chance in hell!! So I learn as much as I can and follow good practices and strive not for beauty but function. Keep up the good work.
Kenneth on June 10, 2008 2:34 AM
It's funny to read a pragmatic blog about an impractical language such as the "beautiful" Ruby.
My favorite quote on this subject comes third hand via http://rickyclarkson.blogspot.com/2008/01/in-defence-of-0l-in-scala.html
"Optimising your notation to not confuse people in the first 10 minutes of seeing it but to hinder readability ever after is a really bad mistake."
-David MacIver
Ray Waldin on June 10, 2008 2:45 AM
Step 1: Use Java
Step 2: Use Jakarta ORO for the regex, using precompiled pattern matchers before the entrance to the main loop. It's ridiculously faster than the default Java regex, and precompiling the patterns also gives a huge speed increase.
Step 3: Use threads to do the matching... but actually retrieving the line might be complicated if called from multiple threads... if we could abstract (yes gasp) it into a datasource like object which was thread safe this would be ideal (oh no, class explosion!!!)
Step 4: Use a ConcurrentHashMap which uses ThreadLocal to store the results from each of the threads
Step 5: Profit. Seriously, I've seen Jakarta ORO beat optimized Perl regex, that library kicks ass.
Weighing up the merits of the above solution, it would probably be more complicated in the threading aspect - it appears that little asterisk thingo in Ruby is something I'm going to have to look at... but apart from that, this code should be pretty damn fast.
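Ray's steps 3 and 4 can be sketched in Ruby rather than Java, and Jakarta ORO isn't involved. A `Queue`, Ruby's built-in thread-safe queue, plays the thread-safe data source. A single hash guarded by a `Mutex` stands in for the concurrent map. `count_with_workers` is a hypothetical name. There are two caveats. First, on the standard MRI interpreter the global VM lock keeps these threads from executing Ruby code in parallel. Second, every match takes the same lock, which is the kind of contention Mike Dimmick warns about further down.

```ruby
PATTERN = %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }

# One shared, thread-safe line source (a Queue) feeding several worker
# threads, which all update one shared hash guarded by a Mutex.
def count_with_workers(lines, workers: 4)
  queue = Queue.new
  lines.each { |line| queue << line }
  queue.close # once drained, pop returns nil and the workers stop

  counts = Hash.new(0)
  lock = Mutex.new

  threads = Array.new(workers) do
    Thread.new do
      while (line = queue.pop)
        match = PATTERN.match(line)
        next unless match

        lock.synchronize { counts[match[1]] += 1 }
      end
    end
  end
  threads.each(&:join)
  counts
end
```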
David from Oz on June 10, 2008 2:56 AM
I don't get it regarding size. Okay it's a contest of sorts, and it's good to have constraints, but I feel it's not a fair indicator. A large standard library gives you much more power, but it's the old 'my program's only 2k, but just download the 22M runtime first'
The solution method that invokes the processing should be short and readable. In class, many moons ago, I had to create a hashtable in C. It certainly wasn't easy, but why should a C implementation be removed just because the guys who wrote Perl decided to include hashmaps in the language as first-class citizens?
I'm not dissing newer languages that have these constructs built in. I get a hell of a lot more done with C# than I ever could in C, and I can focus more on the problem, not the data structure. It's a win for me, and all I care about at the end of the day is getting work done by writing expressive code and not rewriting a hashtable or linked list for the millionth time. C# isn't perfect by any stretch, but it's fun to work in.
In any event, this sounds like a good candidate for MapReduce (http://labs.google.com/papers/mapreduce.html).
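Here is a minimal single-process sketch of the MapReduce shape, assuming the same pattern as the original program. A map step turns each line into zero or one `[key, 1]` pairs, a grouping step (the "shuffle") collects the pairs by key, and a reduce step sums each key's values. The method names are hypothetical. Nothing here runs in parallel. The point is that map calls are independent of each other and reduce calls are independent per key, which is what lets a real MapReduce system spread both phases across cores or machines.

```ruby
PATTERN = %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }

# Map: one log line emits zero or one [key, 1] pair.
def map_line(line)
  match = PATTERN.match(line)
  match ? [[match[1], 1]] : []
end

# Reduce: combine every value emitted for one key.
def reduce_key(_key, values)
  values.sum
end

def map_reduce(lines)
  pairs = lines.flat_map { |line| map_line(line) } # map phase
  groups = pairs.group_by(&:first)                  # shuffle: key => its pairs
  groups.to_h { |key, kv| [key, reduce_key(key, kv.map(&:last))] } # reduce phase
end
```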
mr maintainable on June 10, 2008 3:02 AM
Am I allowed to use SQL?
I'm sure, on a correctly configured SQL Server or Oracle DB, that I can produce results with an "elegant" SELECT FROM GROUP BY (Oracle Analytics anyone?) that will return results a lot quicker than 8 minutes.
Use the right tool for the job. A DB is inherently multi-processor friendly.
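For comparison with the SQL approach, here is the same query shape as an in-memory Ruby sketch. It needs Ruby 2.7 or later for `filter_map` and `tally`. The comments show which SQL clause each line roughly corresponds to. `top_articles` is a hypothetical name, and unlike a database this does no indexing or parallel execution.

```ruby
PATTERN = %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }

# Roughly: SELECT key, COUNT(*) FROM log WHERE line matches
#          GROUP BY key ORDER BY COUNT(*) DESC LIMIT n
def top_articles(lines, n = 10)
  keys = lines.filter_map { |line| line[PATTERN, 1] } # WHERE + SELECT key
  counts = keys.tally                                   # GROUP BY key, COUNT(*)
  counts.max_by(n) { |_key, count| count }              # ORDER BY ... DESC LIMIT n
end
```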
Guy on June 10, 2008 3:24 AM
P.S. Kev: That fad has been around since the invention of Perl. That's why it's called Perl Golf. :-)
Ah yes, Perl Golf!
http://www.codinghorror.com/blog/archives/001025.html
Jeff Atwood on June 10, 2008 3:45 AM
I'm surprised nobody has mentioned the PlayStation 3 in here. It seems that some hardware-nut thought "hey, if we put on more processors, things should be faster shouldn't it?". Unfortunately current programming theories weren't designed to be "multi-core". Multi-core performance as far as I can see will only be achieved (if at all) in a multi-core language that has no code blocks or streams - a totally new (or maybe unknown to me) programming technique.
Saw that 3D language they had in the movie "Contact"? Yeah, something like that.
Some things benefit from it, like complicated stream systems such as video, but faster is what we want, not more.
--
programmers like being able to say “Run this code on each line of this data” and that’s reasonable. So what we want is a really cheap way for them to add "... and I don’t care what order you run ’em in, or if a bunch run at once”.
--
http://www.tbray.org/ongoing/When/200x/2007/11/12/WF-Conclusions
This is sort of what the Parallel LINQ extensions (mentioned by a few commenters, above) do.
http://msdn.microsoft.com/en-us/magazine/cc163329.aspx
Jeff Atwood on June 10, 2008 4:24 AM
@Cedric
The quality and speed of code produced is completely unrelated to the size of the source code
This is completely correct, when the source code is being *compiled*.
However, interpreters don't produce code, and Ruby is an interpreter.
Mark Allerton on June 10, 2008 4:27 AM
PS: I am only talking about Ruby programs here. I am not suggesting for a moment that a compact Ruby program is faster than a semantically identical but more verbose program in another language. Because Ruby is really. freaking. slow.
Mark Allerton on June 10, 2008 4:33 AM
Not sure about the Ruby (coz' I don't really speak it but it doesn't look bad to me). The regex however I'd do differently
by changing this:
GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+)
into something like
GET /ongoing/when/[\d]{3}x([\d]{4}/[\d]{2}/{[\d]2}[^.]+)
or rather go for [0-9] instead of \d like so:
GET /ongoing/when/[0-9]{3}x([0-9]{4}/[0-9]{2}/{[0-9]2}[^.]+)
So you can read how many digits are required instead of having to count them. (which isn't as bad in a monospace font on account of the forward slashes doing some grouping)
If I didn't make too many typos, all the regexes should do the same, but I'd go for the third for being the easiest on my eyes. (Personal preference, I know.)
I also agree with a previous poster it would be more "scalable" (love the buzzword) to maintain statistics at runtime, on the other hand... that would deprive us all of this little exercise.
Anyhow, thanks for being the gazillionth person to remind me to look into Ruby a bit.
Kris
Kris on June 10, 2008 4:45 AM
And immediately after posting... I see I did make typos:
GET /ongoing/when/[\d]{3}x([\d]{4}/[\d]{2}/[\d]{2}[^.]+)
or rather go for [0-9] instead of \d like so:
GET /ongoing/when/[0-9]{3}x([0-9]{4}/[0-9]{2}/[0-9]{2}[^.]+)
(There shouldn't be braces around the class, just directly after it, to indicate how many characters of the described class to match.)
Kris
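A quick way to check rewrites like these is to run them side by side on a few lines. For the brace-quantifier form to match exactly what the original matches, it needs to keep several details the versions above drop:

- the capital W in `When`, because regular expressions are case-sensitive by default;
- the `/` after the `x` and before the final path segment;
- the `[^ .]+` class followed by a space.

The sample lines below are made up.

```ruby
ORIGINAL = %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }
BRACES   = %r{GET /ongoing/When/\d{3}x/(\d{4}/\d{2}/\d{2}/[^ .]+) }
CLASSES  = %r{GET /ongoing/When/[0-9]{3}x/([0-9]{4}/[0-9]{2}/[0-9]{2}/[^ .]+) }

# Made-up request fragments, including some that should not match.
SAMPLES = [
  "GET /ongoing/When/200x/2007/09/20/Wide-Finder HTTP/1.1",
  "GET /ongoing/When/200x/2007/09/20/Wide-Finder.png HTTP/1.1", # dot in name
  "GET /ongoing/when/200x/2007/09/20/Wide-Finder HTTP/1.1",     # lowercase "when"
  "GET /ongoing/When/200x/2007/9/20/Wide-Finder HTTP/1.1",      # one-digit month
].freeze

# For each pattern, the captured key per sample (nil where it doesn't match).
results = [ORIGINAL, BRACES, CLASSES].map { |re| SAMPLES.map { |s| s[re, 1] } }

p results.first     # => ["2007/09/20/Wide-Finder", nil, nil, nil]
p results.uniq.size # => 1, i.e. all three patterns agree on every sample
```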
Kris on June 10, 2008 4:47 AM
A translation for non-Rubyists, in comments:
// creates a new hash
// this hash will return zero if the key does not exist
// for each line in the arguments, with that line do the following
// if the line matches the given regex
// increment the value for the key matching the backreference
// (end the if)
// (end the iteration)
// sort the hash keys in order of descending hash values (the part in the {} is the sorting algorithm)
// for each key from keys_by_count[0] to keys_by_count[9]
// output the key value followed by the key itself
// end the iteration
actually - to clarify one part:
{ |a, b| counts[b] <=> counts[a] }
this is a comparator: it compares a and b using counts[b] and counts[a], and <=> returns -1, 0 or +1 based on the comparison
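The comparator is easier to see with a few concrete values. This sketch uses a small made-up hash, and the sort call is the same one the original program uses.

```ruby
counts = { "a" => 3, "b" => 10, "c" => 7 } # made-up counts

# <=> (the "spaceship" operator) returns -1, 0 or 1.
p(3 <=> 10)  # => -1
p(10 <=> 10) # => 0
p(10 <=> 3)  # => 1

# sort calls the block with two keys; comparing counts[b] to counts[a]
# (b first) puts the largest count first.
by_count = counts.keys.sort { |a, b| counts[b] <=> counts[a] }
p by_count # => ["b", "c", "a"]

# The same ordering, written with sort_by and a negated count.
p counts.keys.sort_by { |key| -counts[key] } # => ["b", "c", "a"]
```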
"Even if you have no idea what Ruby is, this simple little program isn't too difficult to decipher."
I hate to completely disagree... but I do. I had to look more than one thing up to work this out. I do not normally have this problem with Java or PHP, or anything else which at least sticks to relatively normal grammar and syntax (i.e. C/Pascal style)
Surrounding things with | and using <=> ???
I have to say I was put off Ruby by the learning curve... it's not impossible, but I don't feel it's worth it.
It is one of two languages to have achieved this for me, along with Haskell... I picked up and ran with Java, JS, C, C++, VB, C#, PHP, Python and numerous other small ones not worth mentioning. I was only ever taught Pascal and Ada...
Not to ignore the rest of the blog. It's a shame that compilers can't solve the parallelisation problem yet... but it's to be expected. The necessary information just isn't there for the compiler to optimise most of the time. It's difficult to identify a canonical for loop, for example...
Personally I feel the immediate future for this is something like OpenMP, where the programmer personally marks up what is easy to parallelise to help the compiler out. A compiler that parallelises general procedural code may in fact be impossible... it has not been shown to be impossible afaik, but it's not been shown to be possible either...
Jheriko on June 10, 2008 6:26 AM
Let me try.
Niyaz PK on June 10, 2008 6:30 AM
Meanwhile you could read the article:
Which Programming Language Should I Use?
http://www.diovo.com/?p=49
that should be "using <=> ???"
seems that the anti-injection code here is a bit fussy about the greater and less thans...
Jeff: can you not just escape all of the characters in the text? there are normally some helpful functions for this sort of thing... or is it to protect against some awesome attack that I am not aware of?
Jheriko on June 10, 2008 6:33 AM
got my lt and gt backwards...
speaking of which, it says here above the comment input box "(no HTML)". But isn't &gt; HTML? for instance...
Jheriko on June 10, 2008 6:35 AM
I don't know... I think it's very easy to get into Ruby code that makes no sense at all if you aren't familiar with the language.
"each" is simple. "collect", not so much.
Scott on June 10, 2008 6:42 AM
GET /ongoing/when/\d{3}x/(\d{4}/\d{2}/\d{2}[^.]+)
I am probably too harsh; it's mainly the minutiae that get me... I can see the gist of the program, but understanding exactly what it does requires me to look things up.
"keys_by_count[0 .. 9].each do |key| ... end"
looks nothing like anything I see in other programming languages... and it's not unambiguous as to what it could do until I look it up...
some kind of for-each loop... /I guess/.
"keys_by_count = counts.keys.sort { |a, b| counts[b] <=> counts[a] }"
is even more confusing...
what does this do... well, I have to guess massively and say that it sorts the keys and stores them in a new array... but the bit inside the {} is completely ambiguous. It does something to do with pairs of values and something which /looks like/ it might be a swapping operation.
It would help if I had more patience... but I prefer to think of it as "every other programming language does fine, this one must just suck" rather than "I must suck because this one language is slightly more difficult". As arrogant as that position may be... it seems more reasonable.
If Ruby offered something new I would have learned it fine tbh... it's just difficult enough to not be able to "pick up and run with" like almost everything else out there... but honestly, it wouldn't let me do anything I can't already do.
Jheriko on June 10, 2008 7:02 AM
Maybe I'm being too harsh but this article sums up perfectly why I'll never use Ruby for anything and why I wouldn't recommend using it for anything that is either in production or other programmers need to read.
1) Ruby is very, very, very slow in seemingly every situation. I'll happily accept the performance loss to go from C++ to Java and eliminate memory-management errors but I'm incapable of seeing what benefits would offset the pathetic performance of Ruby. People keep telling me "in a lot of projects speed doesn't matter" but in no serious business do resources never matter.
2) As mentioned above, C/C++/Java/C#/PHP all share very similar syntax and most developers know at least one of the above so I just can't understand why Ruby's vastly different syntax is so massively superior?
Twitter is the worst possible evangelist for Ruby. Their uptime is fairly pathetic for a serious outfit and they'd probably have been quicker to rewrite Twitter in a "serious" language by now rather than desperately trying to work out how to hack RoR into a serious production environment.
I welcome your imminent flames :)
Mike Arthur on June 10, 2008 7:09 AM
That certainly does seem to use several more "Special Characters" than other languages I've worked with. I thought C++ was a little annoying with the goofy -> and :: stuff...
if line =~ %r{GET... WTF?
HB on June 10, 2008 7:18 AM
This is a program to count the most common HTTP GET URL entries in a webserver log file
Why not do that directly when the HTTP GET requests are made? Log files are for post-mortem stuff, statistics should be updated immediately.
I'm reminded of the electric bicycle steering handlebar grip-heater (I think I saw this on TheDailyWTF, but I'm not certain). Correct me if I'm wrong.
Someone on a corporate newsgroup was complaining about having cold fingers when he arrived cycling to work. Someone else replied with sympathy and said that maybe it was a good idea to use a little electric circuit to heat the handles. This concept mushroomed in scope and grandiosity until someone sane - after half an hour or so - told the original topic starter to just wear gloves.
Now -that- is beautiful code.
Rob Janssen on June 10, 2008 7:24 AM
I couldn't help it, but I found this C# LINQ solution more readable. Reading means understanding here, instead of translating in my mind.
There is too much magic in Ruby for my taste, like the $1 - where does it come from? Heaven?
Source:
http://jcheng.wordpress.com/2007/10/02/wide-finder-with-linq/
```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

IEnumerable<string> data = new LineReader(args[0]);
// (LineReader is not a built-in class)
Regex regex = new Regex(@"GET /ongoing/When/\d\d\dx/\d\d\d\d/\d\d/\d\d/([^ ]+) ",
    RegexOptions.Compiled | RegexOptions.CultureInvariant);
var result = from line in data
             let match = regex.Match(line)
             where match.Success
             group match by match.Groups[1].Value into grp
             orderby grp.Count() descending
             select new { Article = grp.Key, Count = grp.Count() };
foreach (var v in result.Take(10))
    Console.WriteLine("{0}: {1}", v.Article, v.Count);
```
"Tim calls Ruby "the most readable of languages"; I think that's a bit of a stretch, but I'm probably the wrong person to ask,"
I have a reasonable smattering of Ruby under my belt but I find Ruby no more readable than any other language that I have familiarity with. It all comes down to how the programmer expresses his or her intent. I've seen super readable C#, ASP, VB, Perl, Python but also conversely, code that's an utter mess, the same goes with Ruby. Ruby is nothing special. It's just another language in a sea of languages. Yes it may have some cool features to make the expression of a programmer's intent more concise, but abuse/misuse of these features can make even just a few lines of code look sociopathic. You only have to look as far as ternary operators in C++ or C# to see where that can lead.
There's also a current fad just now with new Ruby aficionados, converts and zealots to see how much they can compress their intent into as few lines as possible using every trick in the book. I find this obfuscates Ruby code as much as it does C#, Perl or any other language that lets you pull off these tricks. You just end up with another blob of 'write-only' code.
Just my 2c from current observations.
Kev on June 10, 2008 7:29 AM
You won't get any flames here. I haven't yet figured out why any programmer would like Ruby. I understand why the web designers like it, but that's because they are... well.. designers, not programmers.
Designers like pretty things, programmers like things that work. It's the fundamental difference of the job description.
Jeff Davis on June 10, 2008 7:35 AM
Well, here's the thing. Ruby still has a naive interpreter, and as optimized as you can get such things, they are still deadly slow vs the VM interpreters. Python, Java, and the .Net languages are all compiled to bytecode, which is then run on a VM. You can optimize VM's to be hideously fast, and achieve huge performance gains there.
So no matter how clever you can get with the code, until Ruby moves to a VM implementation, it will still be the last of the pack. It also makes it particularly difficult to efficiently instantiate and manage threads in a naive interpreter, vs a VM.
I don't know how Python achieves such speed gains, since it's still single-threaded, unless you use Stackless... *checks the link*
Zeroth on June 10, 2008 7:37 AM
in a perfect world, a one-character change to the original Ruby program would be all it takes to enable all the necessary multicore optimizations.
In a perfect world, the compiler would handle the multicore optimizations _without_ a code change.
Craig on June 10, 2008 7:39 AM
I guess without OS or language support, there won't be any progress in this field.
I only know Python and I use it as an enhancement over shell script, but I guess the map function, for example, would be the perfect place for the interpreter to automagically spawn some threads...
bandini on June 10, 2008 7:43 AM
Well Jeff, once again you've shown that it is a bad idea to have any sort of digression in your blog posts because the comments will inevitably be mostly about the digression and not the main point of the post. grumble, grumble
PS. Yeah, I found parts of the Ruby code example to be impenetrable as well. Shocking!
Disclaimer: Ruby neophyte here.
I'll have to side with Jheriko on this one. That snippet of code is not the clearest of them all. I had to scrutinize the code for a few minutes before I could understand its purpose. Personally, I think that's the last thing you want as a developer. Aren't we expected to strive for clarity?
Elvis Montero on June 10, 2008 7:48 AM
I saw some interesting thoughts on multi-threading presented by a fancy researcher at Microsoft's 2008 product launch.
At the time there seemed to be some great ideas there for the future - but I can't remember what he said now :)
Peter Bridger on June 10, 2008 7:56 AM
I don't think core parallelism will help you much on this problem, actually. That's why it runs like a dog on almost any language. The problem is, at its core, one of disk and memory caching and of paging. If your processing - whatever it is - can't beat the disk in retrieving the next block, you have a problem. It doesn't help to parallelize the CPU-bound parts if your problem is overall I/O-bound. A 'warm' run will likely still have the file in the disk cache, so you have to be careful to do a complete flush before testing. If you don't, the numbers are meaningless. You'd also do well to ensure the file is defragmented and contiguous - disks are better at sequential access than at random access - and ensure that there won't be other random I/O on that disk.
I expect the difficulty largely to be in computing the hash and sizing the hash table, particularly if you, or your environment, try to resize the hash table while it's running, causing all the hashes to be recalculated. Imperative languages tend to give you more control, at the cost of having to be more explicit about the algorithm. As the hashtable gets big the OS will start paging your HT out anyway.
If there is scope for parallelism, you have to be very careful to avoid causing locks on shared data structures. For this problem you, or your environment, would be better placed to accumulate results on each thread and combine them together when each part is completed so they're not touching a shared data structure. An interlocked operation ('lock-free' programming) is not free by any means, it will stall the core for many cycles as it has to go and hit main memory directly while asserting exclusive control to that address across all processors.
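This suggestion can be sketched in Ruby as follows, assuming the same pattern as the original program. `count_partitioned` is a hypothetical name. Each thread counts its own slice into a private hash, and the partial results are merged once at the end, so no lock is taken while counting. One caveat: on the standard MRI interpreter the global VM lock stops these threads from running Ruby code in parallel. This shows the data layout rather than a speedup. The same layout carries over to separate processes, where the work can run in parallel.

```ruby
PATTERN = %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }

# Each worker counts its own slice of the input into a private hash, so no
# data structure is shared (and nothing is locked) while counting. The
# partial hashes are combined once, after all workers have finished.
def count_partitioned(lines, workers: 4)
  slice_size = [(lines.size / workers.to_f).ceil, 1].max
  threads = lines.each_slice(slice_size).map do |slice|
    Thread.new do
      local = Hash.new(0)
      slice.each do |line|
        key = line[PATTERN, 1]
        local[key] += 1 if key
      end
      local # the thread's return value, read back via Thread#value
    end
  end

  # Merge: add up the per-worker counts key by key.
  threads.map(&:value).each_with_object(Hash.new(0)) do |part, total|
    part.each { |key, n| total[key] += n }
  end
end
```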
You and Tim would probably do well to watch Herb Sutter's presentation to the Northwest C++ Users' Group "Machine Architecture: Things Your Programming Language Never Told You", which you can find at http://www.nwcpp.org/Meetings/2007/09.html. (Scroll down for the Google Video recording of the presentation and the PDF of the slides.)
Mike Dimmick on June 10, 2008 7:57 AM
@Aaron G:
You'd need maybe 20 extra lines, if that, to create some worker threads or dip into the thread pool in C#, and synchronize access to the hash table.
Your program would be blocked on the hash table most of the time. Look up "lock convoy".
Mike Dimmick on June 10, 2008 7:59 AM
I just wonder why there are no submissions in C or similar languages.
I guess I will give SNet a try sometime for that.
SNet (www.snet-home.org) has the goal of making programming for these environments easier. It creates boxes that communicate via streams; a box is implemented in any language with a language binding (granted, currently, only C and SAC are supported languages).
In the next step, you connect the boxes. With some assumptions (a box has a single input and a single output) you do not use wire mappings (that is, Output 1 from Box 1 to input 11 of Box 42, Output 98 of box 72 to input 55 of box 23, ...), but rather nice clean statements like: A..B, that is, the output of A is being piped into box B, or things like A*(termination condition) - A star - , that is, feed the output of A into A again until the termination condition becomes true.
Given that, you can do such things fairly nicely.
At first, you have a bottleneck, because the machine has just one IO-device, thus, you create a box that reads data and pumps them into the network. (Or rather, the runtime system will do this automagically, heh.)
After that, you just implement a box B that contains an encoding of a single step for an automaton that parses the regex up there and marks a certain exit condition and possible outputs after finishing. This box is put into a star, that is, B* and bang, you are done.
The only speed caveat currently is the runtime system, as it is not implemented for multiple processes yet, currently it only works for multiple threads in a single process.
However, the nice thing about SNet is that you just need little more work to get that stuff parallel. Granted, it is more work than just adding a tiny star somewhere, but once you have this done, and our research continues well, you will be able to scale as mad as you want.
Let me rephrase that.
If our SNet-Runtime system works properly someday, you can create software that runs on a shared memory machine with ... 5 threads, but you can scale it up to run on millions of computers all over the internet just like SETI@home did, without touching your production code (after a little bit of reorganization)!
And do not fear. The changing of production code mostly is splitting modules apart, which should be easy with enforced, solid APIs.
/This/ is what I call beauty, as I can boldly answer YES to your question, especially because this can stomp other parallelization methods into the ground for more complicated things.
Greetings, Hk
Hk on June 10, 2008 8:04 AM
Wow, lots of fairly closed minds here!
I think that any halfway decent developer should be interested in moving outside their C syntax comfort zone and learning other languages. Ruby borrows elements from Smalltalk (e.g. the |goalposts| enclosing local variable declarations) and Perl (e.g. the $1 in regular expressions). Incidentally, Ruby does have a more verbose OO syntax for doing regex if you prefer.
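Here are both styles side by side, as a sketch with a made-up sample line. This also answers the earlier question about where `$1` comes from.

```ruby
pattern = %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }
line = "GET /ongoing/When/200x/2007/09/20/Wide-Finder HTTP/1.1" # made-up sample

# Object style: Regexp#match returns a MatchData object, or nil on no match.
match = pattern.match(line)
p match[1] # => "2007/09/20/Wide-Finder"

# Perl style: =~ returns the match position and also records the match in
# $~, a variable local to the current method and thread. $1 is shorthand
# for $~[1], and Regexp.last_match(1) is a named way to read the same value.
position = line =~ pattern
p position             # => 0
p $1                   # => "2007/09/20/Wide-Finder"
p Regexp.last_match(1) # => "2007/09/20/Wide-Finder"
```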
The thing with Ruby is that the language is so flexible and dynamic that you can write cryptic code if you must, or you can write code that reads like an English sentence. Of course it all depends on what you're used to. My initial reaction to the C# example given earlier was "WTF does CultureInvariant mean"?!
Regarding performance, has everyone forgotten that Java was hideously slow upon release? There are various promising initiatives underway to create a Ruby VM, so expect the performance gap to narrow or disappear altogether. You can even now run Ruby on a JVM using JRuby and access Java classes if you really want to.
John Topley on June 10, 2008 8:05 AM
Thought Ruby was prettier than that.
Have to agree with Craig -- a proper compiler would handle multicore optimizations.
Silly wabbit! The most beautiful code is APL!
http://en.wikipedia.org/wiki/APL_(programming_language)#Examples
Steve on June 10, 2008 8:09 AM
I have to chime in with the others. That is not a very readable program. If I knew Ruby I might know what the pipe characters do, but otherwise they are not self-evident.
Of course, I could just be a VB dinosaur.
Steve Boyko on June 10, 2008 8:14 AM
The code listed is probably not a good example of readability. I doubt as many people would have problems with it if a few simple things were changed such as:
regex = Regexp.new('GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+)')
ARGF.each_line do |line|
if regex.match(line)
counts[Regexp.last_match(1)] += 1
end
end
... and the = operator replaced by the more familiar ternary or a simple if/else. There are too many esoteric Ruby shortcuts for this to be considered 'readable' - but that doesn't mean it's a problem with Ruby.
I'm no student of programming languages, but I think the other confusing bit of code, the use of blocks (the |parameters| between the pipes and the associated do/end or {} code), is a great idea and I would love to see it become more widely used.
The "fad" of cramming as much as possible into as little code as one can has been around a lot longer than Perl golf. As I understand it, the C programmers started it out of necessity, back when memory was truly limited...
http://www.ioccc.org/
Quoting from posts above:
There's also a current fad just now with new Ruby aficionados, converts and zealots to see how much they can compress their intent into as few lines as possible using every trick in the book. I find this obfuscates Ruby code as much as it does C#, Perl or any other language that lets you pull off these tricks. You just end up with another blob of 'write-only' code.
Just my 2c from current observations.
P.S. Kev: That fad has been around since the invention of Perl. That's why it's called Perl Golf. :-)
Aaron G on June 10, 2008 06:31 AM
Kev on June 10, 2008 06:29 AM
Ruby the most beautiful language? Hah!
In terms of actual visual appeal, Erlang is quite nice; pleasingly spiky and compact. I'd love to say that Common Lisp was beautiful, but function/macro names like 'cadadr' and 'destructuring-bind' ruin it a little. :)
A lot of the more exotic languages in this test, btw, are suffering from not having very good bulk IO.
Robert Synnott on June 10, 2008 8:22 AM
Haven't you tried C# or VBScript on Excel?
Name on June 10, 2008 8:30 AM
What? No graphical programmers out there? I did the assembly language, BASIC, C++ ... but found the love of my life coding in LabVIEW. Talk about beautiful.
I get some grief from a few "real" "seasoned" programmers who think we LabVIEW guys just sit around and "draw pictures" all day. :)
PaulG. on June 10, 2008 8:33 AM
I followed TBray's post on this topic last year. As previous commenters pointed out, WideFinder is more I/O-bound than CPU-bound. That's why a few implementations there resorted to low-level tricks such as mmap to get decent numbers. Ruby in particular, being weak at I/O, sucked a lot.
James Kuoski on June 10, 2008 8:38 AM
"naive Ruby implementation" - heh. Funny, but I think it should be native.
Hemisphire on June 10, 2008 8:39 AM
My initial reaction to the C# example given earlier was "WTF does CultureInvariant mean"?!
Oh, this is a great feature. Ever searched right-to-left (RTL) text?
Do the Ruby libs handle different cultures without problems? And does Ruby handle the full Unicode character set? Or UTF-8 AND ANSI?
Does it cover problems such as whether a culture-specific variant of i and i are seen as equivalent?
OK, Ruby is only the language itself, but the libraries are just as important.
If you want to know which params a method or class needs: use IntelliSense or press F1 :-)
titrat on June 10, 2008 8:42 AM
I have to agree with Mike. If this is actually CPU bound, Gawd help you.
And I think the C# (with some formatting of course) is actually far more readable.
Sjoerd Verweij on June 10, 2008 8:47 AM
Trying to compress your code into fewer and fewer lines/characters has been around longer than programming languages. We used to do it with assembler and machine code. It was a necessary evil when computing power was measured in bytes (not MB or GB, but bytes) and kilohertz. Anyone who does it today should be forced to write all of their code in APL, which was unreadable anyway.
Jim C on June 10, 2008 9:16 AM
I love how people complain about .collect, it's just that Ruby has some different names for things. Personally I think they're often better. But you can use the 'traditional' names if you like:
irb(main):003:0> [1,2,3].map { |i| i**2 }
=> [1, 4, 9]
irb(main):004:0> [1,2,3].collect { |i| i**2 }
=> [1, 4, 9]
Similar idea for inject/reduce, detect/find, etc.
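A quick check of those pairs, using nothing beyond the standard Enumerable methods: each pair is two names for the same operation, so both spellings give identical results.

```ruby
nums = [1, 2, 3]

# map and collect are two names for the same method
squares_map     = nums.map     { |i| i**2 }
squares_collect = nums.collect { |i| i**2 }

# inject and reduce fold the collection into a single value
sum_inject = nums.inject { |acc, i| acc + i }
sum_reduce = nums.reduce { |acc, i| acc + i }

# detect and find return the first element the block accepts
first_even_detect = nums.detect { |i| i.even? }
first_even_find   = nums.find   { |i| i.even? }

p squares_map, sum_inject, first_even_detect
```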
dude on June 10, 2008 9:17 AM
Ruby's syntax can be a little obtuse at times. The biggest reason people don't "get" the benefits of ruby is that they haven't explored the beauty of metaprogramming. If you think laterally about problems, ruby and dynamic languages let you do some amazing things that will be lost on you if you aren't familiar with metaprogramming or thinking out of the box.
When used correctly, it's something like "I need to jump REALLY high in this class" and you realize you can turn off gravity locally, in the class. On the flip side, you can also be dumb about it, and turn off gravity globally... With great power comes great responsibility.
Eric Beland on June 10, 2008 9:21 AM
A new version of the Parallel Extensions for .NET came out last week.
I struggle to understand why people wouldn't like this ruby code. Maybe, and this is just a guess, they like their intermediate values more explicitly stored?
I mean, Ruby encourages you to keep calling methods on return values until you're blue in the face:
hash.keys[a..b].collect { ...code... }.foo.bar.etc
There are quite a few intermediate values in the above pseudocode that never get a variable name. They're implicit. Does this bother people? I know that Python would never stand for such a thing. You can get a couple of levels into it but pretty soon you'll run into a "procedure" (I call them that because they remind me of Pascal) and be forced to store a result in another variable before proceeding.
This code creates a new 10-element array and calls the "each" method on that. Is this bothersome?
keys_by_count[0 .. 9].each do |key|
puts "#{counts[key]}: #{key}"
end
For clarification, here's some equivalent C code:
for (i = 0; i < 10; i++) {
key = keys_by_count[i];
printf("%d: %s\n", strhash_get(counts, key), key); /* strhash_get: an assumed string-keyed hash lookup */
}
P.S. even in this Ruby program, the programmer stopped and regrouped by using the "keys_by_count" variable. It isn't necessary, but it does clarify the intent of the program. Maybe some commenters would have preferred even more of that sort of thing.
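To make that trade-off concrete, here is the same "top ten" step written both ways: once as a single chain whose intermediate values stay implicit, and once with every intermediate given a name. The counts hash is made-up sample data, and `first(10)` stands in for `[0 .. 9]`; for these arrays the two return the same elements.

```ruby
# Made-up sample data standing in for the counts built from the log.
counts = Hash.new(0)
%w[a b b c c c d d d d].each { |k| counts[k] += 1 }

# One chain: sort keys by descending count, take ten, format each line.
chained = counts.keys
                .sort { |a, b| counts[b] <=> counts[a] }
                .first(10)
                .map { |key| "#{counts[key]}: #{key}" }

# The same steps, with every intermediate value named.
keys_by_count = counts.keys.sort { |a, b| counts[b] <=> counts[a] }
top_keys      = keys_by_count.first(10)
lines         = top_keys.map { |key| "#{counts[key]}: #{key}" }

puts lines
```

Which version reads better is exactly the disagreement in this thread; the named version costs three extra lines and buys a debugger-friendly place to inspect each stage.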
Josh on June 10, 2008 10:02 AM
Some units would be nice on those results. Seconds? Minutes? And the format changes on the longer ones, making it harder to grok the difference in time.
Also some of the PRE elements in your HTML aren't large enough to show anything but the scroll bars, making it impossible to read one-liners, like the format of Tim Bray's URLs. Well, not impossible, there's always View Source.
Adrian on June 10, 2008 10:03 AM
Did anyone actually go and look at the results page? No?
Mixed in the bottom half of the chart are implementations in Perl, PHP, Gawk, OCaml, Erlang and Ruby. In the top half are Perl, Python, Erlang, and JoCaml.
This proves that a programmer can write fast or slow code in any language. Not exactly earth shaking.
It is a nice opportunity to bash Ruby and Ruby developers, those smug bastards!
Here is the point I got from this post, maybe someone would like to comment on it: Multicore is the future. The silicon vendors will keep producing multicore chips, run optimized benchmarks and say the performance is amazing. The reality, when running real code on your system, will suck.
What are we, the programmers of the world going to do about it?
Argue about the prettiest language I guess.
Doug on June 10, 2008 10:06 AM
Can you do better on the time without exploding either the code size, the code complexity, or the average programmer's head?
http://blogs.msdn.com/pfxteam/archive/2008/06/04/8573863.aspx
See:
for (int y = 0; y < screenHeight; y++) { /* funcBody */ }
to
Parallel.For(0, screenHeight, y => { /* funcBody */ });
Btw. on the issue of threading there's some interesting stuff today at http://www.gnome.org/~michael/blog/2008-06-10.html , and linked from there: http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.pdf - "The Problem with Threads".
(Though, Meeks' conclusion which mentions helgrind as a helper is somewhat opposite to the PDF, which rather wants to "discard threads as a programming model"... And while I think helgrind is a nice tool for fixing those damn multithreaded apps, I prefer to completely steer clear of threading if possible)
oliver on June 10, 2008 10:18 AM
Regarding language support for multi-core and silver bullets, I wonder how good Apple's solution with Grand Central in Snow Leopard will be. Unfortunately, it is under NDA right now, but when it is out in public (and open source?), that could be an interesting option.
"Grand Central, a new set of technologies built into Snow Leopard, brings unrivaled support for multicore systems to Mac OS X. More cores, not faster clock speeds, drive performance increases in today’s processors. Grand Central takes full advantage by making all of Mac OS X multicore aware and optimizing it for allocating tasks across multiple cores and processors. Grand Central also makes it much easier for developers to create programs that squeeze every last drop of power from multicore systems."
So basically, support in the language and some dynamic decisions about allocation of threads.
see http://www.apple.com/pr/library/2008/06/09snowleopard.html and http://www.apple.com/macosx/snowleopard/
charles on June 10, 2008 10:22 AM
Regarding Tim Bray's claim that Ruby is the most readable, I have to disagree as well. C'mon... 'puts'? 'chomp'? Its 'unreadability' is one of the reasons I never went far with it.
I'm tempted to say that Python is the most readable of all.
astigmatik on June 10, 2008 10:44 AM
This seems to be much ado about nothing.
Even the most simplistic OpenMP directives have a large impact on performance, without a significant increase in source code.
Why in God's name would you attempt performance critical code in an interpreted language anyway?
Codewiz51 on June 10, 2008 11:33 AM
I don't know Ruby, so I found it very difficult to read the first time round. I'm guessing though that if I did know Ruby it wouldn't be so hard - indeed after reading the comments and explanations I can follow it okay.
Personally one thing about Ruby I'm not too keen on is making things try to read as close to English as possible, because more often than not it seems to mask a lot of the details of what's going on. What may be quicker to understand at a conceptual level at a glance becomes very difficult to debug when it's ever so slightly wrong.
There are lots of things I like about Ruby, but the Ruby style and the Ruby culture don't float my boat, which is a shame.
[ICR] on June 10, 2008 11:56 AM
"Their uptime is fairly pathetic for a serious outfit and they'd probably have been quicker to rewrite Twitter in a "serious" language by now rather than desperately trying to work out how to hack RoR into a serious production environment."
I think Twitter's problems come more from a very poor architecture than from the language. http://dev.twitter.com/2008/05/twittering-about-architecture.html
[ICR] on June 10, 2008 11:58 AM
"While you're there, I also suggest reading Tim's analysis of the results..."
Tim's analysis is a wikipedia page? I think you got the wrong URL there...
Marcel Popescu on June 10, 2008 12:20 PM
A couple of other people mentioned Microsoft Parallel Extensions. Check out Allen Bauer's blog; he digs into it.
http://blogs.codegear.com/abauer/2008/02/22/38857
This from a recovering Delphi fan:
WHERE ARE THE COMMENTS?
line =~ %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }
One simple comment would avoid the need to load my mental RE parser. Once I have figured out what it does, then I need to guess why it's there.
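Ruby can even carry those comments inside the pattern: the x (extended) flag ignores unescaped whitespace in a regex and treats # to end of line as a comment, so the significant spaces have to be written as `\ `. A sketch of the same pattern annotated that way, assuming the URL layout shown at the top of the post; `ARTICLE_GET` and the sample log line are made up for illustration.

```ruby
# Same match as the original one-liner, written in extended (x) mode so it can be commented.
# In x mode unescaped whitespace is ignored, so literal spaces are written as "\ ".
ARTICLE_GET = %r{
  GET\ /ongoing/When/      # an HTTP GET for a page under /ongoing/When/
  \d\d\dx/                 # the decade directory, e.g. 200x/
  (                        # capture group 1: the unique part of the article URL
    \d\d\d\d/\d\d/\d\d/    #   year/month/day/
    [^\ .]+                #   the slug: stop at a space or a dot
  )
  \                        # the space that ends the request path
}x

sample = '127.0.0.1 - - [20/Sep/2007:10:00:00 -0700] "GET /ongoing/When/200x/2007/09/20/Wide-Finder HTTP/1.1" 200 1234'
match = ARTICLE_GET.match(sample)
puts match[1] if match
```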
Clever code is more fun to write than it is to maintain.
Cheers
Richard Haven on June 10, 2008 12:50 PM
@Jon Raynor
Less lines of code doesn't necessarily equal beautiful code.
Additionally, machines (compilers) don't care about beauty, they just run it.
This is where Ruby (at least using Matz Ruby Interpreter) breaks all the rules. Because it is a primitive AST-walking interpreter, the smaller a program is syntactically, the faster it is likely to run (at least compared to other semantically similar programs.)
I suspect this fact has a lot to do with Rubyists' obsession with syntactic compression...
Mark Allerton on June 10, 2008 1:16 PM
Sorry Mark, but this is patently false. The quality and speed of the code produced are completely unrelated to the size of the source code, as anyone who's ever written a "Hello world" in any language can show.
As it turns out, Ruby is one of the slowest dynamically typed languages on the map today.
Cedric on June 10, 2008 1:32 PM
This part sums up the unreadability of it to me:
keys_by_count[0 .. 9].each do |key|
puts "#{counts[key]}: #{key}"
end
Obviously this is some sort of foreach loop. But does it increment key? or what? 0 .. 9? Two dots? I mean, I'm sure it DOES make sense, but it doesn't look like any improvement over a simple foreach construction, as even the much maligned PHP can do.
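For readers tripped up by the two dots: `0 .. 9` is a Range literal that includes both ends (three dots, `0 ... 9`, would exclude the 9), and indexing an array with a range returns that slice. Nothing is incremented by hand; `each` simply hands the block one element at a time. A small demonstration on a made-up array:

```ruby
letters = ("a".."z").to_a       # a Range of strings, expanded into an array

inclusive = (0 .. 9).to_a       # two dots: both ends included
exclusive = (0 ... 9).to_a      # three dots: the end is excluded
first_ten = letters[0 .. 9]     # array slice: elements at positions 0 through 9

puts inclusive.size, exclusive.size
p first_ten

# So keys_by_count[0 .. 9].each { |key| ... } visits the first ten keys, one per call.
```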
But anyway, I think the overall point is totally spot on. The people who rant about elegance, beauty, the way it "should" be done are often far, far removed from the trenches. A good programmer knows not to get obsessed with elegance and beauty, to keep a strong sense of realism at all times.
Shmork on June 10, 2008 1:34 PM
The syntax is very readable except for the overuse of non-obvious chars: |#~%
These don't say anything to a non-Ruby programmer, i.e. they have to be learned...
This is why APL died... and why many people hate regexes.
You have an editor with auto-completion, the compiler does not care about verbose syntax, and the program will be exactly the same size and run at the same speed... so why are you using lots of silly chars?
Jaster on June 10, 2008 1:39 PM
"I think Twitter's problems come more from a very poor architecture than from the language."
A big part of the problem is the framework they chose: Rails. And, yes, I CAN place the blame there, even if the architecture is ultimately the problem. I can do that because the Rails framework encourages a certain DB architecture, one that is inherently unscalable, and discourages more innovative and scalable architectures (sharding, for example).
I don't understand the mindset of the Ruby / Rails crowd (and yes, I group them together, because it was the advent of Rails that has caused the recent explosion in the popularity of Ruby). It's really something of a fanboy cult. And if you have any criticism to offer? Well: http://www.robbyonrails.com/articles/2006/04/13/canada-on-rails-day-1-part-1
After spending three months working full-time in Ruby, I can say I don't get it. I did not find it at all intuitive or easy to use. In fact, quite the opposite. I realize that you spend a lot of time in the manual when learning a new language, but it was ridiculous the amount of time I spent looking up how to do things - things that I would have expected to be obvious.
Ruby MIGHT be taken as a serious language, but only if the current attitudes that are associated with it are set aside. It is one tool among many, and may or may not be the best choice. But so far, I haven't seen the case where Ruby is clearly the better choice.
Tony on June 10, 2008 1:39 PM
"what does this do... well i have to guess massively and say that it sorts the keys and stores them in a new array... but the bit inside the {} is completely ambiguous. it does something to do with pairs of values and something which /looks like/ it might be a swapping operation."
You're stumbling over the syntax for Ruby blocks. Yes, other languages don't do them and it's a shame because they're awesome. It takes about 10 minutes to get the hang of them with a good explanation.
This might not be a good explanation.
First of all, {} is the same as do/end when it comes to blocks. People use {} for one-liners, do/end for multi-liners.
For example, this:
for (int i = 0; i <= 10; i++) {
do_something(i);
}
is the same as
(0 .. 10).each do |i|
do_something(i)
end
is the same as
(0 .. 10).each { |i| do_something(i) }
So what are we doing? We're using a 'range' covering 0 to 10. Then we're using 'each' to execute an arbitrary block of code once for each element of the range.
Because it's a range (it behaves like an array here), our block gets one argument. If it were a hash, our block would get two args: the key and the value.
|i| names the argument to the block i
But that's not so useful, right? It's just a loop with different syntax.
Here's why it's cool: you can use blocks with any code.
def my_function
print "Hello "
yield
print " World"
end
and calling it with:
my_function() do
print "Goodbye"
end
would give you
"Hello Goodbye World"
Oh, and Ruby is a lot easier to read if you're coming from a Perl background.
Stuff like the regular expression syntax and the sorting syntax isn't that strange if you come from a Perl-ish background.
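Since the quoted confusion was really about the sort call, here is that one line taken apart. `<=>` (the "spaceship" operator) returns -1, 0 or 1, and `sort` hands the block two elements to compare; writing `counts[b] <=> counts[a]`, with b first, is what makes the order descending, so there is no swapping involved. `sort_by` with a negated count is one common equivalent. The counts below are made up:

```ruby
counts = { "x" => 3, "y" => 7, "z" => 5 }   # made-up sample counts

smaller = (3 <=> 7)   # -1: the left operand is smaller
same    = (5 <=> 5)   #  0: the operands are equal

# The block receives two keys; comparing b's count to a's puts larger counts first.
descending = counts.keys.sort { |a, b| counts[b] <=> counts[a] }

# An equivalent many Rubyists find easier to read: sort by the negated count.
descending_by = counts.keys.sort_by { |key| -counts[key] }

p descending
```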
engtech on June 10, 2008 1:52 PM
If I were writing that program, I'd avoid the sort routine and just keep hold of the top ten most frequent strings.
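One way to do that, sketched under the assumption that the counting pass has already filled a `counts` hash: keep a short array of the best entries seen so far and only touch it when a key beats the current last place. `top_n` is a hypothetical helper name; `Enumerable#max_by` also accepts a count (Ruby 2.2 and later) if you would rather not write it by hand.

```ruby
# Return the n [key, count] pairs with the highest counts, scanning the hash once.
def top_n(counts, n = 10)
  best = []   # kept sorted by count, highest first; never longer than n
  counts.each do |key, count|
    next if best.size == n && count <= best.last[1]   # can't make the cut
    idx = best.index { |(_, c)| c < count } || best.size
    best.insert(idx, [key, count])
    best.pop if best.size > n                          # drop whoever fell off the end
  end
  best
end

counts = { "a" => 5, "b" => 1, "c" => 3, "d" => 4 }   # made-up sample counts
p top_n(counts, 2)
p counts.max_by(2) { |_key, count| count }            # same pairs via the built-in
```

The work per key is at most n comparisons, so for a top ten over many unique URLs this stays cheap without ordering the whole key list.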
Wayne on June 11, 2008 3:11 AM
The speed Ruby gives has nothing to do with its execution time. That said, Ruby 1.9 is coming (not for Rails yet, though; it's a big change), and JRuby is pretty quick too; it's being worked on.
I can do stuff in a quarter of the time (or less) it used to take me in Java, with a tenth of the code. It is easier to maintain because of the use of blocks, which allow templating:
1. set up database connection
2. execute query, or do something to the database
3. tear down the connection
and handle any exceptions gracefully
With blocks you just pass in 2, and are not at home to the cut and paste monster eating your tear down code, you can do just about anything in the block, and the exception handling can be halfway sane. This applies to all of these situations, reading and writing to files etc. etc. You can do this in Java but it is very hard, because the language doesn't like you, the programmer, it doesn't trust you to know what you are doing (why is String final? Because it doesn't trust you - I could go on).
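A minimal sketch of that set-up/yield/tear-down pattern, using a stand-in `FakeConnection` class instead of a real database driver (the class and `with_connection` are hypothetical names). The `ensure` clause is what guarantees step 3 runs even when the caller's block raises; Ruby's own `File.open` with a block follows the same pattern.

```ruby
# Hypothetical stand-in for a database connection.
class FakeConnection
  attr_reader :closed, :log

  def initialize
    @closed = false
    @log = []
  end

  def execute(sql)
    @log << sql
    "result of #{sql}"
  end

  def close
    @closed = true
  end
end

# 1. set up, 2. run the caller's block, 3. always tear down.
def with_connection
  conn = FakeConnection.new   # set up
  yield conn                  # the caller supplies step 2
ensure
  conn.close if conn          # tear down, even if the block raised
end

result = with_connection { |conn| conn.execute("SELECT 1") }
puts result
```

The method returns whatever the block returns, and an exception from the block still propagates to the caller, just after the connection has been closed.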
I think the example is poor, because it reads like something you'd type into the interactive console when you were trying things out. If it needs comments, it's been written wrong; cf. Fowler's "Refactoring" book, where there is a pattern of replacing a complicated comment with a well-named, well-factored method.
There's the whole metaprogramming and code generation thing, which is really easy in Ruby - see http://s3.amazonaws.com/giles/scissors_041108/scissors.pdf (pdf!!)
Have a poke around in the Scotland on Rails website for more interesting stuff...
http://scotlandonrails.com/talks
Somewhere in the blogosphere there is also an interesting article where some code is actually refactored to have more lines, to make it maintainable. Less is more, but more is better when you can understand it. This is on the Rails Envy podcast site somewhere.
Francis Fish on June 11, 2008 3:34 AM
As a side note, I and others seem to perform this task so often that I'm surprised I don't see more implementations of FrequencyDictionary ADTs.
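A small sketch of what such a FrequencyDictionary might look like in Ruby. The class and its method names are hypothetical; it is built on `Hash.new(0)`, the same default-zero trick as `counts.default = 0` in the original program, and `most_common` relies on `max_by` accepting a count (Ruby 2.2 and later).

```ruby
class FrequencyDictionary
  def initialize
    @counts = Hash.new(0)   # unseen keys count as zero
  end

  # Record one occurrence of key; returns self so calls can be chained.
  def add(key)
    @counts[key] += 1
    self
  end

  # Current count for key (zero if never seen).
  def [](key)
    @counts[key]
  end

  # The n most frequent [key, count] pairs, highest count first.
  def most_common(n)
    @counts.max_by(n) { |_key, count| count }
  end
end

freq = FrequencyDictionary.new
%w[a b a c a b].each { |word| freq.add(word) }
p freq.most_common(2)
```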
[ICR] on June 11, 2008 4:18 AM
Just on the odd characters Ruby uses: a friend sent me a Java code fragment in which he looped through printing "Thank You!" a million times (it was a response to a professor who had extended the deadline on a paper). I responded with a single line of Ruby to do the same, and a single line of Lisp.
He wrote back:
underscores, pipes, octothorpes, curly braces -- sheesh...
I'll take a mild dose of verbosity if it means I don't have
to code something that looks like it's been zipped already
What, nobody wanted to take a crack at a clearer version for non-Ruby programmers? Try this:
------------------------
counts = Hash.new
counts.default = 0
path_regex = Regexp.new('GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) ')
ARGF.each_line do |log_line|
article = path_regex.match(log_line)
if not article.nil?
counts[article.captures.first] += 1
end
end
keys_by_count = counts.keys.sort_by { |key| counts[key] }.reverse
keys_by_count.first(10).each do |key|
print counts[key].to_s + ": " + key + "\n"
end
------------------------
Two tricky parts. First, the .match method returns a MatchData object on a hit, or nil (Ruby's empty, null value) if it doesn't match. Any values captured by the regex are stored in the object's .captures array. I'd chain it all together if I knew the pattern would match (say, if I'd used ARGF.grep(path_regex).each; but that would read the whole log into memory first).
The second is that .to_s is just a type conversion to string so that you can concatenate (+) a number with a string. There are a bunch of .to_* methods in Ruby.
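To see those two points in isolation, here is a sketch with made-up log fragments; the regex is the article pattern, written with `%r{}` so the backslashes reach the regex engine intact.

```ruby
path_regex = %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }

hit  = path_regex.match('"GET /ongoing/When/200x/2007/09/20/Wide-Finder HTTP/1.1"')
miss = path_regex.match('"GET /favicon.ico HTTP/1.1"')

p hit.class      # a MatchData object on a hit
p hit.captures   # the values captured by the parenthesized group
p miss           # nil when the line doesn't match

count = 42
puts count.to_s + ": " + hit.captures.first   # to_s turns the Integer into a String
```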
The rest is pretty simple. If I were writing the script, my code would end up somewhere in between the original and my revision in terms of readability. The original uses a whole bunch of Ruby-specific syntax and tricks. Goes to show you can write opaque code in any language.
ruby_sympathizer on June 11, 2008 10:22 AM
I love how people keep talking about blocks like they're just a Ruby thing; they sound a lot like C# anonymous methods.
"
1. set up database connection
2. execute query, or do something to the database
3. tear down the connection
With blocks you just pass in 2, and are not at home to the cut and paste monster eating your tear down code, you can do just about anything in the block, and the exception handling can be halfway sane. This applies to all of these situations, reading and writing to files etc. etc. You can do this in Java but it is very hard, because the language doesn't like you, the programmer, it doesn't trust you to know what you are doing (why is String final? Because it doesn't trust you - I could go on)."
I can do that in C# too
delegate void SomeDelegate(SqlConnection conn);

void ExecuteSqlBlock(SomeDelegate moo)
{
    SqlConnection dbCon = OpenDatabaseConnection(); // open database (helper assumed to exist)
    try
    {
        moo(dbCon); // execute the passed delegate
    }
    finally
    {
        dbCon.Close();   // The Tear
        dbCon.Dispose(); // Down
    }
}

void Main()
{
    ExecuteSqlBlock(delegate(SqlConnection con) {
        con.DoSomethingWithIt(); // placeholder for the real work
    });
}
So I have to define a delegate, mind, but it still works, and it has access to the local variables of the enclosing function too ^_^
Though personally I'd just pass a SqlCommand object to the function rather than having a function that takes a delegate. But I was just making a point.
Nik Radford on June 12, 2008 2:24 AM
Just to highlight a possible way to speed everything up: approximate the data set by reading selected chunks and skipping the rest. On the assumption that the selected lines approximate the whole well enough that the top ten come out right, it should ease the I/O limitations of most methods.
More details on my blog.
Fun.
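A rough sketch of that sampling idea, under stated assumptions: the log is newline-separated text, reading one chunk out of every `stride` chunks is representative enough, and the partial lines at chunk edges can simply be dropped. `sampled_counts`, `chunk_size` and `stride` are hypothetical names. The counts come out at roughly 1/stride of the true totals, which is fine for ranking but not for absolute numbers.

```ruby
require "stringio"

ARTICLE = %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }

# Count article hits in every stride-th chunk of the log and skip the rest.
def sampled_counts(io, chunk_size: 64 * 1024, stride: 4)
  counts = Hash.new(0)
  offset = 0
  while offset < io.size
    io.seek(offset)
    chunk = io.read(chunk_size)
    break if chunk.nil?
    lines = chunk.lines
    lines.shift if offset > 0                # probably a partial line cut by the seek
    lines.pop unless chunk.end_with?("\n")   # probably a partial line cut by the read
    lines.each do |line|
      match = ARTICLE.match(line)
      counts[match[1]] += 1 if match
    end
    offset += chunk_size * stride            # jump over the chunks we skip
  end
  counts
end

log = StringIO.new(<<~LOG)
  a "GET /ongoing/When/200x/2007/09/20/Wide-Finder HTTP/1.1"
  b "GET /ongoing/When/200x/2007/09/20/Wide-Finder HTTP/1.1"
  c "GET /ongoing/When/200x/2007/10/01/Other HTTP/1.1"
LOG

p sampled_counts(log, chunk_size: 1024, stride: 1)
```

For a real log you would pass `File.open(path, "rb")` instead of the StringIO, with a much larger file and a stride above 1.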
Phil H on June 12, 2008 4:21 AM
@Mike Dimmick: Let each thread maintain its own hash, and merge the results at the end.
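A sketch of that suggestion in Ruby, assuming the lines are already in memory and split into one slice per thread (`count_in_parallel` and `thread_count` are hypothetical names). Each thread counts into its own Hash, so no locking is needed, and the partial hashes are merged at the end with `Hash#merge` and a block that adds the counts for keys several threads saw. On MRI the global VM lock keeps these threads from running the regex work simultaneously; JRuby, mentioned earlier in the thread, does run threads in parallel.

```ruby
ARTICLE = %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }

def count_in_parallel(lines, thread_count: 4)
  slice_size = [(lines.size / thread_count.to_f).ceil, 1].max
  threads = lines.each_slice(slice_size).map do |slice|
    Thread.new(slice) do |my_lines|
      local = Hash.new(0)   # private to this thread: no lock needed
      my_lines.each do |line|
        match = ARTICLE.match(line)
        local[match[1]] += 1 if match
      end
      local                 # becomes the thread's value
    end
  end
  # Merge the per-thread hashes, summing counts for keys seen by several threads.
  threads.map(&:value).reduce(Hash.new(0)) do |total, partial|
    total.merge(partial) { |_key, a, b| a + b }
  end
end

lines = [
  'a "GET /ongoing/When/200x/2007/09/20/Wide-Finder HTTP/1.1"',
  'b "GET /ongoing/When/200x/2007/09/20/Wide-Finder HTTP/1.1"',
  'c "GET /ongoing/When/200x/2007/10/01/Other HTTP/1.1"',
  'd "GET /favicon.ico HTTP/1.1"'
]
p count_in_parallel(lines, thread_count: 2)
```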
@rcphq: I have a distaste for Ruby because I feel like it is taking mind-share from Python, which is IMO a prettier language, certainly faster, and every bit as powerful. I try not to be a reflexive hater since I haven't actually used Ruby, but it seems like it's going to get adopted by a lot of trend-followers who would be better served to use Python instead, which will eventually make me cry when I have to maintain their code.
Hamilton-Lovecraft on June 12, 2008 6:44 AM
Just to point out, you seem to have rattled a nest of comments. Funny, most of them seem to be Ruby haters. It's gone political!
I wonder what would happen if you wrote a similar article using another language as the sample?
@Hamilton-Lovecraft: From what I understand, if you start with Python, there's not much reason to switch to Ruby, and vice versa. They're similar. I'd like to try Python at some point, but I have a hard time finding a reason. Maybe I should leap into a pyweek competition sometime and see what it's all about. I'll have to overlook the "spacing matters" thing; reminds me of COBOL. =)
As for the benchmark results in this post, I'm guessing the author was using the 1.8 interpreter since he doesn't specify. There are a number of interpreter implementations. 1.9 (still in development) is looking pretty nice by comparison:
http://antoniocangiano.com/2007/12/03/the-great-ruby-shootout/
ruby_sympathizer on June 13, 2008 7:26 AM
Hey Jeff!
"Tim calls Ruby "the most readable of languages"; "...
What would you say is the most (human) readable langage?
Objectivity is hard here, since most of us would think that either the first language we were comfortable with, or the one we are most proficient in, is the most readable one.
Before, I had 10 PHP programmers on my team.
1. $x:28 {$x:34{$x:32} } (the ternary concept). My point: don't overuse the ternary operator just because you know better than me.
2. function x {
return x;
}
Like C, and cleaner.
3. function x
{
return x;
}
Is it clean?
4. MVC method:
class m {
function _tostring() { return i }
}
Creating a class for everything, even when it needs just one variable. This is the dumbest to me.
Conclusion:
I deal with customers who are sometimes smarter than me, and sometimes idiots. Clean code means having a coding standard, and no ternaries please. If there's a lot to do, put it in a stored procedure or organize it into classes or namespaces to keep it maintainable. That, to me, is clean.
A note regarding the OCaml and JoCaml entries: the OCaml entry that runs in 14s is a single-core implementation I submitted for comparison purposes (so the speedup attributable to the T5120's 8 cores can be measured); it's the fastest non-multicore one.
JoCaml is an extension of OCaml with the join calculus, which allows for easy concurrent and distributed programming. Tim Bray didn't measure the latest version of the JoCaml entry, nor did he use the optimum parameters (32 workers). The latest version would run in close to (or slightly below) one second.
There isn't a large difference between the different languages because the task was trivial and mainly exercised the regular expression engine (this is also why Perl got a head start, since the code for a Boyer-Moore string search lib is included in the line count for OCaml/JoCaml and other languages, including Erlang).
Wide Finder 2 requires more processing, and the differences in speed are more apparent there.
Have you seen this page:
http://shootout.alioth.debian.org/gp4/
It lists various benchmarks against lots of languages!
Here is the "read lines, parse, and sum integers" benchmark.
http://shootout.alioth.debian.org/gp4/benchmark.php?test=sumcollang=all
Ruby is at position #28 vs. Python at #7.6 vs. Perl at #6.6. Ruby processes all lines in 76 seconds, while Python and Perl took about 20 seconds. And Python can kill other scripting languages because you can rewrite slow Python modules in C/C++.
BTW, dynamic scripting languages are 100-300x slower than natively compiled code.
Surprisingly, Java isn't that slow because it contains a JIT compiler...
Personally, I think that code snippet is particularly ugly.
Beauty is indeed in the eye of the beholder.
Reminds me way too much of K&R coding style, which is optimised to print on fewer pages, not to be more readable - or even pretty.
IMHO
John Rutter on June 21, 2008 4:30 AM
"Can you do better on the time without exploding either the code size, the code complexity, or the average programmer's head?"
Take a look at:
(see the section on occam-π): http://en.wikipedia.org/wiki/Occam_programming_language
For the compiler, check out KRoC (it's linked in there somewhere).
Leon Sodhi on February 6, 2010 10:25 PM
I don't know why people have such a hard time with threaded/parallel programming. No question this can get insanely complicated if all the threads are doing different work and need to synchronize with each other at different stages, but for a dinky little log file analyzer like this? You'd need maybe 20 extra lines, if that, to create some worker threads or dip into the thread pool in C#, and synchronize access to the hash table.
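For comparison, here is roughly that shared-table version, sketched in Ruby rather than C#: worker threads pull lines from a Queue, and a Mutex guards the single shared hash. The worker count, the `:done` stop marker and `count_with_lock` are assumptions for illustration. The regex match runs outside the lock, so only the cheap increment is serialized.

```ruby
ARTICLE = %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }

def count_with_lock(lines, workers: 4)
  counts = Hash.new(0)
  lock = Mutex.new
  queue = Queue.new
  lines.each { |line| queue << line }
  workers.times { queue << :done }   # one stop marker per worker

  threads = Array.new(workers) do
    Thread.new do
      while (line = queue.pop) != :done
        match = ARTICLE.match(line)                      # the expensive part runs unlocked
        lock.synchronize { counts[match[1]] += 1 } if match
      end
    end
  end
  threads.each(&:join)
  counts
end

lines = [
  'a "GET /ongoing/When/200x/2007/09/20/Wide-Finder HTTP/1.1"',
  'b "GET /ongoing/When/200x/2007/09/20/Wide-Finder HTTP/1.1"',
  'c "GET /ongoing/When/200x/2007/10/01/Other HTTP/1.1"',
  'd "GET /favicon.ico HTTP/1.1"'
]
p count_with_lock(lines, workers: 3)
```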
I've gotta hand it to the Pythonists though, what started out as essentially a Comp Sci toy has actually proven itself a worthy contender in terms of both performance and maintainability.
Aaron G on February 6, 2010 10:25 PM
P.S. Kev: That fad has been around since the invention of Perl. That's why it's called Perl Golf. :-)
Aaron G on February 6, 2010 10:25 PM
# hmm… hngh… hmm… you said do key…
keys_by_count[0 .. 9].each do |key|
Reminds me of my first job - all of our keyboards had a key labeled "Do" (it behaved like the "Enter" key) so our application documentation was full of instructions telling users to "Press the Do Key."
That still makes me laugh.
Non sequitur on February 6, 2010 10:25 PM
I have to agree with Jim S and Jim C on the matter of beauty.
Beauty does seem to be associated with code compression:
if line =~ %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }
counts[$1] += 1
This looks like a regex, which for me has always been unreadable. I always have to go to a reference to find out what it is doing because I don't use regexes every day.
I've always found that straightforward, well-commented code with proper exception handling is the easiest to read and, more importantly, to maintain. Usually this code is verbose and not pretty or short.
Less lines of code doesn't necessarily equal beautiful code. Additionally, machines (compilers) don't care about beauty, they just run it.
Ruby compiler: "this code is so beautiful, I'll run it faster!"
I think code beauty is totally subjective to the coder viewing the code, so it's more of an opinion about the code than a fact. Maybe the above code is beautiful to some coders, to others not so much.
Jon Raynor on February 6, 2010 10:25 PM
Mike Dimmick: 'Your program would be blocked on the hash table most of the time. Look up "lock convoy".'
I'm not sure that's true. Seems to me like a regex would be a far longer operation than computation of a simple string hash. Especially a regex like that one. You can avoid resizing the hashtable by starting off with a large one, and you can mitigate a major part of the disk latency with simple buffering.
Of course you may still be right about disk being the bottleneck - if 39 out of 41 seconds are spent just reading the file then no parallel execution is going to make much of a dent. However, if it's even, say, an 80/20 split on I/O and CPU, you could eliminate almost the whole 20% by simply using a worker thread to do the analysis while you do your disk reads in the main thread. On a 1-minute operation, that's not a trivial improvement.
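A minimal sketch of that split in Ruby: the calling thread reads the file in large chunks and pushes them onto a SizedQueue, while one worker thread does the regex counting. On MRI, a thread blocked in a read generally releases the global VM lock, so the reading and the matching can overlap. The chunk size, queue depth and `count_with_reader_thread` are assumptions; lines split across chunk boundaries are stitched back together by carrying the leftover into the next chunk.

```ruby
require "stringio"

ARTICLE = %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }

def count_with_reader_thread(io, chunk_size: 1 << 20)
  chunks = SizedQueue.new(8)   # bounded, so the reader can't race far ahead

  worker = Thread.new do
    counts = Hash.new(0)
    leftover = ""
    while (chunk = chunks.pop)   # nil marks the end of input
      lines = (leftover + chunk).split("\n", -1)
      leftover = lines.pop       # the last piece may be an incomplete line
      lines.each do |line|
        match = ARTICLE.match(line)
        counts[match[1]] += 1 if match
      end
    end
    match = ARTICLE.match(leftover)   # a final line with no trailing newline
    counts[match[1]] += 1 if match
    counts
  end

  while (chunk = io.read(chunk_size))   # this thread does nothing but I/O
    chunks << chunk
  end
  chunks << nil
  worker.value
end

log = StringIO.new(<<~LOG)
  a "GET /ongoing/When/200x/2007/09/20/Wide-Finder HTTP/1.1"
  b "GET /ongoing/When/200x/2007/09/20/Wide-Finder HTTP/1.1"
  c "GET /ongoing/When/200x/2007/10/01/Other HTTP/1.1"
LOG
p count_with_reader_thread(log, chunk_size: 10)
```

With a real file you would pass `File.open(path, "rb")`; the tiny chunk size here only exists to exercise the boundary stitching.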
Now that I think about it, the absence of an I/O cost makes the "research" seem pretty sloppy. All that's required is to write a simple program that does nothing but read large chunks from the same file and throw them away, and see what the running time of that is. That would be a lot more illuminating to me than most of the information he does give us (extrapolating from the Bonnie results would be mere speculation), and would provide a baseline independent of any fragmentation or caching issues.
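The baseline being asked for is only a few lines of Ruby: read the file in large chunks, throw them away, and time it with a monotonic clock. The 1 MB chunk size and `time_raw_read` are assumptions; running it twice shows the effect of the OS file cache, which may make the second run much faster.

```ruby
# Time how long it takes just to read a file, doing no processing at all.
def time_raw_read(path, chunk_size: 1 << 20)
  bytes = 0
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  File.open(path, "rb") do |file|
    buffer = String.new(capacity: chunk_size)
    while file.read(chunk_size, buffer)   # reuse one buffer; the data is discarded
      bytes += buffer.bytesize
    end
  end
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  [bytes, elapsed]
end

if ARGV[0]
  bytes, seconds = time_raw_read(ARGV[0])
  printf("%d bytes in %.2f s (%.1f MB/s)\n", bytes, seconds, bytes / seconds / 1e6)
end
```

Run as `ruby io_baseline.rb access.log` (file names hypothetical) and compare the time with the full analysis on the same log.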
Aaron G on February 6, 2010 10:25 PM
@Mark A
I see your point. A program with 10K lines of code will run slower than 2K lines (if ruby).
This reminds me of old VB code, when the code was interpreted. I believe they called it p-code or pseudo-code. Because of this, early versions of VB were slow. With version 5 of VB, I believe, they went to a native compiler, which sped up the performance of the code considerably.
I know nothing about Ruby, but in compiled code world, I don't think size of code is as important. I certainly haven't run across any case that if I made the code "smaller" it ran faster by an order of magnitude.
Look at the results: 50 is the fastest time! That's terrible compared with other languages. So, looking at this, Rubyists who want to compete with other languages have to automatically process many lines, keep all the "threads" that are processing the lines in sync with each other, and so forth.
If you're going to all these complex lengths to improve performance, why not choose a different language that is faster to begin with?
Jon Raynor on February 6, 2010 10:25 PM
Content (c) 2009 Jeff Atwood. Logo image used with permission of the author. (c) 1993 Steven C. McConnell. All Rights Reserved.