Coding Horror
programming and human factors
by Jeff Atwood

October 30, 2007

Embracing Languages Inside Languages

Martin Fowler loosely defines a fluent interface thusly: "The more the use of the API has that language like flow, the more fluent it is." If you detect a whiff of skepticism here, you're right: I've never seen this work. Computer languages aren't human languages.

Let's look at a concrete example from Joshua Flanagan. Here's how we define a regular expression in the standard way:

<div\s*class="game"\s*id="(?<gameID>\d+)-game"(?<content>.*?)
<!--gameStatus\s*=\s*(?<gameState>\d+)-->

Here's how we'd define that same regular expression in Joshua's fluent interface.

Pattern findGamesPattern = Pattern.With.Literal(@"<div")
    .WhiteSpace.Repeat.ZeroOrMore
    .Literal(@"class=""game""").WhiteSpace.Repeat.ZeroOrMore.Literal(@"id=""")
    .NamedGroup("gameId", Pattern.With.Digit.Repeat.OneOrMore)
    .Literal(@"-game""")
    .NamedGroup("content", Pattern.With.Anything.Repeat.Lazy.ZeroOrMore)
    .Literal(@"<!--gameStatus")
    .WhiteSpace.Repeat.ZeroOrMore.Literal("=").WhiteSpace.Repeat.ZeroOrMore
    .NamedGroup("gameState", Pattern.With.Digit.Repeat.OneOrMore)
    .Literal("-->");

So we're replacing a nice, succinct one line regular expression with ten lines of objects, methods, and named enumerations. This is progress?

I'll grant you that I am probably unusually familiar with regular expressions, even by developer standards. There's a reason they have a reputation for being dense and inscrutable. I've definitely seen some incredibly bad regular expressions in my day. But in my professional opinion, that regex was a well written one. I had no problem reading it. Adding a ton of hyper-dense object wrappers to that regex makes it harder for me to understand what it does.

The new syntax Joshua invented is great, but it's specific to his implementation. Although it may seem like a good idea to use these kinds of training wheels to "learn" regular expressions, I'd argue that you aren't learning them at all. And that's a shame, because regular expression syntax is a mini-language of its own. Once you learn it, you can use it anywhere; it works (almost) the same in every environment.
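To make the portability point concrete, here is a sketch of the same pattern in Python, used purely as an illustration (the sample HTML string is invented for the example). The one visible difference is that Python's `re` module spells named groups `(?P<name>...)` where .NET writes `(?<name>...)`; the rest of the pattern carries over unchanged.

```python
import re

# Same pattern as above; only the named-group spelling changes
# (.NET: (?<name>...)   Python: (?P<name>...)).
GAME_PATTERN = re.compile(
    r'<div\s*class="game"\s*id="(?P<gameID>\d+)-game"(?P<content>.*?)'
    r'<!--gameStatus\s*=\s*(?P<gameState>\d+)-->',
    re.DOTALL,  # assumption: let .*? span line breaks in real pages
)

# Hypothetical input, made up for this example.
sample = '<div class="game" id="1234-game">Home 3, Away 1<!--gameStatus = 2--></div>'

match = GAME_PATTERN.search(sample)
if match:
    print(match.group("gameID"), match.group("gameState"))
```

Anyone who can read the original regex can read this one, which is rather the point.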

The SubSonic project attempts to do something similar for SQL. Consider this SQL query:

SELECT * from Customers WHERE Country = "USA"
ORDER BY CompanyName

Here's how we would express that same SQL query in SubSonic's fluent interface:

CustomerCollection c = new CustomerCollection();
c.Where(Customer.Columns.Country, "USA");
c.OrderByAsc(Customer.Columns.CompanyName);
c.Load();

I've mentioned before that I'm no fan of object-oriented rendering when a simple string will suffice. That's exactly the reaction I had here; why in the world would I want to use four lines of code instead of one? This seems like a particularly egregious example. The SQL is harder to write and more difficult to understand when it's wrapped in all that proprietary SubSonic object noise. Furthermore, if you don't learn the underlying SQL-- and how databases work-- you're in serious trouble as a software developer.
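For comparison, here is roughly what the plain-string route looks like end to end. This is only a sketch, written in Python with the standard `sqlite3` module because it runs without a server; the table layout and rows are invented, and the string literal uses single quotes, as standard SQL expects.

```python
import sqlite3

# In-memory database with a made-up Customers table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CompanyName TEXT, Country TEXT)")
conn.executemany(
    "INSERT INTO Customers (CompanyName, Country) VALUES (?, ?)",
    [("Zeta Corp", "USA"), ("Alpha Inc", "USA"), ("Nordwind", "Germany")],
)

# The query itself stays one readable line of SQL.
rows = conn.execute(
    "SELECT * FROM Customers WHERE Country = 'USA' ORDER BY CompanyName"
).fetchall()

for company_name, country in rows:
    print(company_name, country)
```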

But I can see the rationale behind these types of database code generation tools:

  1. They "solve" the object-relational mapping problem for you (and if you believe that, I have a bridge you might be interested in).
  2. You get IntelliSense.
  3. Your database is strongly typed.
  4. The compiler now "understands" the database, or at least the generated classes that represent the database.

I definitely sympathize with the desire to produce less code, and that's the whole point of database code generation tools. Personally, I would argue that most of these benefits could be realized with smarter IDEs that actually understood native SQL strings (or regular expressions), rather than relying on a slew of generated code and complicated, proprietary object syntax.

But let's take a step back and think about what's really happening here. In both cases, we are embedding one language inside another. SQL is a language. Regular expressions are a language. Wrapping those languages inside a bunch of mega-verbose fluent interface ObjectJunk-- just so we can pretend we're writing code in our primary language-- is a total cop-out. Fluent interface object wrappers feel like a nasty hack to me.

Why can't we embrace the language-inside-a-language paradigm, rather than running and hiding from it? These domain specific languages exist because they are optimized for processing strings and data efficiently. Avoiding them is counterproductive.

Perhaps the ultimate solution is to redefine the underlying language to incorporate the features of another language.

Consider how Perl integrates the regular expression language:

while (my $line = <IN>) {
    while ( $line =~ /(Romeo|Juliet|Mercutio|Tybalt|Friar \w+)/g ) {
        my $character = $1;
        ++$counts{ $character };
        } 
    }

Here's how C# 3.0, with LINQ, integrates the SQL language:

var c = from Customer in Customers
  where Customer.Country == "USA"
  orderby Customer.CompanyName
  select Customer;

Note the conspicuous lack of ObjectJunk. No explosion at the parens and periods factory. No MassivelyLongTextEnumerations to deal with. There's nothing but code that looks like exactly what it does. And that's a beautiful thing.
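The same shape shows up in other languages. As a rough analogue rather than a LINQ equivalent, here is the query over an in-memory list in Python; the `Customer` class and the sample data are assumptions made up for the illustration.

```python
from dataclasses import dataclass


@dataclass
class Customer:  # hypothetical record type for this example
    company_name: str
    country: str


customers = [
    Customer("Zeta Corp", "USA"),
    Customer("Nordwind", "Germany"),
    Customer("Alpha Inc", "USA"),
]

# Filter, then order: the code reads much like the query it expresses.
usa_customers = sorted(
    (c for c in customers if c.country == "USA"),
    key=lambda c: c.company_name,
)

print([c.company_name for c in usa_customers])
```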

Embrace the idea of languages inside languages. In The Land of Strings, we speak regular expressions. In The Land of Data, we speak SQL. Oh sure, you can pretend those languages don't exist, and hide out in the Kingdom of Nouns-- but you're only cheating yourself out of a deeper understanding of how things really work in those other places. Fluent interface object wrappers may seem like a helpful convenience, but they're actually an ugly hack, and a terrible substitute for true language integration.

Comments

If only someone could fluently resolve the mess of PHP, CSS, HTML and Javascript (*cringe*) that form a webpage...

Graphain on October 30, 2007 4:00 PM

Great post, by the way (and apologies for the double post), but I had an observation: as terrible as object wrappers are, I can see merit in not hard-coding SQL throughout a project, for instance to isolate the code that will need altering if the database changes. Although I guess you could equally have "SELECT * FROM " + dbName and still avoid object wrappers.

Graphain on October 30, 2007 4:03 PM

i don't hate my job so much now.

sir jorge of culver on October 30, 2007 4:10 PM

Interesting points.

I can see one benefit such abstractions could give you though, the chance to avoid different implementations of the same language by abstracting that away to a library.

For example, we all know that SQL is slightly different on Oracle, MS SQL, MySQL etc. If someone had written the proper adapter in the random object filled language, we no longer need to worry about the lower level (or restricting ourselves to the lowest common denominator) SQL implementation after loading that adapter.

Still, I agree with you, most such levels lack elegance, and hide too much of the detail that every developer should know.

mwalts on October 30, 2007 4:12 PM

The SubSonic example you gave looks just like what the 'pure' C# 3.0 looks like (i.e. a bunch of rather handy extension methods for IEnumerable). The SQL-like syntax is just an 'added extra' on top of this. Personally I prefer the non-SQL-like syntax, because it offers a much clearer picture of what the compiler is actually doing, but I guess that's a matter of personal taste (and is probably related to the fact that I'm more often querying in-memory collections or XML documents than I am database tables).

Incidentally, your LINQ example relies on an autogen Customer object which maps to (presumably) a Customer table in the database. The four points of "rationale behind these types of database code generation tools" absolutely apply to LINQ as well.

carlos on October 30, 2007 4:14 PM

I'm not familiar with LINQ, but can you do something like this (excuse the pseudo-code)?

var baseQuery = from Customer in Customers select id
var firstId = baseQuery orderby createdAt limit 1

I think the strongest argument for making something fully object-oriented, with a real live grown-up API is that you get to use standard OO methodology to get things done. So I'm all sorts of OK for making my language have SQL, XML, etc. embedded in it as something other than clunky APIs or landmine strings, but I'd want that to be syntactic sugar on top of a standard object-oriented library.

Now if only someone would get creative with a packrat parser and a compiler...

Coda Hale on October 30, 2007 4:17 PM

PS. totally agree about the regex example. Show that to any decent programmer that wasn't brought up on java or C# and see how they laugh. Regex only *looks* intimidating; in fact it's pretty simple, and the basics can be learnt in under an hour.

carlos on October 30, 2007 4:27 PM

Basically agree with you here, Jeff. In spite of my lesser abilities with Regex, I do see the benefits. It's on my TODO list to master them. I see it as a sort of math for strings:

a = (b + 1) * a ^ 2

is a lot better than (in fake OO code):

a.valueOf(b.plus(1).times(a.pow(2)))

There seems to be a tension between expressiveness and fluency.

I like:

a = [:] (Groovy - I'm sure other dynamic languages have something similar)

a lot better than:

import java.util.HashMap;
a = new HashMap();

Since Groovy adds a whole bunch of symbolic operators to handle collections it actually makes the code easier to write *and* easier to read *once you know the inner language*.

On the flip side, it would be easy to get carried away with the stuff. Operator overloading has a bad reputation for just this reason. A little goes a long way.

Regex, SQL, and within Groovy, the collection operators, are universal enough that learning a domain specific "inner" language is certainly worth the trouble.

Regards,

Matt

Matt Lentzner on October 30, 2007 4:30 PM

I think your RegEx example is valid because regex is very similar across all platforms. The problem with the database example is that databases are not similar across all platforms. Even that extremely simple SQL sample you wrote won't run in SQL Server. If you want to support multiple DBs then somewhere that SQL code has to be abstracted away. You could put it in stored procs and that works for many scenarios, but even stored procs are vastly different from one DB to another. And stored procs are a pain when trying to create ad hoc queries, which is where SubSonic or LINQ can be useful.

Craig on October 30, 2007 4:39 PM

I hadn't looked at it that way. I thought the effective use of fluent interfaces was to get closer to a domain specific language, rather than just to rewrite regex in objectese. So rather than call 8 separate variable tweaks on an object to perform an action, I can chain them into code that expresses what a customer might think.

Programmer steve = new Programmer().hasName("Steve")
.hasOOPExperience()
.doesntget("FluentInterfaces")
.dockPay(200%);

Steve Jackson on October 30, 2007 4:44 PM

Jeff - SQL is a string, not a line of code and just writing it doesn't pull it from the DB into your app. The lines of code you omitted do that :p. But you raise a good point... anything can be abused yes?

For what it's worth, you can use a Query (1 line) to do this:

IDataReader rdr=new Query("Customer").WHERE(Customer.Columns.Country, "USA").OrderByAsc(Customer.Columns.CompanyName).ExecuteReader();

Rob Conery on October 30, 2007 4:45 PM

Coda Hale:

Absolutely, yes. The result of a query is itself queryable, so queries may be chained in the manner you suggest. In fact, behind the scenes that is exactly what the compiler does. So you could say something like:

var ids = from c in customers select c.id;
var idssorted = from id in ids orderby id select id;
var idfirst = idssorted.First();

(You can think of queries as monads, if that helps.)

Eric Lippert on October 30, 2007 4:46 PM

To the guys talking about abstracting away the underlying database:
I understand that it seems like a solid goal, to abstract away the DB layer through a middle-layer... but think about this:
How often does your app change underlying database?
When it does change, doesn't a giant portion of the code end up having to be rewritten anyway?
How much time is spent abstracting things to the point where you see no SQL, would that time be better spent elsewhere?
What kind of performance hits do you take by not utilizing the performance-enhancing / time-saving features of your DB in the name of abstraction?

I think there's a point of diminishing returns in abstracting everything to that level... By the time we need to change the underlying DB, I can get actual time allocated to do the conversion, whereas the initial creation of the project seems to have a more strict deadline outside of my control... Also, if you're changing your underlying database system, that usually indicates that there's more wrong with your system and it's about time for an app-wide refactoring and/or rewrite.

Now maybe if you're only using the database for persistence, instead of treating the data as the top priority, abstracting the db layer extensively may make more sense.

Just some observations,
Cheers!

intangible on October 30, 2007 4:46 PM

That code is in no way object oriented. Simply putting methods inside a namespace and calling it an object is not object oriented programming.

Check out Seaside for an object oriented way to generate HTML. Now THAT is object oriented.

Phil on October 30, 2007 4:49 PM

Jeff: Thanks for the shout-out. We are very proud of LINQ.

Note however that I don't think of LINQ as "embedding SQL in C#". Rather, I think of LINQ as "providing abstractions for the ideas of sorting, filtering, projecting and grouping which work across arbitrary data".

That those operations are particularly useful for SQL Server databases, and that we have a particularly clever transformation from LINQ expressions to SQL is of course delightful, but that's only a part of the power of these abstractions. We want to be able to apply these abstractions to ALL data, whether it's stored in XML, a database, arrays in memory, arbitrary object graphs, web services, whatever.

Eric Lippert on October 30, 2007 4:54 PM

While I appreciate the sentiment that "good developers should just buckle down and learn regex's" the fact is that while they are a language, there is really nothing in them that jogs a coder's memory as to how they work. I can't tell you how many times I've had to relearn regex's because their syntax is completely unhelpful to someone who is not using them on a daily basis. There have been times when I was more or less fluent and after long periods of not using them, even things I'd written were incomprehensible garbage to me. I agree the object oriented approach above is really clunky, but essentially someone who has never seen regex's can pretty much understand what is going on.

Your point seems to be that these workarounds make the code less succinct, which is true, but the reason they exist is to improve clarity and maintainability, not brevity.

Joshua Kuhn on October 30, 2007 4:57 PM

@intangible - well I've rarely switched a database midstream, but when evaluating a web-based product to install on my servers, I do have to make a choice of a database up front.

Open source projects such as blog engines deal with this a lot. You might want to run my blog engine on MySql while I run it on SQL Server. If I embed SQL all over my app, then I'm stuck.

Thus, I need to abstract away the database. LINQ is actually an abstraction that looks closer to SQL, but isn't SQL.

Haacked on October 30, 2007 5:00 PM

Perl supports the /x flag on in-line regular expressions, letting you put extra whitespace and comments in the regular expression for readability.

Languages that don't have native regular expression support, like PHP, where regular expression literals are really run-time parsed string literals, are a bear to work with. You have to contend with the double meaning of backslash when using pcre. There's a lot of odd docs/notes/hints on the PHP website about using double-backslashes in your pcre string literals when quoting your strings with double quotes (however, it fails to mention that this ceases to be a problem if you single quote enclose string literals that contain regular expressions).

JavaScript is interesting, in that it doesn't have the same kind of variable interpolation that Perl does, so you can't interpolate variables into regular expression literals (which evaluate as regular expression objects), but you can pass a string (perhaps as the result of a series of concatenations) to a regular expression object constructor, and get the same kind of regular expression object back.

Axel on October 30, 2007 5:10 PM
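Both points carry over to Python, shown here as a small illustration (the pattern and the sample text are made up): `re.VERBOSE` plays the role of Perl's /x, and raw string literals sidestep the doubled-backslash problem described for PHP.

```python
import re

# re.VERBOSE ignores whitespace and allows comments inside the pattern,
# much like Perl's /x flag.
DATE = re.compile(
    r"""
    (?P<year>\d{4})  -   # four-digit year
    (?P<month>\d{2}) -   # two-digit month
    (?P<day>\d{2})       # two-digit day
    """,
    re.VERBOSE,
)

m = DATE.search("Posted on 2007-10-30 at 5:10 PM")
print(m.group("year"), m.group("month"), m.group("day"))

# Without a raw string, every regex backslash has to be doubled;
# both spellings below denote the same three-character pattern.
same_pattern = "\\d+" == r"\d+"
print(same_pattern)
```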

Hey Now Jeff,
I liked this post. When reading the beginning I was thinking how classic asp would compare to LINQ with VB.NET or C#. I really liked that example when I read it (the one with Perl).
Coding Horror fan,
Catto

Catto on October 30, 2007 5:15 PM

This really should be 2 posts.

Some languages are never pretty and difficult to write so that they are relatively easy to read. On the other hand, some people go out of their way to make a readable language unreadable. Regex expressions would fall into the first category.

LINQ will find its way into mainstream, unfortunately. For the sake of "object"-tiveness and Intellisense if nothing else. Once again we add a layer of abstraction (and ultimately less efficient SQL), and make our apps a little more brittle.

Your C# 3.0 example is mindlessly simple, and not a fair representation of what LINQ can or cannot do.

Steve on October 30, 2007 5:35 PM

Your SubSonic example hits close to home because I've been using another .NET ORM that's about 3 times as verbose. We were actually thinking about switching to SubSonic because it was LESS wordy. These days I feel like dumping the ORM altogether.

But what about situations where a DSL can be much less repetitive? For the basic CRUD stuff, the same ORM I hate for being so complex can also be very concise in situations where I'm saving data for just one table. Do you think there's a place for abstracting away the underlying language if it's much less work?

For example, the form helpers in Rails write a lot of the repetitive HTML code for you. If you already know HTML and you're not doing anything out of the ordinary is there an advantage to writing it by hand when you have no reason to?

Matt Ephraim on October 30, 2007 5:38 PM

@Haacked,
I'm not sure that LINQ is a good example of "abstracting away the database" from a db platform perspective, since LINQ to SQL is entirely SQL Server specific (my biggest disappointment with LINQ).


Kevin Dente on October 30, 2007 5:48 PM

I agree pretty much with you on the RegEx example, probably because regexes are usually very brief and terse, and that Joshua example is... quite an explosion of characters.

However, while the SubSonic example might not be super pretty, I do think it's a very good idea to abstract the database layer somewhat - no pure SQL strings littered in the code, please. Being able to switch database provider can be valuable especially during the early phases of development, but it also makes it easier to use features like placeholders (who likes SQL injections?) and, say, memcached.

Imho any coder littering their code with raw SQL queries, especially in php/asp, should be shot on sight.

f0dder on October 30, 2007 5:49 PM

> This really should be 2 posts.

I thought about that, but SQL and regex have more in common than most people realize. They truly are miniature domain specific languages-- and they're both absolutely core to any modern development software toolkit. Furthermore, they do vary quite a bit in syntax per environment/platform, so you could make the counter case for abstraction as well.

> You might want to run my blog engine on MySql while I run it on SQL Server. If I embed SQL all over my app, then I'm stuck.

I'd argue that the kind of simple SQL you need for a blog engine is all SQL-92 compliant anyway. Of course you'd have some kind of data layer; I'd just choose an extremely minimalistic one.

http://troels.arvin.dk/db/rdbms/

Jeff Atwood on October 30, 2007 5:55 PM

> a = (b + 1) * a ^ 2
> is a lot better than
> a.valueOf(b.plus(1).times(a.pow(2)))

Totally agree! It's like we have this hammer called "objects" and we can use it to solve every problem, readability and comprehension be damned! The XML integration in VB is a good example too.

http://blogs.msdn.com/xmlteam/archive/2006/02/21/536197.aspx

> I think of LINQ as "providing abstractions for the ideas of sorting, filtering, projecting and grouping which work across arbitrary data".

True-- but bringing the database to the language is a reasonable way to explain that, too.

> IDataReader rdr=new Query("Customer").WHERE(Customer.Columns.Country, "USA").OrderByAsc(Customer.Columns.CompanyName).ExecuteReader();

IDataReader rdr = QueryDb("SELECT * FROM Customer WHERE Country = 'USA' ORDER BY CompanyName");

Which is more likely-- developers that understand Subsonic, or developers that understand SQL? Which is more desirable?

I will agree it's a total travesty that we don't have IDEs that can understand SQL strings and provide automatic intellisense and parsing of the SQL statements in SQL strings.

Jeff Atwood on October 30, 2007 6:02 PM

intangible,

To answer your questions

1. How often does your app change underlying database?
For many projects, never. For many projects, often. As someone else pointed out, if you are releasing something for more mass market distribution on their own servers, they often want to choose between SQL Server, Oracle, MySQL...

2. When it does change, doesn't a giant portion of the code end up having to be rewritten anyway?
Not if done correctly. A small portion may need to be rewritten if that portion had to be optimised for a particular DB platform.

3. How much time is spent abstracting things to the point where you see no SQL, would that time be better spent elsewhere?
No time is spent. If you begin the project with the particular framework and methodology then no time is lost.

4. What kind of performance hits do you take by not utilizing the performance-enhancing / time-saving features of your DB in the name of abstraction?
It depends. In most cases, no performance hit that is noticeable by the user. Of course if you have a function 'update all stock prices on 200,000 items by 10%' then you may want to put that in a stored proc to speed things up. The framework should allow for that so that it is not painful to change later.

I think there is a clash of minds sometimes between 'code' guys and 'DB' guys when developing applications. DB guys (and girls) think the database is the center of the universe and to take anything out of the DB would be sacrilege. Pure code guys think the DB is just a data store and not much more. Good developers hover somewhere in between.


Craig on October 30, 2007 6:09 PM

To answer some points raised by an earlier poster:

> How often does your app change underlying database?

Shrink-wrap webapps have to target ALL databases.

> When it does change, doesn't a giant portion of
> the code end up having to be rewritten anyway?

No. A decent abstraction layer can avoid this entirely.

> What kind of performance hits do you take by not
> utilizing the performance-enhancing / time-saving
> features of your DB in the name of abstraction?

An abstraction layer can take advantage of DB specific features.

I use Hibernate (Java ORM toolkit) and its "dialect" classes make the creation of shrink-wrap applications that target multiple databases a doddle.

The question of whether to abstract away the database and the question of how to query the database are orthogonal.

Hibernate provides a set of OO classes for accessing the data, which nobody uses. It also provides something called "HQL" which is a SQL-like set-oriented language for querying the object relationships persisted into the database, and everybody uses this even though it's got some rough edges. Jeff is right - sometimes you really do need a language that matches the problem domain.

Dave on October 30, 2007 6:13 PM

Jeff,

"I'd argue that the kind of simple SQL you need for a blog engine is all SQL-92 compliant anyway."

Seriously Jeff, this is an amateur's mistake.

Craig on October 30, 2007 6:14 PM

"I would argue that most of these benefits could be realized with smarter IDEs that actually understood native SQL strings (or regular expressions), rather than relying on a slew of generated code and special purpose object syntax..."

I don't know about regexps, but I do know from poking around in lex and yacc that implementing a parser for SQL is not such a trivial task as it might look (for such a "simple" language the SQL spec is huge). And there are so many DBMS specific "features" to consider.

David on October 30, 2007 6:15 PM

This is a subject I feel strongly about. Let languages specialise and merge elegantly so that we can switch paradigm accordingly using a clean syntax.

Fluent languages are like going back to doing math the way Bertrand Russell and friends did it before he came up with the current mathematical notation.

Sometimes I wish we could take the next step and go beyond the 7-bit ASCII notation when illustrating code. I know, I know, this means a whole new bunch of keyboard escape sequences to remember, but on the other hand, at least, it might annoy enough anglo-saxons with UTF-8 issues that they might finally decide in unison to support that charset universally in all of their applications, much to the benefit of all cultures who make a normal use of all sorts of strange little characters.

...but I digress.

Luis' Parenthesis on October 30, 2007 6:25 PM

The problem with SQL is that its abstraction capabilities are pretty crappy, and the code is accordingly verbose, hard to understand, bug prone and tightly coupled. ORMs (at least the good ones) present you with a way to abstract common relational operations into simpler, easily composable terms.

Here's an example in SQLAlchemy, a Python ORM, although this probably works similarly in other ORMs. (Sorry for the long post, but I think the illustrative examples are worth it.)

Simple selecting is not all that much different:

>>> print Customer.query.filter_by(country="USA").order_by(Customer.company_name)
SELECT customer.id AS customer_id, customer.name AS customer_name, customer.country AS customer_country, customer.company_name AS customer_company_name
FROM customer
WHERE customer.country = %(customer_country)s ORDER BY customer.company_name

But let's say we need only the rich customers that have more than 1M total cash on their accounts:
>>> print Customer.query.filter_by(country="USA").filter(select([func.sum(Account.balance)]).where(Account.customer_id == Customer.id).as_scalar() > 10**6).order_by(Customer.company_name)
SELECT customer.id AS customer_id, customer.name AS customer_name, customer.country AS customer_country, customer.company_name AS customer_company_name
FROM customer
WHERE customer.country = %(customer_country)s AND (SELECT sum(account.balance)
FROM account
WHERE account.customer_id = customer.id) > %(literal)s ORDER BY customer.company_name

This is pretty much the same on both sides, with ORM code being terser but not exactly less complex or that much shorter. But the important part is, we probably need to know the total worth of our customers in many places, so we want to abstract that part out. To do this we add to our mapper the following property declaration:

total_cash=column_property(select([func.sum(account_table.c.balance)],
account_table.c.customer_id == customer_table.c.id).label('total_cash'), deferred=True)

Now that query becomes:
>>> print Customer.query.filter_by(country="USA").filter(Customer.total_cash > 10**6).order_by(Customer.company_name)
SELECT customer.id AS customer_id, customer.name AS customer_name, customer.country AS customer_country, customer.company_name AS customer_company_name
FROM customer
WHERE customer.country = %(customer_country)s AND (SELECT sum(account.balance)
FROM account
WHERE account.customer_id = customer.id) > %(literal)s ORDER BY customer.company_name

Now I'd say the ORM has a clear advantage in clarity. If this is used in many places, code duplication is lessened, and the definition of total cash is in one single place, so we can update it with little effort.

This also helps the separation of concerns. For example the predicates that a resultset has to match are the concern of the business logic side of things, while additional entities that need to be joined are determined by how we are going to display that resultset.

For example, the presentation layer might know that it is going to also display all the account details of the returned customers, so we can add a join for that independently of the rest of the query:
>>> print Customer.query.filter_by(country="USA").filter(Customer.total_cash > 10**6).order_by(Customer.company_name).options(eagerload('accounts'))
SELECT customer.id AS customer_id, customer.name AS customer_name, customer.country AS customer_country, customer.company_name AS customer_company_name, account_1.id AS account_1_id, account_1.customer_id AS account_1_customer_id, account_1.name AS account_1_name, account_1.balance AS account_1_balance
FROM customer LEFT OUTER JOIN account AS account_1 ON customer.id = account_1.customer_id
WHERE customer.country = %(customer_country)s AND (SELECT sum(account.balance)
FROM account
WHERE account.customer_id = customer.id) > %(literal)s ORDER BY customer.company_name, account_1.id

The presentation also might deal with paging the results and know that we only need the first 10 customers, so add that too:
>>> print Customer.query.filter_by(country="USA").filter(Customer.total_cash > 10**6).order_by(Customer.company_name).options(eagerload('accounts'))[0:10]
SELECT customer.id AS customer_id, customer.name AS customer_name, customer.country AS customer_country, customer.company_name AS customer_company_name, account_1.id AS account_1_id, account_1.customer_id AS account_1_customer_id, account_1.name AS account_1_name, account_1.balance AS account_1_balance
FROM (SELECT customer.id AS customer_id, customer.company_name AS customer_company_name
FROM customer
WHERE customer.country = %(customer_country)s AND (SELECT sum(account.balance)
FROM account
WHERE account.customer_id = customer.id) > %(literal)s ORDER BY customer.company_name
LIMIT 10 OFFSET 0) AS tbl_row_count, customer LEFT OUTER JOIN account AS account_1 ON customer.id = account_1.customer_id
WHERE customer.id = tbl_row_count.customer_id ORDER BY tbl_row_count.customer_company_name, account_1.id

Now that SQL could be cut down a bit by using select-star and formatted better, but it definitely isn't easier to write, understand or maintain than the ORM query. And that isn't even a particularly large query.

Ants Aasma on October 30, 2007 6:42 PM

God I hope there are Java programmers reading this; they believe everything (and I do mean everything) should be done through APIs. With properties and events, LINQ, XML literals etc. in competing languages, it's getting really depressing to work in a pure Java shop still stuck with technology from the 90's.

Casper Bang on October 30, 2007 6:42 PM

Man, I'd give a lot to have Regex literals in C#. Regex literals are easily my favorite part of Perl or Ruby, and I cringe every time I have to go through the whole "new Regex" dance.

Ben Hollis on October 30, 2007 6:48 PM

I believe that object representations of queries are extremely helpful when you're trying to create a user interface that exposes the language model, for example the query expression. Being able to construct the query in the native language is very nice.

Noah Campbell on October 30, 2007 6:50 PM

Jeff - how would you write this example (from your comment), taking in an argument from a variable to avoid injection:

"IDataReader rdr = QueryDb("SELECT * FROM Customer WHERE Country = 'USA' ORDER BY CompanyName");"

I know the point you're getting at, but it's holding you over a cliff. I think you didn't dive deep enough into this subject and see there are reasons to put the shotgun in the gun closet my friend.

Which is more probable? A developer that learns SubSonic, or a developer who writes inline SQL ripe for injection? Which is more desirable?

Rob Conery on October 30, 2007 7:12 PM
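
For reference, inline SQL and injection safety are not mutually exclusive: the usual answer to the question above is a parameterized query. The following is only a minimal sketch in Python using the standard-library sqlite3 module; the in-memory table, its rows, and the customers_in helper are made up for illustration and are not from the post or from SubSonic.

```python
import sqlite3

# Hypothetical in-memory table standing in for the Customer table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (CompanyName TEXT, Country TEXT)")
conn.executemany(
    "INSERT INTO Customer (CompanyName, Country) VALUES (?, ?)",
    [("Zenith", "USA"), ("Acme", "USA"), ("Maple", "Canada")],
)

def customers_in(country):
    # The ? placeholder passes the value separately from the SQL text,
    # so the value is never parsed as SQL.
    cur = conn.execute(
        "SELECT CompanyName FROM Customer WHERE Country = ? ORDER BY CompanyName",
        (country,),
    )
    return [row[0] for row in cur.fetchall()]

# A hostile value is compared as a plain string and matches no rows.
hostile = "USA' OR '1'='1"
```

The SQL stays readable as SQL, and the variable never gets concatenated into it.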

I believe your topic relates to the same example MS and other IDE vendors have been throwing at us for years. "Using Datagrids (datasets, datawindows, flexgrid, ultragrid, whatever) you will turn all of your sql over to the control to take care of for you!"

And that always works perfectly in their examples...that deal with 1 table, with no joins, etc. Throw in even a single join, and that automatic SQL breaks down.

Not to mention if you want to give a user the ability to alter multiple rows in a table (say, set the price column of all 10 penny nails to .07 instead of .08), then the paradigm completely breaks down.

I like SQL. I like C#. I prefer to have 100% control over my sql...how it selects/updates/deletes/inserts...I still believe that I know how to tune a query much better than any annoying grid.

Oh, and if the DB changes underneath, then the grid completely breaks down -- some lose columns and you have to recreate them (including all formatting, many other properties), or they just start giving compile errors. If I change a column name, I can do a very fast search and replace (helps if the IDE supports REAL regular expressions).

Because I know SQL so well, I can determine very quickly with 100% accuracy how to find all my statements that refer to those columns.

Ugh. Make the compiler better and faster...make my IDE AWARE of my SQL structure, but not try to generate it for me in some black box that I can't get to. Give me 100% compatibility between code I wrote 5 years ago and today.

Don't spend time trying to save me time that just ends up getting in my way.

Matt on October 30, 2007 7:28 PM

This is another case for using the best tool for the job. SQL Queries, for instance, should probably be put behind a layer of abstraction if not for the simple fact that it helps prevent programmer error. Anyone who feels they're confident enough to embed SQL wherever they feel like hasn't been burned by DELETE * yet.

Bill on October 30, 2007 7:59 PM

I agree that attempting to abstract another language via an API generally works out badly, whether the API in question is fluent or not, and the main thrust of your argument is sound. Imagine writing VB.Net through the CodeDom! Etc...

However your opening preamble gives the impression that you are arguing against fluent interfaces *as a concept*, in which case you've used a straw man. Two bad examples don't prove the case (though perhaps the absence of good ones does).

piers7 on October 30, 2007 8:16 PM

So… every language should implement every other language.

Sounds like the problem is recursive.

I love RegEx support in languages but putting SQL intellisense into a language only solves an extremely small part of the problem.

Relations, Populating your complex Business Objects, Lazy Loading as needed, SQL Optimization, Database Abstraction, Strong Typing, Type Mapping, etc. <-- I'm only brushing the surface.

Do you really want to go back to being a full time plumber? I don't, I prefer to outsource anything that requires a whole lot of elbow grease near the porcelain.


Martin Murphy on October 30, 2007 8:45 PM

Jeff, I disagree with this. You say that the solution is to make domain specific languages part of the language we're working in. For about 99% of us adding arbitrary powers to the compiler is just not in the cards. Why be afraid of abstraction? Isn't that our job as developers?

Sean Scally on October 30, 2007 9:16 PM

Now if only MS would open up CSC so we could actually do this well..

Until then, an OO abstraction is the best we can do without serious effort (i.e. ANTLR)

Evan on October 30, 2007 9:24 PM

Fluent interfaces arguably have a place but I don't think that place is necessarily for replacing database queries, nor regular expressions.

I think fluent interfaces do have a role to play in domain-driven design (see books by Evans and Nilsson) and the like. They talk about the need to derive a "ubiquitous language". This language forms the model that is used:

1. in analysis to aid in understanding by all parties in a development project
2. for unambiguous communication between developers and users/BAs
3. as the actual model that the code is built with

Now that's all good in theory but the problem is that you often can't express all the nuances of natural language required for the model to be used in point 2 (communication), in something like UML or object/state/etc. diagrams, for use in point 3 (code). As a result you often lose model specifics when you try and code.

So to derive such a model and then still have it be expressive and useful in and for coding requires a coding language construct that is, likewise, expressive. This is where this whole idea of fluent interfaces comes from, or so I gather - I'm not saying I totally buy into it though.

So for example, when a (business logic) model says "the software should monitor a specific folder and all its sub-folders, and send an email to Mark if a new file is created" you could write:

string mark = "mark@email.com";
Notifier notifier = (new Notifier()).Monitors(specifiedFolder).IncludingSubFolders.On.NewFileCreation.Then.Email(mark);
notifier.Activate();

Two main problems though:
1. fluent interfaces don't conform to common coding conventions at all (eg properties doing work and not just returning values, or often returning "this" instead of nulls) so are jarring to read for some developers
2. they aren't always easier to read anyway (see your regex example), particularly if the fluent interfaces aren't "phrased" as you personally would say them.

Clearly fluent syntax is more concise and readable for simple cases but (in the .Net sphere) with longer initialisation lists, I think the .Net 3.0 object initialisation stuff is arguably better. You also often have to go to a lot of effort with interfaces to make the "sub parts" of the fluent interface hold together.

Cheers,
Mark

Mark Langsworth on October 30, 2007 9:36 PM

I think LINQ is amazing. I particularly like the abstraction of "data". It doesn't matter if my data is rooted in an instance of Collection or in a DB table. It's still just data, and I can query it efficiently with an SQL-like language.

I'm curious, what do you think about the System.Linq.Expressions? Given the post, I'd suspect that you wouldn't like things like this:

var localCustomerNames = customersList.Where( c => c.City == "Montreal" ).Select( c => c.Name );

Which is equivalent to

var localCustomerNames = from c in customersList where c.City == "Montreal" select c.Name;

I really like the "Expressions" OO-like syntax. Maybe even *because* of its fluency. I like that "from" doesn't need to be explicitly stated if I'm using someCollection.Where(...). They're also nearly identical to the Array#select and Array#collect methods from Ruby, which I've grown pretty comfortable with.

Skrud on October 30, 2007 9:38 PM

IMO, those are not languages *inside* languages, but foreign languages. Languages are born equal.
SQL, RegEx, math operators, HTML, CSS, JS, etc. are all foreign languages to Java or C# or whatever.
Just as the JVM speaks Java, the DB speaks SQL, strings speak regex, numbers speak math operators, and IE/FF speak HTML, JS and CSS.
The JVM, DB and IE are different countries that speak different languages.

Did anyone get what I said?

jackhatedance on October 30, 2007 9:41 PM

While in general I agree with you, Jeff, you have to remember that it's very unlikely that you actually have a line saying "SELECT * from Customers WHERE Country = "USA" ORDER BY CompanyName" in your program. It's more likely to look something like this:

String sqlQuery = "SELECT " + selection + " FROM " + sourceString + " WHERE " + condition01;
if (condition02 != null) then sqlQuery += " AND " + condition02...

and so on. Your string only looks this obvious and understandable because it's the RESULT of what are likely going to be plenty of lines of code tinkering it together.

J. Stoever on October 30, 2007 9:43 PM
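
The piecemeal assembly described above does not have to mean gluing values into the string. Here is a rough sketch in Python, under the assumption that the table and column names come from trusted code and only the values come from users; build_query and its arguments are invented for illustration.

```python
def build_query(table, columns, conditions):
    """Assemble a SELECT from parts; values travel separately as parameters.

    table and columns are assumed to be trusted identifiers from code;
    only the values in conditions are treated as untrusted input.
    """
    sql = "SELECT " + ", ".join(columns) + " FROM " + table
    params = []
    if conditions:
        clauses = []
        for column, value in conditions:
            clauses.append(column + " = ?")   # placeholder, not the value
            params.append(value)
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params

sql, params = build_query(
    "Customer", ["CompanyName"], [("Country", "USA"), ("City", "Seattle")]
)
```

The result is still the obvious-looking string from the post, plus a list of values to hand to the database driver alongside it.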

Actually, writing code that reads like English isn't that difficult:

if (input.is_valid_email() && user.is_admin()) { ... }

It's simply a matter of choosing one's wording carefully. And as Steve Jackson pointed out, chaining method calls is a good way to create a minilanguage when there's no alternative. Look at jQuery. But again, this must be well thought-out.

Regarding punctuation: if you want it to *really* look like English, use Ruby. But punctuation is useful: it makes the various parts of an expression stand out. Why do you think the C syntax has been adopted by most programming languages invented after it? Familiarity is just part of the answer.

Regarding the issue of SQL abstraction, I know a better way: just put all your queries in a single module or package. When the time comes to switch databases, all the required changes (many as they might be) are localized. And then it's *your* abstraction layer, lean, mean and under your control. Yeah, I know, I've been burned by PEAR DB once too many times...

...And no, a webpage doesn't have to be a mess of PHP, CSS, HTML and Javascript. Not at all.

Felix Pleşoianu on October 30, 2007 9:43 PM

sorry but I have to say it's the best post for weeks.

jackhatedance on October 30, 2007 9:45 PM

I would have read the whole article, except the straw man BS at the beginning somehow claiming that fluent APIs are considered a valid replacement for regular expressions was just too much of a put-off.

DSLs, ubiquitous language, SQL, regex, mathematical notation, are all useful tools. None of them replaces the other. Sheesh. Hopefully you finished the article up with a "Ha, ha, I was just joking about that dumb regex claim in order to make a point". I'm not going to bother to look with this one though.

Marcus on October 30, 2007 9:49 PM

Good post, though I agree with Rob Conery's comments. SubSonic pre-dates LINQ and utilizes C# as it existed at the time. It's convenient now with C# 3.0 and LINQ to poke at the SubSonic query syntax, especially using an example with four lines of code when one could have sufficed (though that wouldn't have helped your argument).

SubSonic aside, I'll agree that a good understanding of things like Regex and SQL comes first and foremost, allowing one to more fully understand the pitfalls of ORM frameworks.

I'd say take Rob up and do the deep dive here and see what turns up! I'll be watching for that post. :)

Steve Trefethen on October 30, 2007 10:07 PM

The problem with having languages embedded in your languages is that you need to train people to work in both. You're just extending your feature set that much further. This is one of the problems with C#: it does SO MUCH with its libraries, you need to spend forever learning it. Then, because you have spent so much time learning it, you are tied to it forever.

A language undeniably better and more usable could come around, and the cost of the time you have sunk into learning one language becomes so great that learning another is unthinkable.

Bad idea.

SQL and Regular Expressions are in good spots for sub-languages. They both serve very important and broad-based needs.... But if there were, say, twenty things like them (XML aims to be a third, Javascript [in some ways] another, TCP/IP Networking could be another, as could Cisco's OS, etc etc etc.....) that were all completely necessary, you would be unable to become skilled at actual production of code.

Dylan Brams on October 30, 2007 10:15 PM

There seems to be a substantial amount of confusion around SQL. All of my web-apps have used "SQL" and I develop on MySQL, and many of my clients run PostgreSQL, and even MS SQL, yet I have never had a complaint of the app not working. If you don't use DBMS-specific code then your SQL queries will run on most any SQL-based server.

Arron on October 30, 2007 10:16 PM

I'm so easily annoyed by small details. In:

a.valueOf(b.plus(1).times(a.pow(2))

you're missing a ")" at the end. However this error only proves your point.

Andrew on October 30, 2007 10:30 PM

Ahhh. Remember Cobol:

Subtract X From Y Giving Z

So beautifully verbose, one line would take 10. Easy to read if you enjoy a 5,000 page novel.

So poorly optimized. It took forever to run.


Then there was APL

Z<-Y-X

5 characters (That's a left arrow). X, Y and Z can be any object type, including vectors, arrays and multi-dimensional objects. Its powerful yet succinct operations (e.g. Y+.*X is sumproduct) allow those who have nothing better to do to write "any" program in one line (with a few tricks), albeit a very long line.

So beautifully condensed, getting that 5,000 page Cobol novel onto one page. Yet impossible to read.

Could only be interpreted and not compiled. It took forever to run.

Louis Kessler on October 30, 2007 10:31 PM

Come on, tell me I'm not the only one who read that as "flatuant interfaces"? :)

Simon Roberts on October 30, 2007 10:39 PM

(I obviously can't type either): flatulent

Simon Roberts on October 30, 2007 10:40 PM

Neat, I always like when I disagree with your blog posts, mostly because it happens so rarely and it reminds me that you really are just some guy with opinions :) Usually good opinions but just opinions nonetheless.

I feel you missed a big point of Joshua's blog post, that he is attempting to make code more readable and maintainable. Given the size and complexity of projects I'd rather see *more* lines of code if they make it easier to understand what's going on. If I need to tell if a regex is bad his method is more discoverable while a pure regex sucks horribly if you're not a regex guru. Now I happen to think simple regex aren't that hard to read but I get his point and I agree with it, same as I agree with SubSonic's Query object or the fluent testing API's that are out there.

I will take maintainable and readable code that is expressing intent any day over saving a few lines of code. I will also take a discoverable interface over one that needs a separate help document. There is nothing to jog my memory in the IDE when writing a regex expression, I have to go hunting in the tubes for the right syntax yet with Joshua's interface I can quickly discover/remember my available options. No, it's not perfect and I may or may not use it but his intent is spot on and I agree with anything that promotes it.

Shawn Oster on October 30, 2007 10:54 PM

Jeff, come on! Are you jumping the shark as well?

First a disclaimer - I DO know SQL and regular expressions. I've used them a lot and I hate them from the bottom of my heart. I have cried almost on every encounter with them. End of disclaimer.

Fluent interfaces are pretty nifty IMHO. I wish there was no other way to program. The code gets a lot more readable and even sounds like English. Heck it even looks like SmallTalk :)

Regular expressions are a big pile of mud when it comes to readability. No doubt they are your first choice for text processing but still their syntax sucks. Please don't give regular expressions as a good example of integrating domain specific languages.

I don't know why you prefer LINQ over SubSonic's query language. LINQ is yet another abstraction on top of SQL. It just uses some syntax sugar to make it look like SQL (admit it, the syntax highlighting is the thing which turns you on). However LINQ only resembles SQL - for example the whole statement is spelled backwards "from abc select cde" where in SQL it is "select cde from abc". LINQ however is helluva better than plain concatenated SQL strings hammered against the poor database.

Attacking Rob Conery (and SubSonic) was not a good move at all. Let alone politically correct. I've been aggregating his blog for a long time and have never seen a bad word in your direction. Why did you attack him, Jeff? <joke>Are you being jealous that Microsoft will start paying Rob to work on his open source project?</joke>

Your blog is really jumping the shark IMHO. Please make something to prevent this. One more "3 monitor rox0rz" post and I will ditch it for good. And I don't want to do that - I've been a long time fan.

Anonymous Nagger on October 30, 2007 11:47 PM

LINQ? Not sure I am a fan of the idea of it. I started with old ASP and embedded SQL into it; I found out about the pitfalls of this, then I moved to .Net. N-tier development followed shortly after.

When SQL is embedded into an app (be it at the business object layer or higher) you couple the tiers too tightly. What if someone changes the database schema? Is it feasible to expect a recompile of the application? If you were relying on stored procs, then you just release a new stored proc and the app still works.

Mauro on October 31, 2007 12:11 AM

Yep, that's true. I believe that metaprogramming and DSLs will make the future of programming languages, since these technologies allow SQL/Regex to be integrated in a natural way.

AlexK on October 31, 2007 12:21 AM

First of all, if you want to make an apples to apples comparison and you're looking for terse code, you shouldn't use the fluent interface:
IDataReader rdr=new Query("Customer").WHERE("Country", "USA").OrderByAsc("CompanyName").ExecuteReader();

As Rob pointed out (and we talked about on Saturday), there's a big difference between your simple SQL statement and what SubSonic's doing. A big difference is database independence.

The point of multiple database support isn't that you'll be moving a single application between databases, but that once you get good at SubSonic, you can easily write robust data access code that works on SQL Server 2005, SQL Server 2000, Oracle, MySQL, SQLite, etc. Today on a biz-dev call, the client mentioned MySQL support and I didn't flinch, because even though it's been a few years since I wrote a SQL statement in MySQL, I know I can do anything I'd need with MySQL right now without worrying about how it handles paging, sorting, datatypes, etc. SubSonic's MySQL provider was written by a member of the MySQL team and has been tested by thousands of users, while you're writing new MySQL queries from scratch.

That's a selling point for Linq, too - instead of using a different syntax (SQL or object) to query everything, we use standard Linq syntax which works everywhere.

A bigger issue is the use of untyped datareaders. I never use an untyped datareader in SubSonic. I use a strongly typed collection:
CustomerCollection col = CustomerCollection("Country", "USA").OrderByAsc("CompanyName").Load();
Now I can use a strongly typed collection of Customer objects, while you've got a dumb, anonymous blob of data.

I see your point in a "blue sky / wouldn't it be nice" academic argument, but it's not useful beyond that. You can't really be suggesting that we embed SQL statements in our code (SQL injection alone is a good enough counterargument there). It's just not a sustainable way to write real world applications. I've got plenty of battle scars from poorly written data access code built on embedded SQL.

So, then, we need to write some sort of data access utility. Once we're doing that, does it make sense for every developer in the world to write their own, untested data access code - especially when you've established that a lot of developers have trouble with FizzBuzz? I'd prefer to work with a data access system written by some of the best developers I know.

And, really, recommending embedded SQL (even if it's just to make an academic point) without some heavy disclaimers is kind of irresponsible. How many developers do you estimate will take this as best practice advice and move to (or stick with) embedded SQL? If it's one, it's too many, and my guess is that it's a lot more than that.

Jon Galloway on October 31, 2007 1:27 AM

Totally agreed.

and LINQ sure is beautiful!

chakrit on October 31, 2007 1:43 AM

Oh, and the Linq thing. It's true that it looks prettier, but it's only because of compiler magic. I could make SubSonic queries look really pretty if I owned the compiler, too.

Under the hood, you're working with objects which look a lot like SubSonic query objects:
IEnumerable<CustomerTuple> locals =
customers.Where(c => c.ZipCode == 91822)
.Select(c => new CustomerTuple(c.Name, c.Address));

The compiler and IDE allow you to express that as:
var locals = (from c in customers
where c.ZipCode == 91822
select new { c.Name, c.Address });

However, it's important to note that you're still calling through objects and fluent interfaces, it's just hidden behind the scenes.

http://msdn.microsoft.com/msdnmag/issues/07/06/CSharp30/

Jon Galloway on October 31, 2007 1:44 AM

Question: do you actually use any kind of ORM, at all?

Mike on October 31, 2007 1:49 AM

@Jeff - I think you miss the distinction between one use of a fluent language that doesn't add any value beyond the nebulous distinction of being more fluent versus the SubSonic case where there is a lot more value being added beyond just a fluent interface. If SubSonic was just about putting a fluent interface on SQL then you might have a point, but it does not, which even you acknowledge by your reference of a need to create a DAL.

The argument for fully understanding SQL is about as valid as claiming that every C# programmer should fully understand Machine Language, Assembly Language and MSIL since C# is really just an abstraction of those lower level concepts. That is the point of abstraction, so you don't have to fully understand the lower level. I think it is pure folly to think that I will ever be able to craft optimized SQL queries even close to the same level as a DBA who lives and breathes SQL. We are already putting too many languages on the individual programmer. A typical Asp.Net developer is forced to deal with VB/C# (sometimes both), HTML, CSS, JavaScript, SQL and possibly even some XSLT. At some point you are forced to either 1) hire a bunch of specialists who understand how to write efficient code for each of the "DSL"s, or 2) abstract out all of the lower level languages to make it possible for most programmers, who are generalists, to still develop apps. I would argue that few development teams have the luxury of a bunch of specialists at their disposal and thus there is absolutely a need for higher level abstractions - even leaky ones - over some of these DSLs.

Joe Brinkman on October 31, 2007 1:51 AM

>>> "I'd argue that the kind of simple SQL you need for a blog engine is all SQL-92 compliant anyway. Of course you'd have some kind of data layer; I'd just choose an extremely minimalistic one."

Well, good luck getting paged results from SQL Server 2000 then. LINQ to SQL does handle this, and the syntax is wonderful as well (something like):

skip(100)
take(10)

Mike on October 31, 2007 2:04 AM
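
The skip/take idea itself is simple; what differs between databases is how it gets spelled in SQL. As a rough in-memory illustration only (this is Python's itertools, not LINQ to SQL, and the page helper is invented here):

```python
from itertools import islice

def page(rows, skip, take):
    # Drop the first `skip` items, then keep at most `take` of the rest.
    return list(islice(rows, skip, skip + take))

# Rows 100..109 of a made-up result set of 1000 rows.
tenth_page = page(range(1000), 100, 10)
```

A query provider has to turn the same two numbers into whatever paging construct the target database supports, which is the part that hand-written portable SQL leaves to you.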

Why/how is LINQ better/worse/different compared to list comprehensions as seen in Haskell and Python? The trivial examples I've seen don't demonstrate a great difference.

Jack on October 31, 2007 2:43 AM
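
For the in-memory case the resemblance is indeed close. Below is a sketch of Skrud's Montreal query from earlier in the thread written as a Python list comprehension; the Customer tuple and the sample rows are made up for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for the customer objects in the LINQ example.
Customer = namedtuple("Customer", ["name", "city"])

customers_list = [
    Customer("Ann", "Montreal"),
    Customer("Bob", "Toronto"),
    Customer("Cleo", "Montreal"),
]

# from c in customersList where c.City == "Montreal" select c.Name
local_customer_names = [c.name for c in customers_list if c.city == "Montreal"]
```

One difference the trivial examples hide: a comprehension like this runs eagerly over objects already in memory, whereas a LINQ query can be captured as an expression tree and translated into SQL by a provider before anything executes.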

Very good subject; this is why I always read your blog. You always seem to discuss things that matter to me.

I am one of the "bad guys" who made an abstraction for SQL. I pretty much did a Linq-alike implementation.

My Data Access Layer needs to work with different solutions, both web and windows applications. And it should be able to work with several different database instances.

It should be possible to support other databases like Oracle, MySQL, etc.

All database tables are represented as classes, so that I can refactor my code and see if the code matches the database it has to work with. If I change the table structure, the compiler finds all places where it is touched.

I wanted smooth handling of null values, it should not be possible to make sql injection vulnerabilities, no type conflict between code and database. So I can do something like ..

query.Where(table.Created < DateTime.Now & table.ProductID == Guid.NewGuid() & table.Type == ProductTypesEnum.UsedItem)

It should of course support joins, aggregates, subqueries, etc., without worrying about parentheses. Like query.Where(table.ID.In(subQuery));

All queries should be possible to be executed directly by the sql being parsed or as automatically generated stored procedures.

All is generated in object containers of the results, like query.Construct(typeof(ProductItem))

And so much more..

I know this is far from the perfect solution; for some it would probably be a disaster, but for me it has worked great. Our product (and database structure) has been and is changing all the time. I am not sure I would be able to keep an overview of both the large code base and the huge number of STPs through all those scenarios I have worked through. I very strongly believe in DRY, and of course you can't ignore SQL, and TDD can save you from so many things.

But for me it was not about doing a fancy tool, it was to implement a rule-set that would force me to code the way I thought it should be coded without being lazy and taking shortcuts.

Sorry for the long post…

Peter Palludan on October 31, 2007 3:19 AM

The "language-like flow", in my opinion, can be made to work.
It's the examples you gave which are problematic, because they replaced another language with just a bunch of function and method calls. While technically the goal of "language-like flow" is preserved, it is an extremely poor syntactic way of expressing another language.

There are examples which are in-between the solutions as a bunch of method calls and the solutions which actually modify the "outer" language syntax. They all have in common the following idea: hijack syntactic constructs of the "outer" language to represent operations in the "embedded" language; the syntax stays the same, the language-flow is not disrupted, while semantics is substantially different. The semantics of syntactic constructs used must surely be similar in spirit, using some metric, to the original semantics of the same constructs in the "outer" language, to prevent potential confusion. The degree of similarity may vary.
Of course, "a bunch of method calls" vaguely fits this description, but it really is an outlier.
A concrete example that I have in mind is how queries are done in Erlang's Mnesia database. It hijacks syntax used for list comprehensions. Anyone familiar with list comprehensions in Erlang has very little trouble mentally adjusting the same syntax to the database query semantics.
Using similar ideas, and inspired by Mnesia, I have created a Perl module, DBIx::Perlish, which similarly hijacks Perl's own syntax to represent database queries (which ultimately translate to SQL). The syntax is the one of Perl, the semantics is of a declarative query language.

Cheers,
Anton.

Grrrr on October 31, 2007 4:07 AM

The problem with 'real' language embedding as in the Perl and LINQ examples is that there are just too many domain-specific languages that you would like to include: SQL, Shell, XPath, LDAP, HQL, and regular expressions are just a few examples. So, what we really need is a language that allows you to include arbitrary domain-specific languages as special kinds of literals.

There has been some research on this in the meta-programming and code generation community: typically meta-programmers would like to manipulate/generate programs using the concrete syntax of the language they generate, and have some basic guarantees that the generated program is syntactically correct or even compiles. This area of research is known as "meta-programming with concrete object syntax" or "statically safe program generation".

We have extended those approaches recently to mainstream programming. In particular, the article "Preventing Injection Attacks with Syntax Embeddings. A Host and Guest Language Independent Approach" discusses a generic approach for extending a language with arbitrary domain-specific languages. The main goal of that paper was to illustrate how one can prevent injection attacks.

Martin Bravenboer on October 31, 2007 4:34 AM

I can't speak to the usefulness of LINQ, since my company has steadfastly resisted the move to .NET 2.0 (yet, though I'm pushing). But I can comment on the usefulness of fluent interfaces, since NValidate uses them.

The point of a fluent interface should be to help a developer write LESS code that is CLEARER and EASIER TO MAINTAIN. If the end result of your API is more verbose than what they had to begin with, you've failed to achieve your objective.

A fluent interface brings to the table the ability to exploit intellisense, and (in the case of NValidate) to reduce many hard-to-read lines of code to one readable line of code. If that's not happening, it's time to go back to the drawing board. Chances are pretty good that no one's going to want to use the product, and that those who *do* use the product will be those who've had it foisted upon them. In those cases, there's bound to be lots of grumbling.

Mike Hofer on October 31, 2007 4:41 AM

"Ahhh. Remember Cobol: Subtract X From Y Giving Z"

I was thinking that. Of course, in some (can't speak for all, I only ever COBOLed on IBM and VAX) implementations we could say:

COMPUTE Z = Y - X

I look forward to getting and playing with LINQ and all the other fun-looking stuff. I'm particularly relishing putting together some more complex queries and seeing how well-optimised they are at the other end. I'm like that.

Mike Woodhouse on October 31, 2007 5:14 AM

Thank you for bringing this up. When I was in school, one teacher was expounding on the virtues of object-oriented rendering. While I "got" the idea behind it, I still couldn't understand why it was better than using simple string manipulation.

Ben on October 31, 2007 5:21 AM

I wonder -- how deep could we nest languages? Maybe some C# code using SQL to store javascript in a database that includes a regular expression?

Joel Coehoorn on October 31, 2007 6:28 AM

While I can take or leave fluent interfaces, I think that this post unfairly represents the motivations for making query objects instead of strings. If you are successful (and, yes, that is very hard), you can have queries you can *reason* about. You can create an algebra for your objects and combine them in interesting ways, build higher-level abstractions and so forth.

I will agree that the vast majority of attempts to do this fail miserably and provide no value over embedding straight SQL as a string. Still, when somebody has their domain figured out and they have managed to construct an abstraction that allows the programmer to do *more* than before (e.g., build user-driven queries at runtime without making StringBuilder your best friend), I think it is a great success.

The point is never whether you can build an object-oriented design that does something trivial -- it is what can that design allow you to do that the grungy text version couldn't... If the answer is "nothing," then I agree that the whole exercise was futile. I'm just pointing out that it doesn't have to be so.

Talking of misconceptions, @Louis Kessler, slagging APL on the basis of whether it can be compiled or not is a dangerous game: virtually all of the ancient programming language esoterica that newbies love to disparage because they can't imagine an efficient implementation (think Lisp, Smalltalk, APL, etc.) actually have *very* efficient implementations. The fact that it didn't come in the box with Visual Studio does not mean it doesn't exist!

Carl Irving on October 31, 2007 6:29 AM

Good point for writing small apps. Wait until you've written the same query in who knows how many points in your application, then we'll talk... Oh, and wait until you've changed your DB and have to go back to all those places and change your query... Good luck!

CptBongue on October 31, 2007 6:30 AM

"In The Land of Strings, we speak regular expressions. In The Land of Data, we speak SQL."

Haha, I love this quote. Mind if I steal it?

LINQ is awesome. I've been all over that since the first CTP, and now that Orcas actually provides Intellisense for it, I can say pretty confidently that it blows every other ORM tool out of the water. I do wish it had slightly better support for the UD in CRUD, and it would be nice if XLinq was as clean in C# as it is in VB, but eh, nothing's perfect.

And as Eric says, you can use these SQL-like operations on any IEnumerable. So if - for example - you're getting data from a web service instead of a database, you can treat it almost exactly the same way. That is a true "fluent interface", making PROPER use of object-orientation to make an actual abstraction rather than a useless wrapper. Although I suppose it's a little easier to do this when you also make the compiler. :-)
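A rough Python analogue of "the same operations on any IEnumerable", assuming made-up table and field names. One function filters, sorts and projects whatever iterable it is handed, whether the records come from a database cursor or from a list that might have been parsed from a web service response. Unlike LINQ to SQL, nothing here is translated into SQL; the filtering happens in memory.

```python
import sqlite3


def top_earners(records, minimum):
    """Filter, sort and project any iterable of dict-like records."""
    matching = (r for r in records if r["salary"] >= minimum)
    ranked = sorted(matching, key=lambda r: r["salary"], reverse=True)
    return [r["name"] for r in ranked]


# Source 1: rows from a database (sqlite3.Row allows access by column name).
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE staff (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?)", [("ann", 90), ("bo", 50), ("cy", 70)]
)
from_db = top_earners(conn.execute("SELECT name, salary FROM staff"), 60)

# Source 2: plain dictionaries, as a web service client might hand back.
from_service = top_earners(
    [{"name": "di", "salary": 65}, {"name": "ed", "salary": 40}], 60
)
```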

Aaron G on October 31, 2007 6:31 AM

>>Man, I'd give a lot to have Regex literals in C#. Regex literals are easily my favorite part of Perl or Ruby, and I cringe every time I have to go through the whole "new Regex" dance.

Of course, with .NET 3.5's extension methods, you will be able to write your own regex method on strings so you could just say "foo".Matches("yourregexphere").

Tim C on October 31, 2007 6:56 AM

"...code that looks like exactly what it does."

And that's the bottom line. Code is an arbitrary interface to the system. Nothing more. Nothing less. It can help or hinder.

Regex, whatever else one thinks about it, fails here. Perl too. Like VB-ish languages, SQL is just English-like enough to do the job, but not too verbose to get in the way. Following the "code that looks like exactly what it does" standard, C-like languages would get lower marks than something like VB, PHP, Ruby, etc.

(Newsflash: braces and semicolons are NOT inherently superior to any other form of delimiter)

Ian on October 31, 2007 7:00 AM

The Linq example is unfair, since the compiler DOES turn the embedded sql syntax back into the object soup you so hate and calls that.

You can even write the object soup directly if you desire.

I'm also betting people unfamiliar with RegEx will understand the wordy multi-line example long before they even figure out that the alphabet-symbol soup is anything other than a head roll on the keyboard.

Xepol on October 31, 2007 7:12 AM

SQL and "regular expressions" are similar because they are both declarative languages. You describe the result you're looking for, not the steps to get it.

PowerBuilder has supported SQL directly in the language for more than 10 years, with type checking and without passing strings of SQL statements (like just about everything else I've seen, other than the LINQ examples above--I'm going to have to look into LINQ). You have procedural access to results in the PowerScript language, and if you want object-notation access to the result instead, use a DataStore/DataWindow.

Strangely, PowerBuilder has had regular expression support in the editor for quite some time, but it is not available in the PowerScript language (unless PB11 has it--I haven't used it yet).

--dang, former Certified PowerBuilder Instructor, circa 1993

Daniel 'Dang' Griffith on October 31, 2007 7:23 AM

I agree with some of what you say here but I think you miss a few points.

Object models that represent domain specific languages (for example your sql example or the CodeDom) have two distinct benefits over simply writing the code itself.

1.) It allows it to be crafted in an implementation-agnostic way. Meaning you could build up your objects then convert them into MySql, SqlServer or Oracle SQL quite easily, and for the CodeDom you get VB.NET, C#, F# or whatever.

2.) Dynamic generation. No one would ever write all of their classes with the CodeDom, for the exact reasons you specified above: it would be stupid. However, if you need to dynamically generate SQL or a class based on some meta data, then using an object model to do it is a really big help.
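The two points above can be sketched in a few lines of Python. TopN is a hypothetical query object invented for this example, and the table and column names are placeholders. The same object is rendered into three dialects that really do differ on how to say "first N rows": LIMIT in MySQL, TOP in SQL Server, and the classic ROWNUM subquery in Oracle.

```python
from dataclasses import dataclass


@dataclass
class TopN:
    """Hypothetical query object: the first `limit` rows of `table`."""
    table: str
    order_by: str
    limit: int

    def to_sql(self, dialect):
        base = f"FROM {self.table} ORDER BY {self.order_by}"
        if dialect == "mysql":
            return f"SELECT * {base} LIMIT {self.limit}"
        if dialect == "sqlserver":
            return f"SELECT TOP {self.limit} * {base}"
        if dialect == "oracle":
            # Classic Oracle idiom: order in a subquery, then filter on ROWNUM.
            return f"SELECT * FROM (SELECT * {base}) WHERE ROWNUM <= {self.limit}"
        raise ValueError(f"unknown dialect: {dialect}")


# The query is built once from data, then rendered per backend.
query = TopN(table="orders", order_by="total DESC", limit=5)
statements = {d: query.to_sql(d) for d in ("mysql", "sqlserver", "oracle")}
```

Since the query is plain data until to_sql is called, it can just as easily be built from metadata at runtime, which is the dynamic generation case.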

But I agree with most of your criticisms in general and would add that there are additional problems related to ORM, where you map objects to a relational model, but that is probably beyond the scope of just this blog post.

You might be interested in reading about NBusiness however, which is an interesting solution to the problems you're describing. For example:

http://www.codeplex.com/NBusiness/Wiki/View.aspx?title=A%20simple%20entity&referringTitle=Home

It uses a language called E# to allow you to define your business layer and uses templates to author much of the code and SQL. So in this case you have both the generated SQL/objects you are railing against AND a succinct domain specific language that you are using to declare it. What do you think about that??

You might try reading about "Intentional Programming" also if you want to stretch your brain for the day. It has some interesting ideas that I interpret as dynamic layers of domain specific languages.

Justin Chase on October 31, 2007 7:27 AM

What about TCL?

I've worked with a lot of scripting languages, using regex and SQL,
blending sh, awk, perl, etc....
but never felt the 'language within a language' as much as when I dipped into Expect with TCL as its embedded language.

It wasn't terribly difficult to learn, but having to learn all the gotchas and caveats of a new language was a pain (unbuffered input coming in unpredictable chunks based purely on timing for example).

To this day I don't know how to feed an array of args to 'spawn' and get it to use them as separate args.... I had to auto-gen a new expect script instead whenever I wanted to feed a pile of args to ssh.

I know that loading an expect perl module from CPAN would have helped,
but I had already coded 95% of the functionality in TCL before I entertained that option. I was of the position that expect was built to handle the interaction, whereas the perl module seemed less mature (at the time). I also never saw the need to put everything in the same language.

Perhaps this is a case where it would have paid off to keep it all in perl.

Eric on October 31, 2007 7:45 AM

Good post. For years I've been saying something very similar. I've noticed more and more layers between typing code in an IDE and actually querying data. I've always said it seems like people are more and more afraid to actually do a query.

As usual however, you said it much better.

Brent on October 31, 2007 7:50 AM

Jeff,

I think Jon Galloway makes good points. If I'm not mistaken, in the .NET world, SQL is best handled by leaving the queries on your SQL Server and using parameter calls to prevent injection for best security and performance. In the land of Java, Hibernate uses HQL (a bastard version of SQL) that is server agnostic and currently the de facto standard.
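To make the injection point concrete, here is a small sketch using Python's built-in sqlite3 module rather than SQL Server stored procedures, so it shows bound parameters only, not server-side queries. The users table and the hostile input are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "admin"), ("bob", "user")],
)

hostile = "nobody' OR '1'='1"

# Concatenation: the input rewrites the query and matches every row.
unsafe_sql = "SELECT name FROM users WHERE name = '" + hostile + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Parameter: the same input is compared as a plain string and matches nothing.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (hostile,)
).fetchall()
```

The SQL text in the second query never changes, whatever the user types; only the bound value does.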

As for fluent languages, I've read a bit of Fowler and I think he pushes for Domain Specific Languages as a whole and his argument generally points toward the _business_ (or domain) that your computer language isn't made for. If you're handling invoicing, you don't have a language command for 'invoice.generate()' or GENERATE INVOICE or whatever. Martin has been playing lately with products like ANTLR and some other compiler-compilers to add domain specific (or business specific) keywords to the language that makes a 'language within a language' for an application. He argues that it makes the development of the application fit the model and development much more rapid. If one doesn't customize the actual structure of the language, then you can develop the API set. But his point is about the nomenclature that is used amongst developers, customers, consultants and analysts that must be specific to the domain and that there are no languages (regular expressions, SQL, Java, C#, Ruby) that speak Mining, Accounts Payable, NASA, or whatever.

I don't know if I'm sold on his ideas, but for the moment, I'll evangelize for Mr. Fowler. More on his blog....
http://martinfowler.com/bliki/DomainSpecificLanguage.html
http://martinfowler.com/articles/languageWorkbench.html
http://en.wikipedia.org/wiki/Domain-specific_programming_language

Garret on October 31, 2007 7:56 AM

I really don't think it's fair to offer up my experimental (I've never used it in production code) ReadableRegex as a representative of fluent interfaces as a whole. I was attempting to solve a well-acknowledged problem of regexes being hard to read. If I did not succeed in this instance, it is no reason to dismiss fluent interfaces for other applications.

Try looking at the Rhino Mocks tool which I refer to in my post. Compare its fluent interface to what the calling code would look like if it did not use a fluent interface.

Joshua Flanagan on October 31, 2007 8:31 AM

I'm surprised that no one has talked about the fact that SQL itself is just another abstraction. Databases don't naturally "speak SQL". SQL was designed as an abstraction on top of what used to be proprietary libraries for getting at data. It is a "non-standard standard" of sorts. For example, Progress has its own proprietary language for getting at data and only supports SQL as an afterthought in my opinion.

So it is possible that some day LINQ may be considered the most common way to get at database data and some blogger will be ranting about how the technologies after LINQ are just embedding the LINQ "language inside a language".

Matt on October 31, 2007 8:44 AM

Maybe it's just a phase wanting to use the massively-long-object-oriented stuff, I certainly went through something similar. Also, regarding regular expressions, I've found so many people to be so unimaginably fearful of parsing in general that it doesn't surprise me they'd want something with a little more cush.

Neil C. Obremski on October 31, 2007 8:53 AM

Wow! You use a Perl example.

I do most of my programming in Perl and have heard all the arguments against it. This includes "It's not object oriented." and "It's unreadable."

Some of its reputation of unreadability comes from contests where people purposefully write obfuscated code (my favorite example is http://99-bottles-of-beer.net/language-perl-737.html). However, most of Perl's reputation for unreadability comes from handling regular expressions.

However, Perl's ability to handle regular expressions is unsurpassed. I've switched to Ruby and PHP recently, and I find their regex syntax handling almost impossible to read. The example you gave is even worse.

People also complain that Perl is not a "real" object oriented programming language, but merely pastes object oriented techniques on a functional oriented language. So what? OOP is a technique and can be used in almost any programming language (I think APL is probably the lone exception). Some languages have built in support for OOP, but that doesn't mean you can't write good OOP style code in a language that doesn't include OOP syntax.

The other thing I have to point out is that OOP is not a religion, but a tool. If writing OOP code makes your code more readable and more easily maintainable (as it usually does), then go for it. However, if you're just wrapping everything in object classes just to make it OOP, then you are probably making the program harder to maintain, slower, and harder to read.

I worked on one project where the developers spent almost 2 years designing all the objects that the program would use. The project became obsolete and was canceled before a single line of usable code was written.

David on October 31, 2007 9:05 AM

Jeff,

Never forget that SQL itself is merely another example of what LINQ is doing - it embeds the Relational Calculus in a data management language. And Codd is on record that it does it badly.

Ross Patterson on October 31, 2007 9:10 AM

Jeff,

If you take the specific syntax out, it sounds like you're advocating Lisp. :) The trouble with Lisp, though, is its total lack of syntactic cues.

A language with conventional syntactic cues that allowed you to write mini-languages within it - and those with their own syntactic cues - would be a very cool thing. Does anybody know of one?

trousercuit on October 31, 2007 9:54 AM

Your post implies that Fluent Interfaces and APIs to abstract out interfaces to other systems are the same thing. Your real beef seems to be against APIs that are more complicated than the complication they are supposedly abstracting which really doesn't have anything to do with Fluent interfaces.

Fluent interfaces are just an API design methodology and could be applied to any API not just those that happen to already have a Domain Specific Language written for them. A good example that was mentioned above is much of the stuff in Rhino Mocks.

I think it should surprise no one that a Domain Specific Language that was designed to address a particular domain (Relational Databases or String Matching) is easier to use and more expressive than an interface that was built on top of, and within the constraints of, a general purpose programming language.

Nick Goede on October 31, 2007 9:55 AM

Jeff,

I agree with your example on regular expressions. The regex explosion is just hideous, although as one commenter noted it is quite easy to figure out what is going on for the non-regex fluent. On the other hand, I am a fan of ORM.

In either case, I don't think these two indict fluent interfaces as a whole.

Sammy Larbi on October 31, 2007 10:02 AM

I'm perpetually disappointed (read: irritated) by all these fluffy abstraction layers that seem particularly dominant in the Free Software / Open Source worlds. We have gobs of processing horsepower at our disposal, yet we keep creating more work for the developer rather than the machine. I already know a couple dozen languages, why should I be learning a hundred different DB interfaces ?

What I like to see is "invisible magic", the kind of function/object that will take generic input and do whatever it takes to make it work on the platform. Say the app is designed on MySQL, but some unfortunate chap tries to run it on SQL Server (or PG), why can't there be some nice glue code that adapts the SQL string and/or return values to be consistent on any backend ? We have DB abstraction libs, but they all present their own slightly different interface; well if someone went through the hassle of designing a new interface, surely they could have had the time to write an SQL parser that handles legacy queries with a smile.

I think the root of the problem is that many people do OOP, but very few people actually think in high-level constructs, they just use OO everywhere without the slightest iota of applied logic. Me, I like a program that acts like a good assistant: you give it a simple job, and it comes back with the result without having to teach it every stupid little thing. That's why my apps have imperative-style funcs like "GetList('employees')". I don't even want to see "select * from employees" unless I'm explicitly writing that utility function, and I should have to do that only once.

In the end, almost all business apps do more or less the same thing, the entities have different names and relationships, but the bird's eye view is the same: Get data, get related records, sort and print, with an update thrown in once in a while. That sort of thing shouldn't need a half-dozen mini-languages to get the job done.

Billco on October 31, 2007 10:08 AM

I've tried abstracting my SQL data into some self-made classes. It turned out to be more pain than it's worth.

If you're going to abstract SQL, it should be for portability or DRY reasons, not because you don't know how to write SQL. If you can't write an SQL statement then you need to go back to whatever 2-year IT school pumped you out and ask for your money back.

Mattkins on October 31, 2007 10:18 AM

I can see a need for abstraction of the database for some solutions as they will need to be database agnostic, but as a whole, the more abstractions the harder the code will be to debug and follow, even if it is in "API" like syntax. And, it will be *slower*. Plus it is very easy to implement different database versions with an interface and implementations of the interface.

Interface GenDB
    Abstract GetCustomers()
    Abstract GetSales()

SQLServer : GenDB
    GetCustomers()
        //SQL Server SQL to Get Customers

Oracle : GenDB
    GetCustomers()
        //Oracle SQL to Get Customers

Pretty simple. Add another DB, implement the interface.
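The outline above might look like this as runnable code. This is a Python sketch rather than C#, and the table names, column names and method names are assumptions made for illustration. Each backend returns plain SQL text, so it can still be pasted into a query tool. The two dialect differences shown are real ones: string concatenation (+ versus ||) and the current date (GETDATE() versus SYSDATE).

```python
from abc import ABC, abstractmethod


class GenDB(ABC):
    @abstractmethod
    def get_customers_sql(self):
        """Return this backend's SQL for listing customers."""

    @abstractmethod
    def get_sales_sql(self):
        """Return this backend's SQL for the last 30 days of sales."""


class SqlServerDB(GenDB):
    def get_customers_sql(self):
        # SQL Server concatenates strings with +
        return "SELECT first_name + ' ' + last_name AS full_name FROM customers"

    def get_sales_sql(self):
        return "SELECT * FROM sales WHERE sold_on >= DATEADD(day, -30, GETDATE())"


class OracleDB(GenDB):
    def get_customers_sql(self):
        # Oracle concatenates strings with ||
        return "SELECT first_name || ' ' || last_name AS full_name FROM customers"

    def get_sales_sql(self):
        return "SELECT * FROM sales WHERE sold_on >= SYSDATE - 30"


def customer_report_sql(db: GenDB) -> str:
    # Calling code depends only on the interface, not on the vendor.
    return db.get_customers_sql()


queries = [customer_report_sql(db) for db in (SqlServerDB(), OracleDB())]
```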

If I use SQL, I can debug the code to where the SQL is being generated and cut and paste the SQL into a query analyzer (toad, etc) type of tool and check to make sure the SQL is working as expected. LINQ, where's the SQL? It's in LINQ somewhere, so now another layer to figure out where the problem is. The problem will be that everyone will start using LINQ instead of just issuing a query to the database.

There definitely seems to be an "abstraction movement" to generalize everything, but in my experience abstractions can lead to a very unclear picture of what is *really going* on in the code. Some of us would like to know what is going on rather than relying on faith in an API call.

The dumbing down of programmers continues. Programmers of the future will not be known as software engineers, their titles will be "object abstraction monkeys".

Jon Raynor on October 31, 2007 10:18 AM

Interesting post, but I disagree in many respects. I like the idea of combining the languages into one clean coherent statement structure, but I think it's ridiculous to say everyone should know all these other languages. Ideally it's nice but not exactly practical. This argument reminds me of the old cliche instance of some programmer saying "real programmers use vi/notepad and don't need those fancy editors". Just because I want to write an application that talks to a database, should not require me to know the inner workings of that database or how it communicates. This is exactly the point of object wrappers, to abstract such details. The point another poster made about differences in SQL is especially relevant. I will agree that it is good practice to know how all these different things work, but realistically that is not most programmers out there, and as systems become more complex it will become more and more rare. As far as regexps go, you are one of the few people I know that think these things are simple. They are powerful and I love them for that, but I also hate them. For what they accomplish they are a pain in the butt and very much a voodoo art that those familiar with them will often describe as "simple", however, there are many developers out there that look at regexps and think "wtf?". I'm glad to have regexps at my disposal and they are useful but they could stand to be significantly improved. Just because you are familiar and comfortable with a cryptic syntax, does not make it a good one.

Thad on October 31, 2007 10:39 AM

"If only someone could fluently resolve the mess of PHP, CSS, HTML and Javascript (*cringe*) that form a webpage..."

It's been done ... Curl http://www.curl.com/products_platform.php
....But it wasn't adopted because no-one supported it

Jaster on October 31, 2007 10:41 AM

I guess they don't teach lex and yacc anymore.
When was the last time you saw a cs grad that was able to write a parser, or a compiler?
Personally, I've written a bunch of parsers, some based on lex and yacc, some not...

I think that a thorough understanding of this subject should be a REQUIREMENT for every cs degree.

Maybe then we can get rid of this silly attempt to dumb-down code for things like SQL or hypertext...

Mac on October 31, 2007 11:23 AM

I have no problem with any of the arguments about regex: use regexes the way they were meant to be used. I see very little argument for abstracting them.

However I am really stuck on an abstracted db layer because without one you cannot develop a truly agile application. SQL can and should be broken down into its simpler concepts so that the code can be reused. Also, you should not underestimate the benefits of projects like sql alchemy (python), qcodo (php), propel (php). They save developers a significant amount of time and, as far as I have seen, sometimes even write better SQL than most people because that's what they were made to do.

Andrew R on October 31, 2007 11:32 AM

I realize this is a general post on how great it is to throw untyped strings into your code, but I think it's worth pointing out the reason for the bias many people have against a data access layer: you tried one before and you didn't like it. It didn't add value, it got in the way, and you ended up wishing you could just get at your data instead of messing with some wacky wrapper. Your data access layer caused extra work, and didn't really help.

Here's something you might not have considered: it's not data access layers, it's you. You screwed up - you built or picked a dumb data access layer, then decided data access layers are bad. Nope, it's you.

http://blog.wekeroad.com/2007/10/09/unleashing-elmer-fudd/

Jon Galloway on October 31, 2007 12:00 PM

"I was attempting to solve a well-acknowledged problem of regexes being hard to read." -- Joshua Flanagan

That's a "well-acknowledged" problem? Not by me. Here's a "well-acknowledged" problem:

Working with something you don't understand is hard.

Having trouble reading a reg-ex? Here's a tip: learn reg-ex syntax. Problem solved.

Steve on October 31, 2007 12:04 PM

Steve - if your point is "I'm a better programmer than you because I can instantly grok a regex string", then you win.
My point was never that I do not understand regexes or how to read them. Don't you think I would have had to learn reg-ex syntax as a pre-requisite to writing the fluent interface wrapper? It was never intended as a wrapper to make it easier for "lesser" programmers to create and read regular expressions without having to learn regex syntax first. I agree - everyone should learn it.
My goal was trying to make it easier to read. I can read any regex you throw at me, given enough time. I do not have the ability to instantly translate all of the 1/2 character symbols into the language that I think in. If you can, this library was never intended for you.

Joshua Flanagan on October 31, 2007 12:31 PM

Maybe I'm missing something listening to Jeff up on his soap box again, but isn't the entire purpose behind using Objects/APIs the downline portability of it?

One of the things I personally have recently noticed going from SQL 2000 to SQL 2005 is that not all of my queries work the same way. If I have hard coded my SQL all over the place I have many points of potential failure. However, if I utilize an object based API then I should be able to update the API to ensure it works with SQL 2005 and not have to change all my SQL statements throughout my code.

Isn't that the entire purpose of OO programming?

AlwaysRight on October 31, 2007 12:34 PM

Wow, I should have re-read that last comment before posting it. It comes across a lot "snippier" than I like. Here's the same comment without the attitude:
I agree that programmers that are using regular expressions should learn regular expression syntax.
I do not agree that solves the problem of them being hard to read.

Joshua Flanagan on October 31, 2007 12:43 PM

I wasn't questioning your ability to read reg-ex's, just making the point that some things need to be understood in order to be used. I've met people who used GUI SQL builders because they didn't understand SQL. The result? Garbage. One guy thought that if he got a result, any result, the SQL call was correct. Of course it wasn't even close because he didn't understand how a "group by" worked. Programmers need to understand code. People who don't understand code shouldn't be programmers. Code isn't easy to read, that's why it's called code. ;)

Steve on October 31, 2007 12:52 PM

Jeff,

There is a difference between adding features to a language or "ObjectJunk" libraries.

If you add native SQL features to the core language, then the core language gets polluted with all these specialized keywords, and you can throw backward compatibility out the door, if code written for language 2.5 is not compatible with prior releases. And if you continue the trend with native features, you will get to a point where namespace gets polluted...e.g. orderby keyword referring to a list or a database?

However if you added SQL features to an "ObjectJunk" library, you should be able to use the library with older versions of the languages (provided the library itself can run on older language releases!).

Language designers are always fighting the battle between native vs library.

Kashif Shaikh on October 31, 2007 12:57 PM

Another point Jeff,

Using domain-specific languages is always a good thing. This is the reason why we see languages like Erlang are touted more for concurrency programming than hacks like OpenMP.

The problem of domain-specific languages is trying to integrate them with core languages like Java, C/C++, etc. Which is always a pain.

Kashif Shaikh on October 31, 2007 1:07 PM

I agree with the examples given, but I think Rails (and to some extent Ruby itself) definitely gets this fluency thing right.
It is much nicer to write:
if my_date < 2.days_from_now
than it is to do the equivalent in Java.
Even little things like my_array.first and my_array.last are much more readable coming into a project than myArray[0] and myArray[myArray.size()-1]

Matt V on October 31, 2007 1:07 PM

Funny, I look at the examples given and come to the opposite conclusion.

First you show a regex expression which is completely unreadable to anyone not versed in regexes, and compare it to a test interface that makes it into something readable by any competent programmer.

Then you show a trivial line of sql code (yet still so non-portable that it won't work on any of the databases I use) and compare it to a more verbose interface that will at least work, and return data to the program.

If you want to argue against fluent interfaces, at least pick a good example. How about a sql example using user entered parameters running against an arbitrary database. Once you start doing that, your one line of sql starts becoming several lines of conditionals, parameters, and concatenations. And you are still embedding sql in the middle of your app.

Linq is nice, but it is .net only. Those of us who aren't writing our own compilers can't add Linq to our language du jour. We can however add fluent interfaces.

Your post has inspired me though, I am going to add a fluent interface to my favourite persistence layer.


Sean on October 31, 2007 1:18 PM

<joke>
jeff's juss jealous that rob got assimilated and he didn't.
</joke>

all very good and clever points. replacing regex like that is just a brutal crime. to paraphrase Justice Gray:

"Regular Expression don't suck -- but *you* do."

lb on October 31, 2007 6:32 PM

People seem to have missed one of the most used (and IMHO very useful) fluent interfaces... C++'s iostream <<. Who among us is not familiar with the code:

cout << "Hello World." << endl;
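What makes that line chain is simply that each << returns the stream itself. A toy Python version, assuming a hypothetical Out class that wraps any writable target, shows the mechanism; it is not how iostream is implemented.

```python
import io


class Out:
    """Hypothetical stand-in for a C++ output stream."""

    def __init__(self, target):
        self.target = target

    def __lshift__(self, value):
        self.target.write(str(value))
        return self  # returning self is what lets the next << chain on


endl = "\n"
buffer = io.StringIO()
cout = Out(buffer)

# Evaluates left to right: (cout << "Hello World.") << endl
cout << "Hello World." << endl
```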

Fluent interfaces have their place, as do embedded languages. Though I have to admit that I've seen some JSP code (and PHP for that matter) that would cause any programmer to run away screaming. It's the constant switching between languages (HTML and Java) that can really get you.

Carleton on October 31, 2007 6:51 PM

I understand the sentiment behind your statements, but in practice, I totally disagree. As far as I can see, your main point is something along the lines that programming should be as simple as possible and that we are just overcomplicating things with all of this object model / code generation stuff. Why should we use all of these objects when we can just read the data into a simple table-like structure (a DataReader) and go at it from there?

The problem is that "going at it from there" is the most difficult part of all. In your SQL example, you totally left that out. The Subsonic example feeds the data into a strongly typed collection that can be easily manipulated via the object model. This means you can do things like traverse tables, rows and fields by simply iterating through collections of objects. In many cases, you can make one initial call for data, and then get to all related data (in related tables) just by traversing the object graph. Try to do that with simple select statements and a DataReader.

I have been using an ORM consistently for over 2 years, and I cannot live without it anymore. I use LLBLGen, which I personally like a lot, but any good ORM will do. Here's one simple example of why an ORM can be so much better than hand crafted SQL and DataReaders:

int customerId = 123;
CustomerEntity customer = new CustomerEntity(customerId);
Console.WriteLine(customer.Name);

foreach (OrderEntity order in customer.Orders)
{
    Console.WriteLine(order.OrderNum);
}

In just that small amount of code, I was able to pull up a customer record and print their order info. Hand crafted SQL would make all of that take SOOOO much longer. Sure, ORM's have their limitations, but working around them is well worth it. The biggest issue is that learning an ORM to begin with is kinda hard (like learning regex's), so people tend to dismiss them as being "too complicated". Sure, they can be, but not always, and the ones that are well written make programming way easier, not more difficult.
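For anyone wondering what sits behind a line like customer.Orders, here is a deliberately tiny sketch in Python with sqlite3. It is not how LLBLGen works internally; the CustomerEntity class, the schema and the sample rows are all made up. It only shows the general shape: the entity loads its own row, and the related rows are fetched the first time the property is touched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_num TEXT, customer_id INTEGER);
    INSERT INTO customer VALUES (123, 'Acme');
    INSERT INTO orders VALUES ('A-1', 123), ('A-2', 123), ('B-1', 999);
""")


class CustomerEntity:
    """Hypothetical entity: loads its row on construction, its orders on first use."""

    def __init__(self, customer_id):
        row = conn.execute(
            "SELECT id, name FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()
        self.id, self.name = row
        self._orders = None

    @property
    def orders(self):
        if self._orders is None:  # lazy load: query only when first asked
            rows = conn.execute(
                "SELECT order_num FROM orders WHERE customer_id = ? ORDER BY order_num",
                (self.id,),
            ).fetchall()
            self._orders = [order_num for (order_num,) in rows]
        return self._orders


customer = CustomerEntity(123)
order_nums = list(customer.orders)
```

The hand-written SQL has not gone away; it has moved into one place, which is roughly the trade being argued about in this thread.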

Jeremy

Jeremy on October 31, 2007 9:58 PM

I'm with you Jeff.

In fact, I've gone so far as to post a complete re-working of Joshua's example. The re-worked example is concise and readable, but NOT fluent.

I've also proposed an explanation for why the fluent style may be necessary in Java, but not in C#.

John Rusk on November 1, 2007 12:10 AM

The "sql is different across platforms" argument is pure nonsense. If you stick to the ANSI syntax it will work across MySQL/Oracle/DB2 and most of the others.

If you use something like Oracle avoid decode() and the (+) operator for outer joins - the standard syntax will work.

Stick to the standards and you'll be fine.

I code in Ruby on Rails these days and use ActiveRecord for most of my database needs. Interestingly, the design goal for this was to make the simple stuff really easy (finding things, getting sets of child records etc.). But you can still put raw SQL in for more complex tasks that can't be expressed in Object crud. This works really well.

Ruby's metaprogramming makes it really easy to implement ActiveRecord, it has a method_missing method that allows it to intercept things like

Person.find_by_first_and_last_name "fred", "smith"

and turn it into SQL (and also add this method into the class on the way so there isn't any runtime penalty after the first invocation). You can write really fluent-looking code that reads well, and also dive into SQL for grouping and so on (or proprietary hacks) if you must.
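The trick described above can be imitated in Python, which may help readers who have not met method_missing. This is a sketch of the idea only, not ActiveRecord's actual code. The DynamicFinders metaclass, the people table and the first_name and last_name columns are assumptions for the example. A failed attribute lookup on the class is intercepted, the method name is parsed into column names, and the generated finder is stored on the class so the next call is an ordinary lookup.

```python
class DynamicFinders(type):
    """Metaclass playing the role of Ruby's method_missing for class-level calls."""

    def __getattr__(cls, name):
        # Only reached when normal attribute lookup on the class fails.
        if not name.startswith("find_by_"):
            raise AttributeError(name)
        columns = name[len("find_by_"):].split("_and_")

        def finder(*values):
            where = " AND ".join(f"{column} = ?" for column in columns)
            return f"SELECT * FROM {cls.table} WHERE {where}", values

        # Cache the finder on the class: later calls skip __getattr__ entirely.
        setattr(cls, name, staticmethod(finder))
        return finder


class Person(metaclass=DynamicFinders):
    table = "people"


sql, params = Person.find_by_first_name_and_last_name("fred", "smith")
```

The finder here only returns the SQL and its parameters; a real library would go on to execute the query and build objects from the rows.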

I can't go back to Java and have never had to work with its child C#. I can just write code that reads like English. If I were going back I'd use Hibernate though - again it does the 95% really well and you don't have to care about your database engine.

Regular expressions are part of the Ruby language and are first-class members of the object hierarchy. Working with them is really easy and they have the standard syntax. Again ... Java ... C# ... shudder.

Francis Fish on November 1, 2007 1:24 AM

I am not advocating the "one right way" to do anything. Use whatever approach you feel works best for what you're doing.

This post is meant to incite discussion and thought on the topic. Consider what you're doing, why you're doing it, and what the alternatives really are. E.g., if you really need to talk to 5 different databases, then by all means, use an ORM that can help you do that. But also think about the limitations of your toolset and why those limitations exist. Don't blindly accept the status quo because your toolset forces you to.

I believe in the inherent power and flexibility of domain specific languages, and that's exactly what SQL (for data) and Regex (for strings) are. Every developer should be conversant in these languages. The idea that all code should be in one, and only one, language is obsolete. The world is now a patois of different domain-specific languages. Don't fight it with object wrappers. Learn to embrace it and love it.

Jeff Atwood on November 1, 2007 2:47 AM

I think it's funny that so many people wrap SQL in the name of "database independence".

"Database independence" is the reason SQL exists.

All you're doing is abstracting the abstraction.

Target SQL92 and it'll work in a wide variety of databases.

Sean on November 1, 2007 10:35 AM

@Jeff,
I understand the need to incite discussion about this; however, I don't understand why you chose to attack SubSonic. However you may wrap it and explain it, that's what it is. Your example can be represented in one line. In fact, I easily identified with it, because it's what Ruby and Rails do almost everywhere. I liked it, just for that reason.

So, why SubSonic? Why hang it out like that?

Srdjan on November 1, 2007 3:12 PM

Hi Jeff,

I've been coding for 10 years and have stubbornly kept RegExes at arm's length - dealing with them clumsily and only when I needed to.

You've put forward a great argument and I've really started to 'get' RegExes. Indeed I spotted this article which demonstrates commenting of RegExes, which I didn't know of. Yes it's verbose, and yes I can now read the whole Regex without it, but it looks like a great way of leaving a trail for others to follow. Anyhoo... http://msmvps.com/blogs/jon.skeet/archive/2007/11/02/i-love-linq-simplifying-a-tedious-task.aspx
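
For readers who haven't seen commented regexes, here is a small sketch of the idea using the regex from the post. It is written in Ruby's extended (x) mode rather than taken from the linked article; the GAME_PATTERN name, the sample HTML, and the added m flag (so the lazy group can span line breaks) are assumptions for illustration.

```ruby
# Extended mode ignores whitespace in the pattern and allows trailing comments.
GAME_PATTERN = /
  <div\s*class="game"\s*        # opening div carrying the game class
  id="(?<gameID>\d+)-game"      # numeric id, captured as gameID
  (?<content>.*?)               # lazily capture everything before the status marker
  <!--gameStatus\s*=\s*         # start of the status comment
  (?<gameState>\d+)-->          # numeric state, captured as gameState
/mx

html = %(<div class="game" id="42-game">Final score<!--gameStatus = 3-->)
match = GAME_PATTERN.match(html)
puts match[:gameID]
puts match[:gameState]
```

The pattern is the same mini-language as before; only the layout and the trail of comments change.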

Ian Pender on November 2, 2007 6:06 AM

Quote: I would argue that most of these benefits could be realized with smarter IDEs that actually understood native SQL strings (or regular expressions)

Well, gvim syntax highlighting for PHP can highlight regular expressions. I think it doesn't highlight SQL (yet). Better than a slow bloated IDE.

Nicolas on November 2, 2007 7:16 AM

It seems that people keep inventing new approaches...
forgetting (and not using) the best of the old ones.

Maksym Shostak on November 2, 2007 9:28 AM

"If I'm not mistaken, in the .NET world, SQL is best handled by leaving the queries on your SQL Server and using parameter calls to prevent injection for best security and performance."

This has not been the case unless you are using MS-SQL Server predating MS-SQL 2000 in a client/server environment.
Using stored procedures still leaves you open to SQL injection, and stored procedures are no faster than other SQL commands. Actually, the way most people use stored procedures for CRUD, they are slower than parameterized queries, because all the coalesce or isnull calls are very, very slow.

will dieterich on November 2, 2007 10:06 AM

For another way of using the power of your programming language to produce SQL that is known to be valid at compile time, take a look at this solution in OCaml:

http://eigenclass.org/hiki/addressing-orm-problem-typed-relational-algebra

Matijs van Zuijlen on November 3, 2007 2:51 AM

Fluent Interfaces are an adaptation of languages to make them 'feel better'. SQL, RegExp and LINQ are Domain Specific Languages and they're not directly related to Fluent Interfaces.

By using a fluent interface you give up your language syntax a bit to get something more readable. By using a DSL you use a language specialized for the problem being solved.

The problems of taking a nice DSL and transforming it into tons of lines of a general-purpose language are not related to Fluent Interfaces.

Phillip Calado "Shoes" on November 4, 2007 11:29 AM

"First show a regex expression which is completely unreadable to anyone not versed in regex's, compared it to a test interface that makes it into something readable by any competent programmer."

Two mistakes. A competent programmer nowadays knows regexps. And it is much more efficient to make a practical solution than to let everybody wade through something that verbose. (For educational purposes, write a regexp-to-verbose-objects converter and use that to understand the code.)

Or just imagine what a language would look like if it were designed for competent programmers that are just not well versed in OO and where every method invocation would explain what it does...shudder.

"...your one line of sql starts becoming several lines of conditionals, parameters, and concatenations."

Not with proper interpolation at hand. Unfortunately, user-defined string interpolation doesn't exist and can't be implemented in most current languages.

"And you are still embedding sql in the middle of your app."

So what? Code is code. :-)

Andreas Krey on November 4, 2007 12:38 PM

"But in my professional opinion, that regex was a well written one."

Even if it matches things that it obviously shouldn't match, like <divclass="game"id=...>, uhm? (and it uses grouping constructs without any useful purpose besides making it look more complicated than it actually is)
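
The first objection is easy to check. A quick sketch in Ruby, whose named-group syntax accepts the pattern as written: LOOSE is the post's regex on one line, while STRICT (using \s+ between the tag name and attributes) is only one hypothetical tightening, not a fix proposed in the post. The m flag and the sample strings are assumptions too.

```ruby
# The regex from the post, with the m flag so .*? can span line breaks.
LOOSE = /<div\s*class="game"\s*id="(?<gameID>\d+)-game"(?<content>.*?)<!--gameStatus\s*=\s*(?<gameState>\d+)-->/m

# Hypothetical stricter variant: require whitespace before each attribute.
STRICT = /<div\s+class="game"\s+id="(?<gameID>\d+)-game"(?<content>.*?)<!--gameStatus\s*=\s*(?<gameState>\d+)-->/m

squashed = %(<divclass="game"id="7-game">x<!--gameStatus=1-->)
spaced   = %(<div class="game" id="7-game">x<!--gameStatus=1-->)

puts LOOSE.match?(squashed)   # \s* allows zero whitespace, so this matches
puts STRICT.match?(squashed)  # \s+ rejects the squashed tag
puts STRICT.match?(spaced)    # well-formed markup still matches
```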

marcelo on November 6, 2007 2:57 PM

The problem with wrapper code of any kind is that it HAS TO BE TRANSPARENT. If you lose some of the capability of the base language, or if it introduces new bugs and complexity that the base language didn't have, then it's a complete failure.

My personal pet peeve is slightly differing regexp syntaxes between programs. Especially you, emacs.

engtech on November 9, 2007 8:56 PM

I was thinking about this and had to search for this post, but surely generics would count as a language in a language? It's like a preprocessor language and pretty tidily implemented (in C# at least).

Scott on March 13, 2008 1:23 AM

I think the point of frameworks like SubSonic is not to save lines of code in how I write a SQL Sentence, but in how I express the sentence in an environment that will handle the "Impedance Mismatch" between the SQL World and the objects world, saving code.

These two lines of code:
--------------
SELECT * from Customers WHERE Country = "USA"
ORDER BY CompanyName
--------------

will not handle the way I move the results into the object world (if necessary): connecting to the DB (hopefully abstracted from the DB engine), querying it, getting the result set, disconnecting, and parsing the results into objects.

These four lines of code will:
--------------
CustomerCollection c = new CustomerCollection();
c.Where(Customer.Columns.Country, "USA");
c.OrderByAsc(Customer.Columns.CompanyName);
c.Load();
--------------


I agree that LINQ to SQL is much cleaner and more readable, but that doesn't mean this comment is valid:
"why in the world would I want to use four lines of code instead of one?"

It only misleads.

grumlin on September 4, 2008 6:42 PM
Content (c) 2009 Jeff Atwood. Logo image used with permission of the author. (c) 1993 Steven C. McConnell. All Rights Reserved.