Wednesday, July 18, 2007

C++/CLI is disgusting

Microsoft has found a truly awful set of syntax and semantics for their new C++/CLI language, formerly known as "Managed Extensions for C++". They decided that the old syntax was ugly because it used keywords that began with double underscores (which is a standard way to add compiler extensions in C++). Unfortunately, their solution was much worse than the problem they were trying to solve.

I had been using the first Managed C++ for a little while, but luckily I only made a single module in it (wrapper classes to allow C# to access some C++ classes). After a couple years I wanted to add a dialog box that accessed the C++ classes directly, but the forms designer only supported the "new" syntax; worse, Microsoft requires that the entire project only use one syntax or the other. So I learned the awfulness of the new design as I laboriously converted each line of the old code to the new syntax; the new syntax is so different that virtually every line of the module's header file had to be changed. And not just slightly. In many cases it was faster to retype the line than to try to adjust it. And they didn't just make new syntax, they invented new problematic semantics as well.

The changes include
  • The new "handles". A pointer to a managed class used to be called MyClass*, now it's MyClass^. Other than that they are still used like pointers (i.e. with the arrow notation).
  • "Tracking references". Instead of writing String^& and Int32& like you would expect, you have to write String^% and Int32%.
  • nullptr. Whereas you used to be able to initialize all pointers to NULL, including managed pointers, now you have to remember if it's a managed class and use "nullptr" if so.
  • Same with 'new'; now you have to write gcnew if the class is managed.
  • New finalizer syntax. Confusingly, whereas C# and Managed C++ use "~ClassName" for finalizers, Microsoft decided it was too predictable and renamed it to "!ClassName". Worse, they now require your Dispose() function to be called ~ClassName(), which causes a silent semantics change in old code. Or it would, except that you'll know something's up because your Dispose() method yields this odd error: "'Dispose' : this method is reserved within a managed class".
  • You can no longer use a managed enum like you do a normal enum; you have to qualify the names with "EnumName::EnumValue". This makes it impossible to share an enum between C# and standard C++ code, so you have to create a second enum (with the same items) and convert between them all the time.
  • In a managed class you must say if you're overriding a base class function or not, or you'll get a compiler error--whereas in pure standard C++ you can't. Argh! Even C# lets you off with a warning. And what a bizarre syntax they've picked too; instead of grouping the "override" keyword with "static", "virtual", etc., they make you put it at the end: virtual void foo() override {}. What's more, you have to specify both virtual and override.
  • Similarly, "sealed" and "abstract" go after the class name.
  • When making managed properties, you now have to group the setter with the getter in a single construct like in C#, but unlike in C#, you also have to repeat the data type three times (or twice if it's just a getter). How many times do you want to type Dictionary<string,SomeFreakyLongClassName>?
  • CLR enums are no longer implicitly convertible to arithmetic types.
  • They've switched to the standard 'typeid' syntax instead of __typeof(MyManagedType). Oh wait, no they haven't! The syntax is randomly different: MyManagedType::typeid versus typeid(UnmanagedType).
  • What the hell were they thinking here?
    virtual Object^ InterfaceClone() = ICloneable::Clone;

    The old syntax for 'explicit interface implementation' made much more sense:
    Object* ICloneable::Clone();
Tell me, how is it that when C# is supposedly modeled after C++, the C++ version of all these .NET features ends up looking so much longer than, and so different from, C#?

Admittedly, there are a few things that don't suck, like
  • the support for normal C++-style operator overloading in managed classes.
  • implicit boxing (although if NULL is defined as 0, watch out for boxed zeros when converting old code)
  • default indexers (much like in C#)
  • trivial properties (but they're inflexible and so not usable in many scenarios)
And now some managed-style features work in unmanaged classes, such as properties. Personally I have no use for this. After all, using such features means you can't compile your unmanaged class in a non-.NET program, so their utility is limited. If I want to write a class that only works in .NET, I would almost always make it a "ref class" or "ref struct" so I can interoperate with other .NET languages.

There are two main problems I see with their design.

The first big problem is that they've forgotten the spirit of C++ and discarded longstanding rules of C++ such as implicit overriding. C++'s philosophy has long been that an object should be able to behave like a pointer, like a number, like a function. Smart pointers, iterators, fixed-point/matrix classes/bigints, functors. The ability of one thing to act like something else is the whole basis for the STL. But in Microsoft's new design, everything managed is completely segregated so you can no longer write code that doesn't care whether something is managed or not. It's not just reference types either; value types and even simple enums are segregated to an extent that they weren't before. You always have to think: Do I have to Qualify:: that enum or not? should I use gcnew or new here? NULL or nullptr? * or ^? & or %? I can only use one or the other in a given context, but the wretched compiler still makes me tell it what it wants to hear. Template code that before could have (theoretically) taken managed or unmanaged classes for arguments can now take only one or the other, because a separate syntax is needed for each.
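To make the "one thing acting like another" point concrete, here is a minimal sketch in standard C++ (not C++/CLI). The names Widget, id_of, sum_ids and check_pointer_like are hypothetical, invented for this illustration only. The templates never ask what kind of "pointer" they were handed; they only require that operator-> works.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical payload type, used only for illustration.
struct Widget {
    int id;
};

// Accepts anything that behaves like a pointer to Widget: a raw pointer,
// a smart pointer, or an iterator. Only operator-> is required.
template <typename PointerLike>
int id_of(const PointerLike& p) {
    return p->id;
}

// Sums the ids in the half-open range [first, last). The iterator is
// passed straight to id_of, because an iterator also acts like a pointer.
template <typename Iter>
int sum_ids(Iter first, Iter last) {
    int total = 0;
    for (; first != last; ++first) {
        total += id_of(first);
    }
    return total;
}

// Exercises the same templates with three unrelated pointer-like types.
inline void check_pointer_like() {
    Widget on_stack{7};
    std::unique_ptr<Widget> owned(new Widget{35});
    std::vector<Widget> many = {Widget{1}, Widget{2}, Widget{3}};

    assert(id_of(&on_stack) == 7);                    // raw pointer
    assert(id_of(owned) == 35);                       // smart pointer
    assert(sum_ids(many.begin(), many.end()) == 6);   // iterators
}
```

A handle written as Widget^ needs its own spelling in the source, which is why the same kind of template cannot simply be reused across the managed boundary.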

The second big problem is that there is no longer anything I can share between C# and standard C++. I have a library that needs to be compiled into both C# programs and MFC programs (which must be Windows CE compatible, so mixing .NET and MFC is not an option). With the old syntax it was possible to share a small number of value types and enums between plain C++ and managed C++ (with the help of some #define macros); now I have to make two versions and convert between them.

If anything, Microsoft should have made the managed syntax more like standard C++, not less. It should have considered how to allow people to write classes that could be used directly from C# or (in another program) directly from standard C++. This would have made a much better bridge between unmanaged land and managed land. As it is, Microsoft has imposed a kind of syntax apartheid.

Bottom line: I loathe the new syntax. It makes me long for the hellish landscape of double underscores again.

Thursday, July 12, 2007

Goodbye, ANTLR

Three days ago, after finding workarounds for the ANTLR3 (C#) bugs detailed here, I immediately ran into even more bugs. For instance I had a rule that said

SL_COMMENT: '#' (~NEWLINE_CHAR)*;

Somehow the generated code for this rule included a check (during the matching stage, if I remember correctly) that said, in essence, "if the comment contains a slash character, generate a syntax error". What the hell? And there was another bug besides that which I've forgotten. My bug report on the first batch of bugs went mostly unacknowledged, so I didn't bother to try isolating this new problem.

Instead, I'm planning to try another approach: I'll make my own ANTLR. I bought the ANTLR book May 26, and I've been unable to get the thing to work for me since then. I'm getting impatient. I know how a LL parser generator should behave, so I ought to be able to make one... right?

Of course, I would like a parser generator done the Loyc way - as an extension to Loyc. But it'll be a little bit tricky to do this, because Loyc does not actually exist yet. It's still in the planning stages! There are no AST classes, no ONEP. So what will I do?

Well, my initial goal will be a translator from boo to boo. I'll make some AST classes and give them the ability to print themselves out as source code. Then I'll create a lexer and tree parser by hand; as for the main parser, I'm not sure how to approach it. But after I've done those things, I'll write some routines for printing out AST nodes as text. So it will be able to read source code and spit it back out.

At this point I've already written a lot of the lexer by hand. I've taken it as an opportunity to figure out how a parser generator should work, by attempting to write the lexer the way a machine would do it. I started by writing the lexer grammar in a hypothetical boo-style syntax; then I translated that grammar--mechanically, by hand--to C# source code.
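As a rough idea of what such a mechanical translation might look like, here is a sketch for the SL_COMMENT rule quoted earlier, written in standard C++ rather than the C# used for the actual lexer. It is not the real Loyc code. The function names are hypothetical, and the definition of NEWLINE_CHAR as '\r' or '\n' is an assumption, since the grammar for it is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumption: NEWLINE_CHAR means '\r' or '\n'. The original rule for it
// is not shown, so this definition is a guess.
inline bool is_newline_char(char c) {
    return c == '\r' || c == '\n';
}

// Hand translation of:  SL_COMMENT: '#' (~NEWLINE_CHAR)*;
// Returns the number of characters matched starting at input[pos],
// or 0 if no comment starts there.
inline std::size_t match_sl_comment(const std::string& input, std::size_t pos) {
    assert(pos <= input.size());

    // '#'
    if (pos == input.size() || input[pos] != '#') {
        return 0;
    }
    std::size_t end = pos + 1;

    // (~NEWLINE_CHAR)* : consume everything up to, but not including,
    // the next newline character or the end of input.
    while (end < input.size() && !is_newline_char(input[end])) {
        ++end;
    }
    return end - pos;
}
```

Each grammar element maps onto one statement: the literal becomes a comparison and the starred negated set becomes a loop. Nothing in the rule singles out a slash, so a comment such as "# a/b" should match in full.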

There is so much work I have to do before I start making the parser generator, though. I fear that by the time I'm done with the prerequisites, I will have forgotten the lessons I'm now learning about making a parser generator. We'll see.

What's wrong with Java

When I see the features added recently to Java, I'm sure glad I'm using .NET, C# and boo. Even though Java is a lot older than .NET, .NET seems to get the good features first. Case in point: Generics. For a former C++ developer, it seems stupid to give up type-safe collections; I don't know how many years it took before Java got generics (10?) but .NET got them in much less time (less than 4 years, I do believe) and Java was left playing catch-up. In fact, most of the "new" features in Java 5.0 seem to be things that C# had from the beginning:
  • enhanced for loop (foreach in C#)
  • autoboxing/unboxing
  • enums
  • varargs (variable argument lists - "params" in C#)
  • annotations (much like .NET attributes)
Java generics aren't even supported by the JVM, so you get the same performance penalty from casting that you did before. I've always been unhappy with Java's performance (especially for GUI programs), whereas .NET just doesn't seem slow.

Look, Sun, if nothing short of competition from Microsoft can prompt you to improve Java, you must not care very much about it.

Let's see, what else...
  • Value types (structs). This is a big one for me because they can offer a big performance boost in many situations. You don't want to allocate a new object if that object contains nothing more than an integer and some methods, do you? A new object sucks up at least 16-20 bytes of memory even if it just contains one integer or reference; creating it requires multiple method calls and all those bytes have to be initialized. Useful value types include
    • A "Point" type that has X and Y coordinates
    • A "FixedInt" type that contains a fixed-point number (the language must support operator overloading to make it easy to use, of course.)
    • A "BigInt" type that contains a small integer normally, but allocates a memory block for a large integer if necessary.
    • A "Handle" type that contains an opaque reference to something else
    • A "Symbol" type that contains a numeric identifier that represents a string (symbols are a built-in feature of Ruby and are typically used like enums, except they are more flexible)
    • A Pair type that contains a pair of values A and B; often you get better performance by not allocating memory for this purpose.
  • Multi-language support. Well, the JVM can certainly support multiple languages, but only .NET is specifically designed for it. Admittedly, the design isn't that great, but at least Microsoft specifically considers the needs of other languages.
  • Delegates. The Java equivalent is using interfaces with one function in them, but this is relatively inflexible and certainly more annoying to use. Java provides inner (even anonymous) classes to help people use the pattern, but delegates are way better.
  • Closures (functions inside other functions, where the inner function can access local variables of the outer function). Java doesn't have that, does it? You can access "final" variables from a function-inside-a-class-inside-a-function, but that's all. By the way, .NET itself doesn't actually support closures, but C# fakes it well.
  • Iterators. Now this may be my favorite feature of C# 2.0; it would be hard to choose between iterators and generics. I love them not only because you can create enumerators easily (which is great) but also because you can approximate coroutines with them.
  • Swing. Ugh! It's ugly, it's slow, and the Windows "skin" isn't very convincing. There often seem to be glitches in Swing that you don't find in other programs, such as the failure to resize a window fluidly (i.e. the window doesn't redraw itself until you let go of your mouse button). Finally, and worst of all, developing Swing GUIs is a huge pain in the ass. I absolutely can't stand it. The .NET counterpart, Windows.Forms, doesn't seem all that well designed, but it looks good, it's relatively fast, and it's easy to write code for it. Plus, of course, a good Forms designer is a standard feature of any .NET IDE.
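Going back to the value types item in the list above, here is a sketch of the "FixedInt" idea in standard C++, where every struct is already a value type. The 16.16 layout and all member names are assumptions chosen for illustration. The point is that the whole object is one 32-bit integer, with no heap allocation and no object header, and that operator overloading lets it be used like a built-in number.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 16.16 fixed-point value type: the stored integer is the
// real value multiplied by 65536.
struct FixedInt {
    std::int32_t raw;

    static FixedInt from_int(int n) {
        return FixedInt{static_cast<std::int32_t>(n) * 65536};
    }

    // Integer part, truncated toward zero.
    int to_int() const { return raw / 65536; }
};

// The value type is exactly as big as the integer it wraps.
static_assert(sizeof(FixedInt) == sizeof(std::int32_t),
              "FixedInt should add no storage overhead");

inline FixedInt operator+(FixedInt a, FixedInt b) {
    return FixedInt{a.raw + b.raw};
}

inline FixedInt operator*(FixedInt a, FixedInt b) {
    // Widen to 64 bits so the intermediate product does not overflow.
    std::int64_t wide = static_cast<std::int64_t>(a.raw) * b.raw;
    return FixedInt{static_cast<std::int32_t>(wide / 65536)};
}

inline FixedInt operator/(FixedInt a, FixedInt b) {
    assert(b.raw != 0);  // division by zero is a caller error
    std::int64_t wide = static_cast<std::int64_t>(a.raw) * 65536;
    return FixedInt{static_cast<std::int32_t>(wide / b.raw)};
}

inline bool operator==(FixedInt a, FixedInt b) {
    return a.raw == b.raw;
}
```

With these operators, an expression like FixedInt::from_int(3) * half reads the same as ordinary arithmetic, and an array of FixedInt stores its values contiguously rather than as references to separately allocated objects. A C# struct gives a similar layout on .NET; Java, as of this writing, has no direct counterpart.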
Right now I wish I could have the C# 3.0 "var" feature because I'm sick of typing

SomeJerkGaveThisClassALongName foo = new SomeJerkGaveThisClassALongName();

Obviously we should be able to write simply

var foo = new SomeJerkGaveThisClassALongName();

And there's a lot of other great stuff in C# 3.0:
  • Lambda expressions (syntactic sugar for anonymous inner functions) with type inference
  • Type inference for generic method calls
  • Extension methods (they are not well thought out, but I'd rather have them than not)
  • Object and collection initializers (to make code more brief)
  • Anonymous POD ("plain old data") classes, which work like tuples except that the fields have names.
  • And last but not least, the query thingie, LINQ.
Suddenly, C# is starting to seem a lot more like boo.

Having said all this, there are a couple of things from Java that I might like to have in C#:
  • The assert statement. Typing Debug.Assert() all the time is driving me nuts.
  • Inner classes that have an implicit link to the outer class
And let's see, if I could have some more features I think they would include
  • Traits
  • The ability to supply a default implementation for a member of an interface
  • Preconditions and postconditions on methods
  • Static and run-time unit checking (units as in metres, litres, bytes, pixels, etc.)

The Loyc Blog

This blog will be a place for me to report on the progress of Loyc and to comment on the programming field in general. Welcome.