
vprintf segfault on Ubuntu Linux 15.10

I have a C program with a ‘print’ function that accepts a format string and variable list of arguments, similar to printf style functions. The function simply prints to stdout and also to a logfile.

void print(const char* format, ...) {
	pthread_mutex_lock(&logfile_mutex);
 
	va_list args;
	va_start(args,format);
 
	vprintf(format,args);
	fflush(stdout);
 
	fprintf(logfile,"%d ",milli_timer());
	vfprintf(logfile,format,args);
	fflush(logfile);
 
	va_end(args);
 
	pthread_mutex_unlock(&logfile_mutex);
}

Everything seemed fine compiling with g++ under Cygwin, but would crash when compiled on Ubuntu Linux 15.10. After a bit of digging I finally spotted the problem in the documentation for vprintf:

Internally, the function retrieves arguments from the list identified by arg as if va_arg was used on it, and thus the state of arg is likely altered by the call.

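The C standard leaves the va_list in an indeterminate state once vprintf returns, so whether a second use happens to work depends on the compiler and platform. Since we can’t rely on the state of the arguments list after the call to vprintf, the solution was simply to use a new arguments list for the call to vfprintf, which writes to the log file (copying the list with va_copy before the first call would also work). Revised function: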

void print(const char* format, ...) {
	pthread_mutex_lock(&logfile_mutex);
 
	va_list stdout_args;
	va_start(stdout_args,format);
	vprintf(format,stdout_args);
	fflush(stdout);
	va_end(stdout_args);
 
	va_list log_args;
	va_start(log_args,format);
	fprintf(logfile,"%d ",milli_timer());
	vfprintf(logfile,format,log_args);
	fflush(logfile);
	va_end(log_args);
 
	pthread_mutex_unlock(&logfile_mutex);
}

Hope that helps someone.

local array causes segfault

Recently while working on a C program I was intermittently getting a segfault. The code appeared correct yet it would inexplicably crash, seemingly at random. I compiled the program with the ‘-g’ flag to enable debugging symbols, and discovered I was dealing with the dreaded stack overflow.


Exception: STATUS_STACK_OVERFLOW at rip=00100414BC6

The line of code in question was the declaration of a large local array. Something like:


int foo[REALLY_BIG_NUMBER];

The fix was really simple. I just replaced that line of code with this one:


int* foo = malloc(REALLY_BIG_NUMBER * sizeof(int));
...
// later on...
free(foo); // <--- very important!

That's it. The difference is the area of memory that the data is allocated in. In the former example (the local array), the compiler used stack memory. The stack is an area of memory used for temporary variables created by each function, and its size is limited and fairly small. By allocating a large array, I tried to fit more into the stack than it could accommodate, hence the 'stack overflow'. In the latter example, a different area of memory is used, called the heap. The heap is managed explicitly by the program (with malloc and free) and is typically much larger.

Swafford Consulting Hired for TrACER-R Project

I’m very pleased and excited to announce that Swafford Consulting has been selected by IVIR Inc. (Information Visualization and Innovative Research) to rewrite and expand the Training/Test Assessment Capabilities and Reporting for Research (TrACER-R) System.

The following information has been publicly released by IVIR and posted on this website with permission from them.

TrACER is an automated assessment and evaluation system, designed for research and test conduct. It automatically produces test instruments, collects test data, correlates data and produces both statistical and descriptive analysis for final test reports. The system can be used for any procedure or skill set identified in critical research areas. It produces objective evaluation of subjects, and observer/controller performance for cognitive tasks, psychomotor skills, affective measurement, and decision making performance. It is particularly useful for medical research.

Swafford Consulting has done work for IVIR before, on the F2MC Trainer project.

JBoss 7 – Channel end notification received, closing channel Channel ID (inbound) of Remoting connection to null

As mentioned in a previous post I’ve recently upgraded to JBoss EAP 6.2 / AS 7.1. For a few weeks after the transition a small set of users were complaining about a ‘channel closed’ error. Few errors have driven me crazier than this one. It was not a good time, for me or the users.

After receiving this error the user was unable to do anything in my application; they had to shut it down and restart. In the server logs I was also seeing a steady stream of:


java.lang.IllegalStateException: EJBCLIENT000025: No EJB receiver available for handling

I try pretty hard to minimize the possibility of bugs. But, every once in a while one does pop up, and when it does I try to reproduce it in a development environment, write a failing test, and then solve the problem. Try as I might, I just could not reproduce this error in a development setup. After some time it became clear that the error had something to do with their environment.

Well, it turned out that about the same time I updated the application servers, the IT staff were busy upgrading firewalls. They pulled a Linux based firewall and replaced it with a Cisco ASA (I can’t remember the exact model). These ASAs are pretty clever. They have some logic built into them to detect when a connection is ‘dead’ and then forcibly close it. What happens is, after about 30 minutes of inactivity the firewall decides that the connection between the application client and the server (which is offsite in a data center) must be dead, so it kills it, unbeknownst to the application. Then, when the user goes to do something again, the dreaded ‘Channel Closed’ error appears.

You might think that is the end of the story. I wish it were.

Once the problem became clear I knew the solution would be to use some sort of ‘keep alive’ on the connection. And, as it turns out there is a way to do that. Just set this property (either in XML or programmatically as I do here):


clientProp.put("remote.connection.default.connect.options.org.jboss.remoting3.RemotingOptions.HEARTBEAT_INTERVAL","60000");

The issue was that I was mixing two different approaches to establishing remote connections, with the consequence being that my ‘heartbeat’ configuration was not taking effect. One method, which appears to be the inferior but better documented approach, is to use remote naming. There is also the EJB Client API, which according to this page is the better approach (though it doesn’t say how to use it).

Long story short: if you want to use the EJB Client API, then your Context lookup should look something like this:

private static void getContext(String login,String hashedPassword) throws NamingException {
   Properties clientProp = new Properties();
   clientProp.put("endpoint.name", "programmatic-client-endpoint");
   clientProp.put("remote.connectionprovider.create.options.org.xnio.Options.SSL_ENABLED", "false");
   clientProp.put("remote.connections", "default");
   clientProp.put("remote.connection.default.port", "4447");
   String namingProvider = System.getProperty("java.naming.provider.url");
   if (namingProvider==null) namingProvider="localhost";
   clientProp.put("remote.connection.default.host", namingProvider); 
   clientProp.put("remote.connection.default.username", login);
   clientProp.put("remote.connection.default.password", hashedPassword);
   clientProp.put("remote.connection.default.connect.options.org.xnio.Options.SASL_POLICY_NOPLAINTEXT", "false");
   clientProp.put("remote.connection.default.connect.options.org.xnio.Options.SASL_POLICY_NOANONYMOUS", "false");
   clientProp.put("remote.connection.default.connect.options.org.xnio.Options.SASL_DISALLOWED_MECHANISMS", "JBOSS-LOCAL-USER");
   clientProp.put("remote.connection.default.connect.options.org.jboss.remoting3.RemotingOptions.HEARTBEAT_INTERVAL","60000");
   EJBClientConfiguration cc = new PropertiesBasedEJBClientConfiguration(clientProp);
   ContextSelector<EJBClientContext> selector = new ConfigBasedEJBClientContextSelector(cc);
   EJBClientContext.setSelector(selector);		
 
   Properties p = new Properties();
   p.put(Context.URL_PKG_PREFIXES, "org.jboss.ejb.client.naming");
 
   ctx = (Context) new InitialContext(p);
}

In particular please note that the Properties object passed into the InitialContext has just one key/value pair: the one that tells it to use the EJB Client API. All other configuration options should be passed into the PropertiesBasedEJBClientConfiguration. In my case I had some superfluous “stuff” in the Properties given to the InitialContext, and as a result the heartbeat messages were not going out. I think it had fallen back to using remote naming, but I can’t be sure. Once the extra junk was removed everything started working as expected.

JBoss 7.x remoting + security-domain

I just recently updated an application to Red Hat’s EAP 6.2.0 platform, which uses the JBoss 7.3 Application Server under the hood. I’ve been a JBoss user for a long time now. In fact, this application started life on JBoss 3.something, and I (sometimes with the help of other devs) have seen it upgraded to every major version of JBoss since. The migration from 6 to 7 is hands down the most difficult to perform. Well, not that it’s really all that difficult, so we’ll just say time consuming. It took a good week to get everything right.

I’ll have more to say about the migration path soon, but one specific area that I think warrants special attention is the security subsystem.

The application has several components to it, one being a Java Swing client that uses Remote Method Invocation. The client has its own log in screen and used the UsernamePasswordHandler class and the ClientLoginModule from JBoss Security packages. Something like this:


UsernamePasswordHandler handler = new UsernamePasswordHandler(username, MD5.hash(password).toCharArray());
LoginContext lc = new LoginContext("MYCONTEXT", handler);
lc.login();

That’s all pretty simple really. The UsernamePasswordHandler is, well, a callback handler that handles NameCallbacks and PasswordCallbacks by setting the values to what you passed into the constructor. We just pass that handler into the LoginContext, which is backed by the default ‘ClientLoginModule’, which just sets the security principal and credential to what’s handed to it by the handler. When ‘login’ is invoked the username and (hashed) password are authenticated against whatever login-module is configured on the server side, which in my case is a DatabaseServerLoginModule.

Unfortunately this became a bit more complicated as a result of this migration. For reasons I don’t understand, neither the UsernamePasswordHandler callback handler nor the ClientLoginModule is available in JBoss 7, nor can I find any classes with similar functionality. It’s not very difficult to implement your own, and in fact that’s what I had to do, but the fact that I had to is … annoying!
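For anyone in the same spot, here is a minimal sketch of what a replacement callback handler could look like, using only the standard javax.security.auth.callback classes from the JDK. The class name SimpleUsernamePasswordHandler is made up for illustration; it is not a JBoss class and not necessarily what I ended up shipping.

```java
import javax.security.auth.callback.Callback;
import javax.security.auth.callback.CallbackHandler;
import javax.security.auth.callback.NameCallback;
import javax.security.auth.callback.PasswordCallback;
import javax.security.auth.callback.UnsupportedCallbackException;

// Hypothetical stand-in for the old UsernamePasswordHandler (not a JBoss class).
final class SimpleUsernamePasswordHandler implements CallbackHandler {
    private final String username;
    private final char[] password;

    SimpleUsernamePasswordHandler(String username, char[] password) {
        this.username = username;
        this.password = password.clone(); // defensive copy of the credential
    }

    @Override
    public void handle(Callback[] callbacks) throws UnsupportedCallbackException {
        for (Callback cb : callbacks) {
            if (cb instanceof NameCallback) {
                // Answer the "who are you?" callback with the stored user name.
                ((NameCallback) cb).setName(username);
            } else if (cb instanceof PasswordCallback) {
                // Answer the credential callback with the stored (hashed) password.
                ((PasswordCallback) cb).setPassword(password);
            } else {
                throw new UnsupportedCallbackException(cb, "Unsupported callback type");
            }
        }
    }
}
```

A handler like this can be handed to a LoginContext exactly as in the earlier snippet; the client-side login module that consumes it still has to be supplied separately.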

Another big change in JBoss 7 is that you can’t even get an InitialContext to do any JNDI lookups without authentication. In other words, you can’t just do ‘new InitialContext()’ any longer. I’m not talking about executing remote methods on an EJB here, I mean at the transport layer: the remoting connector itself requires authentication. Here is the configuration:


<subsystem xmlns="urn:jboss:domain:remoting:1.1">
<connector name="remoting-connector" socket-binding="remoting" security-realm="ApplicationRealm"/>
</subsystem>

See that bit about the ‘ApplicationRealm’? That’s the security realm. Within the security realm you must supply a configuration for user authentication. As far as I can tell you can either authenticate against what is called a ‘local user’ (an account you create with the JBoss ‘add-user.sh’ or ‘add-user.bat’ scripts, that applies to the entire server), or you hand it all off to a JAAS module. Since I have multiple applications running on the server that all have different sets of users, I opted for the latter. Here is the configuration for the ApplicationRealm security realm:


<security-realm name="ApplicationRealm" >
<authentication >
<jaas name="MySecurityDomain"/>
</authentication>
</security-realm>

Now, this is where it gets really good. As I just said, I have multiple applications that all have their own sets of users. However, the remoting connector is bound to a single security realm, and the security realm to a single JAAS module. I do not believe it’s an option to create multiple remoting connectors, each on its own port, which means that a single JAAS module needs to handle authentication for all applications that run on the server. I’m really hoping I’m just missing something here, but if I am then it’s due to inadequate documentation.

Anyway, I had one last hurdle to jump through. Since a single JAAS module was going to have to authenticate all users, I needed a way to deal with the possibility that there might be different users with the same name in different applications, or that the same user might exist in different applications, have the same password in each, but have different roles. In other words, I need to be sure we’re authenticating against the correct database for the application the user is logging into! So, it’s not enough to just pass over a user name and password any longer; we need the application (or context, or domain if you like) in addition. And then we need to use that context to query against the proper database.

To get the context I just appended it to the user name. So, instead of ‘james.swafford’ the principal is now ‘james.swafford@somedomain’. Easy enough. Now, how to do the authentication itself? I can think of two ways to do this.

I’ve always used the ‘out of the box’ Database Login Module on the server side. If we wanted to use that and had just one application to worry about, the configuration would look something like this:


<security-domain name="MySecurityDomain">
<authentication>
<login-module code="Database" flag="required">
<module-option name="dsJndiName" value="java:jboss/SomeDS"/>
<module-option name="principalsQuery" value="select md5passwd from users where login = ? and active=TRUE"/>
<module-option name="rolesQuery" value="select securityroles.role,'Roles' from securityroles inner join users_roles on securityroles.roleid=users_roles.role_id inner join users on users_roles.user_id=users.user_id where login =?"/>
<module-option name="unauthenticatedIdentity" value="guest"/>
<module-option name="password-stacking" value="useFirstPass"/>
</login-module>
</authentication>
</security-domain>

Since I have multiple applications, I had to do something a little different. One trick would be to chain multiple login modules together, making each of them ‘sufficient’ instead of required. To do that you’d just have to change the queries to take the context into account.


<authentication>
<login-module code="Database" flag="sufficient">
<module-option name="dsJndiName" value="java:jboss/App1DS"/>
<module-option name="principalsQuery" value="select md5passwd from users where login || '@app1' = ? and active=TRUE"/>
<module-option name="rolesQuery" value="select securityroles.role,'Roles' from securityroles inner join users_roles on securityroles.roleid=users_roles.role_id inner join users on users_roles.user_id=users.user_id where login || '@app1' =?"/>
<module-option name="unauthenticatedIdentity" value="guest"/>
<module-option name="password-stacking" value="useFirstPass"/>
</login-module>
<login-module name="Database-2" code="Database" flag="sufficient">
<module-option name="dsJndiName" value="java:jboss/App2DS"/>
<module-option name="principalsQuery" value="select md5passwd from users where login || '@app2' = ? and active=TRUE"/>
<module-option name="rolesQuery" value="select securityroles.role,'Roles' from securityroles inner join users_roles on securityroles.roleid=users_roles.role_id inner join users on users_roles.user_id=users.user_id where login || '@app2' =?"/>
<module-option name="unauthenticatedIdentity" value="guest"/>
<module-option name="password-stacking" value="useFirstPass"/>
</login-module>
</authentication>

That works OK and is easy to do, but the drawback is that the app server is going to cycle through all of those modules until one successfully authenticates or they all fail. That is potentially a lot of extra querying. Another, probably better solution would be to create a CustomLoginModule that is clever enough to use the context to determine which data source to query.
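To make the custom-module idea a little more concrete, here is a rough sketch of just the routing piece: pull the context off the end of the principal and map it to a data source JNDI name. DataSourceRouter and its methods are hypothetical names, and the JNDI names are simply the ones from the configuration above. The actual login module around it (the JNDI lookup and the queries) is left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lookup table a custom login module could consult.
final class DataSourceRouter {
    private final Map<String, String> jndiNameByContext = new HashMap<>();

    // Associates a context such as "app1" with the JNDI name of its data source.
    DataSourceRouter register(String context, String jndiName) {
        jndiNameByContext.put(context, jndiName);
        return this;
    }

    // Returns the JNDI name for a principal like "james.swafford@app1",
    // or null when the principal has no context or the context is unknown.
    String jndiNameFor(String principal) {
        int at = principal.lastIndexOf('@'); // split on the LAST '@'
        if (at < 0) {
            return null;
        }
        return jndiNameByContext.get(principal.substring(at + 1));
    }
}
```

With something like this in place the module would run one principalsQuery against one data source per login, instead of trying each ‘sufficient’ module in turn.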

None of this was all that difficult but it is different and took a little time to learn. Hopefully this will help someone out there save a little time.

JBoss + HornetQ + Encrypted FS + AIO

I wrestled with an… interesting issue today that sank most of my morning. I was working on an EJB3 application with a local installation of JBoss 6.1. In fact, I worked on the same application last Friday. Today, however, when I started the server I was greeted with this:

16:23:28,800 ERROR [org.hornetq.ra.inflow.HornetQActivation] Unable to reconnect org.hornetq.ra.inflow.HornetQActivationSpec(ra=org.hornetq.ra.HornetQResourceAdapter@4076fc9 destination= destinationType=javax.jms.Queue ack=Auto-acknowledge durable=false clientID=null user=null maxSession=15): HornetQException[errorCode=2 message=Cannot connect to server(s). Tried with all available servers.]
at org.hornetq.core.client.impl.ServerLocatorImpl.createSessionFactory(ServerLocatorImpl.java:619) [:6.1.0.Final]
at org.hornetq.ra.inflow.HornetQActivation.setupSession(HornetQActivation.java:352) [:6.1.0.Final]
at org.hornetq.ra.inflow.HornetQActivation.setup(HornetQActivation.java:290) [:6.1.0.Final]

So, I did what any software developer would and ran to Google. Google, unfortunately, didn’t give me much. I wrestled with it for longer than I should have, feeling stumped, when finally I found towards the top of the log:

10:12:20,837 INFO [AIOFileLockNodeManager] Waiting to obtain live lock
10:12:20,837 INFO [AIOFileLockNodeManager] Live Server Obtained live lock
10:12:21,159 SEVERE [HornetQServerImpl] Failure in initialisation: HornetQException[errorCode=205 message=Can’t open file]
at org.hornetq.core.asyncio.impl.AsynchronousFileImpl.init(Native Method) [:6.1.0.Final]

This was me: “Uhm, what? Is this a permissions issue? But I haven’t touched this configuration and it was just working!!! #$)**!@!”

With that additional information I turned back to Google, and this time found the issue. On Linux systems (I am running Linux Mint) with libaio installed, JBoss will attempt to use AIO for direct and asynchronous file operations. Basically, it’s a performance optimization that bypasses the file system cache in favor of very low level file operations. HOWEVER, if you are on an ENCRYPTED file system (which I am), then this breaks.

That was great, but if it worked Friday, and I haven’t updated the config, then what gives? Well, I did install MySQL over the weekend – and it turns out that MySQL on Ubuntu based systems does install libaio. So, in short, though the configuration hasn’t changed, the environment did.

Finally, I discovered a solution: just configure JBoss to not use AIO. I haven’t used it up until now anyway, but now I had to explicitly tell it not to. This can be done in /deploy/hornetq/hornetq-configuration.xml with one line:

<journal-type>NIO</journal-type>

Hopefully that saves someone out there a little trouble. Probably me in 6 or 12 months from now. 🙂

Functional Programming Principles in Scala

I recently finished a Coursera class entitled ‘Functional Programming Principles in Scala’, taught by none other than Martin Odersky himself. For those that don’t know, Martin is the creator of the Scala language and a professor at École Polytechnique Fédérale de Lausanne.

Read it and weep: scala_certificate

For those contemplating taking this class, I encourage you to do so. It was well worth it! My only “caution” would be to take note of the title — the class is truly about FUNCTIONAL PROGRAMMING using Scala. Meaning, you will focus on solving problems in a purely functional way (in the mathematical sense). E.g. no mutable variables, functions as first class citizens, currying, etc.

I think the code snippet below is one of the more beautiful pieces of code I’ve ever seen, and it’s representative of the types of things you’ll learn. It defines the (infinite) series of prime numbers in a ‘lazy’ way.


def from(n: Int): Stream[Int] = n #:: from(n+1)
def sieve(s: Stream[Int]): Stream[Int] = s.head #:: sieve(s.tail filter (_ % s.head != 0))
val primes = sieve(from(2))

The entire class is 7 weeks. Each week you’ll watch approximately 90 minutes of video lectures, and you’ll have some assignments that get progressively more difficult as the course progresses. At first the assignments are very easy, but by week 6 they get fairly difficult. Expect to average around 5-7 hours per week on assignments. You’ll upload the assignments using a provided tool, and within a few minutes you’ll get your grade. You can submit the assignments as many times as you like to improve your grade, up to the point the assignment is due.

Here is the week by week breakdown of topics:

  1. Functions and Evaluations
     A lot of basic syntax in here. You’ll learn about the concept of functions as first class citizens, evaluation strategies and tail recursion.

  2. Higher Order Functions
     Composing functions using other functions, currying.

  3. Data and Abstraction
     How to compose more complex data structures, polymorphism.

  4. Types and Pattern Matching
     Functions as objects (everything is an object!), subtyping, generics, pattern matching.

  5. Lists
     Pairs, tuples, higher order functions on lists (such as map and filter).

  6. Collections
     Collections other than lists, such as maps and sets. For-comprehensions (not your standard for loop!).

  7. Lazy Evaluation
     Trees, streams, lazy evaluation.

I can’t say enough good things about this class. The material was interesting, the assignments were challenging, and I thoroughly enjoyed it.

JBoss 6.1.0.Final + HornetQ Timing Issue

I had one of those ‘one line fix’ problems on Monday that took most of my day to figure out. It was one of those problems that makes perfect sense in retrospect, but when I was in the trench the solution was anything but obvious.

A big chunk of my time these days goes towards maintaining and extending what would now be considered a ‘legacy’ application. It’s an EJB3 based application that runs on a JBoss application server. The latest version of JBoss AS is 7.1, but we were still running on 5.1.0.GA, so I decided it was time to upgrade. For those that know anything about JBoss – 7.1 is a different beast altogether. It’s a different architecture than all previous versions. After spending about a day digging in, I decided to punt on upgrading to 7.1 and settle for 6.1.0 as an interim step.

For the most part, the upgrade from JBoss AS 5.1 to 6.1 is pretty straightforward. Lots of applications are directly portable: just drop them in and go. The biggest ‘gotcha’ would be porting any message driven beans. JBoss 5.1.0 used JBoss Messaging as the underlying JMS provider, but 6.1.0 uses HornetQ. Still, how hard could it be?

I was fooled. I don’t have too much in the way of message driven beans in this application. For the most part they just send emails. I have some integration tests that queue up messages, which should result in a few emails landing in my inbox. After making a few minor tweaks, everything seemed to work. Done! So, I tagged the code and pushed a release into production. Then my problems began. No emails.

The first thing that popped in my head was that ‘mail-service.xml’ on the production JBoss server must be misconfigured — the emails were probably being rejected by the mail server. That wasn’t it. Then I dug around and tried to find any discrepancy at all in the configuration of my development server and the production server. Did I miss a step? I really went out of my way to minimize the amount of configuration needed at all — I want my application to run on a plain vanilla setup. It turns out that, no, I didn’t miss a step. So what gives?

After scratching my head for an hour or so I realized that the problem was intermittent. Sometimes it worked, sometimes it didn’t. Then, after a little while longer I realized that the problem only seemed to affect a few of the eight applications deployed to that server. Huh? I continued to experiment, and eventually realized the problem had nothing to do with the application at all. If I deployed any one of those eight applications, it would work every time. But with all eight deployed, a few were affected. My thoughts turned to resource contention: perhaps the first few applications were consuming some resource that was in “limited supply,” thereby “starving out” the last few applications? But what? I combed through log files. Finally, I found these lines repeated several times throughout the log:


10:29:28,754 INFO [HornetQActivation] Attempting to reconnect org.hornetq.ra.inflow.HornetQActivationSpec(ra=org.hornetq.ra.HornetQResourceAdapter@64c826 destination=queue/EmailQueue/MEDPLUS destinationType=javax.jms.Queue ack=Auto-acknowledge durable=false clientID=null user=null maxSession=15)
10:29:28,757 INFO [HornetQActivation] awaiting topic/queue creation queue/EmailQueue/MEDPLUS

… and sometime later:


10:29:29,095 INFO [HornetQServerImpl] trying to deploy queue jms.queue.EmailQueue

I also poked around the JMX Console a bit and discovered that the ‘ConsumerCount’ was set to 0 for the applications that weren’t working correctly. The message driven beans were not consuming messages from the JMS queue at all!

It finally dawned on me: this wasn’t a problem with the code, or a configuration issue, or resource contention — this was a timing issue. For applications that deployed BEFORE HornetQ deployed the message queues, the message driven beans would never even receive a message. The applications that were the last to deploy (those that came AFTER HornetQ deployed the message queues) were the ones that were working. So then the question became – since the message driven beans are dependent on these queues, how do I delay their creation until after the queues are deployed? It came down to putting this annotation at the top of the MDB:

@org.jboss.ejb3.annotation.Depends("org.hornetq:module=JMS,name=\"EmailQueue\",type=Queue")

Suddenly, everything just worked.

This seems to be a common issue, so I wonder why it isn’t mentioned in the JBoss migration guide. I also have to wonder why it’s even necessary in the first place. Of course a message driven bean is dependent on the queue it receives messages from, so why force the developer to take this step?

Finally, I have to ask myself how my testing setup could have been better so that this problem didn’t make it into production. But honestly, I don’t have an answer for that. I wish I did.

Groovy’s MetaClass

I’m not a big fan of dynamic languages, but I recently came across a situation where Groovy’s MetaClass came in extremely handy. The situation was this:

I am maintaining a Grails application that interfaces with a Java based codebase. One of the methods in this Java codebase returns a list of Data Transfer Objects (DTOs). This list is ultimately passed on to the view layer for rendering. I really wanted to add a new field to the DTO in order to avoid doing what I considered “business logic” in the view. I came up with several options:

  1. Modify the DTO.
     As I have control over the Java codebase, this was an option. However, the field I wanted to add really isn’t relevant in that codebase, so this really wasn’t an attractive option.

  2. Extend the DTO.
     Use simple object inheritance to create a new class that inherits all the public members of its parent, as well as adding the new member. This works if you don’t mind the coupling and the parent class is not final, which this DTO’s class is not.

  3. Wrap the DTO.
     This is basically the ‘decorator pattern’, and is generally favored over inheritance.

  4. Create a new DTO.
     Forget inheritance or composition and just create an entirely new Object that contains all the same members, as well as the new field.

  5. Groovy’s MetaClass!
     The idea here is we “dynamically” extend the DTO. It’s as if the class were “extended” solely for this specific use, but not in general.

The MetaClass solution was really easy to implement. After fetching the list of DTOs, we just iterate over the list to add our new field into the MetaClass:

def dtos = myBean.getDTOs()
dtos.each {
   it.metaClass.newField = < some logic here >
}

That’s it! The “newField” member now can be accessed by the invoking method in the regular way.

+1 for Groovy.

A First Look at Scala

Scala has been on my “personal radar” for two or three years now, but it hadn’t quite bubbled to the top of the list until recently. I’ve been playing around with it off and on for a couple of months now; not enough to get too deep into it, but I’ve gotten a taste. My first impression – wow.

Scala is a JVM based language, meaning it gets compiled to Java bytecode and its instructions are executed on the Java Virtual Machine. The future of Java as a language is somewhat questionable at the moment (though I wouldn’t write it off by a long shot), but as a platform, there is every indication that Java will be around for a long, long time. Indeed, it seems that there is a new JVM based language every month. Groovy is probably the most popular of the list, but there’s also Clojure, JRuby, Jython, and Ceylon among others. Each of these languages benefits from the tremendous amount of research and development put into the JVM. Oh, and BTW, Scala is also available for the .NET platform.

Back in my university days I took a functional programming class in which we studied Haskell. I have fond memories of learning about lambdas and curried functions and the notion of functions as first class citizens and tail recursion and immutable state and lazy vs eager evaluation and mapping, filtering, and folding and all the other things that make functional programming so wickedly cool. My professor, Dr. Karl Abrahamson, would talk about “focusing on the what, not the how.” My “a-ha” moment was when Dr. Abrahamson implemented a quick-sort in Haskell on the board in just a few lines of code. It was so elegant, so succinct. We went on to do some really neat things, including a guided theorem prover.

The problem with all this is that Haskell is a purely functional language, which means that its application is somewhat limited. Oh sure, you could point to many academic and even a few industry projects that use purely functional languages, but it will never be mainstream. You’ll never see Haskell in the everyday business application. Dr. Abrahamson’s take on this was that, even if you couldn’t use a functional language directly, you could take the lessons you’ve learned and apply them to your “every day programming.” Thinking in a functional way would make you a better programmer. And, while I certainly agree with that statement, it left me somewhat unsatisfied.

What intrigues me about Scala is that it’s a mixed paradigm language, and I think that’s key to gaining mind share. What I mean by that is Scala supports both functional and object oriented paradigms. This means that you can use functional programming where it’s appropriate but sometimes you need to be able to change state, and Scala accommodates this.

With that, here are some notes I’ve taken as I’ve progressed. Again, I am far from Scala mastery, but here are some of the basics.

Scala is more difficult to master than most languages. Maybe I should say “than most mainstream languages.” Groovy would be a much easier transition for most Java programmers, because pretty much anything that compiles in Java will already compile in Groovy. That’s not true of Scala. That said, the syntax is elegant and succinct. But, it does take practice.

Scala is well suited for highly concurrent applications. The name, after all, is short for ‘SCAlable LAnguage’. Our typical approach to solving concurrency issues is to synchronize access to shared, mutable state. But functional programming is all about using immutable values and functions that return well-defined outputs (in the mathematical sense) with zero side effects. If state cannot possibly change, there is no need to synchronize access. Reasoning about synchronization does not come naturally to anyone; it is difficult to do correctly and therefore error prone. Scala reduces the need to do so by encouraging programmers to prefer immutability where possible.
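To make the point about immutability concrete, here is a minimal sketch, assuming the standard library’s Future and the default global execution context. The object name, the list, and the helper functions are made up for illustration. Two computations read the same immutable list concurrently, and no lock is needed because nothing can modify it.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical example object; none of these names come from a real library.
object ImmutableSharing {
  // An immutable list: no thread can change it, so reads need no synchronization.
  val prices: List[Int] = List(3, 5, 8, 13)

  // Pure functions: the result depends only on the argument, with no side effects.
  def total(xs: List[Int]): Int = xs.sum
  def largest(xs: List[Int]): Int = xs.max

  // Run both computations concurrently over the same shared list.
  def summarize(): (Int, Int) = {
    val totalF = Future(total(prices))
    val largestF = Future(largest(prices))
    // zip combines the two futures into one future holding a pair of results.
    Await.result(totalF.zip(largestF), 5.seconds)
  }

  def main(args: Array[String]): Unit =
    println(summarize()) // prints (29,13)
}
```

Blocking with Await is only there to keep the sketch short; the interesting part is that neither future needs a lock to read ‘prices’.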

Scala is statically typed but makes extensive use of type inference. I don’t want to get into a long discussion of static vs. dynamic typing here, but personally, I prefer static typing. And that is my main complaint with Groovy: it’s dynamically typed. Static typing means that an entire class of errors is eliminated from my program because the compiler checks for them at compile time (not at runtime). Type inference means that the compiler can often infer the type from the context, which allows the code to be much less verbose than what you might see in Java. Consider this example:

import java.util.{HashMap, Map}
val myMap: Map[Integer, String] = new HashMap

Compare that to Java:

Map<Integer,String> myMap = new HashMap<Integer,String>();

And, in most cases, the return type for methods can be inferred too. Consider this contrived example:

def addTwo(x: Int) = {
   x + 2
}

Should I really have to specify the return type here?

So, Scala enjoys all the benefits of static typing (chief among them being compile time checking and improved performance/optimization), and at the same time the code is usually as succinct as what you would see in a dynamic language such as Groovy or Ruby.

Everything is an object. There is no notion of a primitive in Scala. Compare this to Java, where you have primitives such as ‘int’, ‘float’, and ‘boolean’. There are no statics in Scala either (use singleton objects instead), because static members belong to a class, not to an instance. But, more to the point, even functions are objects and therefore can be passed as method arguments. That is an incredibly powerful construct. Take a look at this simplified definition of the filter method for Scala’s List class:

def filter(p: A => Boolean): List[A] = this match {
    case Nil => this
    case x :: xs => if (p(x)) x :: xs.filter(p) else xs.filter(p)
}

The argument to ‘filter’ is a function! The function, which we’ll call ‘p’, accepts a value of the list’s element type ‘A’ and returns a Boolean. The output of the filter function itself is a list of A’s: the elements for which ‘p’ returned true.
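From the caller’s side, passing a function to ‘filter’ might look like the sketch below. The object and value names are invented for illustration; ‘filter’ itself is the standard List method.

```scala
// Hypothetical example object showing functions passed as arguments.
object FilterDemo {
  val numbers = List(1, 2, 3, 4, 5, 6)

  // A function stored in a value; its type is Int => Boolean.
  val isEven: Int => Boolean = n => n % 2 == 0

  // Pass the function value by name...
  val evens = numbers.filter(isEven)

  // ...or pass an anonymous function written inline.
  val big = numbers.filter(_ > 4)

  def main(args: Array[String]): Unit = {
    println(evens) // prints List(2, 4, 6)
    println(big)   // prints List(5, 6)
  }
}
```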

Scala has sophisticated pattern matching. You might have noticed in the example above that the ‘case’ statement didn’t look like what you might see in C++ or Java, where case statements are limited to matching against ordinal types. The best you can do in those languages is something along the lines of “if x is 2 then do this, if x is 4 then do that.” With Scala, we can pattern match using sequences, types, and even regular expressions. We can use wildcards as well, and even do deep inspections of an object’s variables.

Here’s what it might look like to match on a sequence:

// Three sample lists, one for each case below
for (l <- List(List(1, 7, 3), List(2), List(4, 5))) {
   l match {
      case List(_,7,_) => println("There are three elements, and the middle one is 7.")
      case List(2)     => println("A singleton list with element 2.")
      case List(_*)    => println("Any other list.")
   }
}
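Matching on types, on an object’s fields, and on regular expressions can be sketched in the same spirit. The ‘Person’ case class, the ‘YearMonth’ pattern, and the function names below are all assumptions made up for this example.

```scala
// Hypothetical example object; Person and YearMonth exist only for illustration.
object MatchDemo {
  case class Person(name: String, age: Int)

  // Match on the runtime type, and look inside a case class's fields.
  def describe(x: Any): String = x match {
    case i: Int                         => "an Int: " + i
    case s: String                      => "a String: " + s
    case Person(name, age) if age >= 18 => name + " is an adult"
    case Person(name, _)                => name + " is a minor"
    case _                              => "something else"
  }

  // A regular expression with two capture groups (year and month).
  val YearMonth = """(\d{4})-(\d{2})""".r

  // The pattern succeeds only if the whole string matches the regex.
  def parse(s: String): String = s match {
    case YearMonth(year, month) => "year " + year + ", month " + month
    case _                      => "not a date: " + s
  }

  def main(args: Array[String]): Unit = {
    println(describe(Person("Ann", 30))) // prints Ann is an adult
    println(parse("2016-03"))            // prints year 2016, month 03
  }
}
```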

Say goodbye to NPEs. Java programmers know that acronym: the dreaded NullPointerException. The typical scenario is that you invoke some method, expecting to get an Object back. Once you receive that Object you try to invoke one of the methods defined on it, only to find out your ‘Object’ isn’t really an Object after all; it’s ‘null’. To get around this, Java programmers end up putting in a lot of defensive code that just clutters things up.

The Scala solution is to encourage the use of the ‘Option’ class (subclassed by ‘Some’ and ‘None’) whenever there is a possibility that the return may not refer to a value. Take a look at the example below that creates a Map and then retrieves objects from it.

val bookMap = Map(
   "Moby Dick" -> "Melville",
   "Great Expectations" -> "Dickens",
   "The Art of War" -> "Sunzi")
 
println("Moby Dick: " + bookMap.get("Moby Dick").getOrElse("unknown"))
println("The Time Machine: " + bookMap.get("The Time Machine").getOrElse("unknown"))

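The output of this program would be ‘Moby Dick: Melville’, followed by ‘The Time Machine: unknown’. The lookup for the missing title returns ‘None’ rather than null, so ‘getOrElse’ supplies the default.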
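Besides ‘getOrElse’, an Option can be handled by pattern matching on ‘Some’ and ‘None’, or transformed with ‘map’. Here is a small sketch; the object and method names are made up, and the map is a trimmed copy of the one above so the example stands on its own.

```scala
// Hypothetical example object showing two more ways to work with Option.
object OptionDemo {
  val bookMap = Map(
    "Moby Dick" -> "Melville",
    "Great Expectations" -> "Dickens")

  // Pattern match on the two subclasses of Option.
  def authorOf(title: String): String = bookMap.get(title) match {
    case Some(author) => author
    case None         => "unknown"
  }

  // map transforms the value only if one is present; None stays None.
  def shoutedAuthor(title: String): Option[String] =
    bookMap.get(title).map(_.toUpperCase)

  def main(args: Array[String]): Unit = {
    println(authorOf("The Time Machine"))      // prints unknown
    println(shoutedAuthor("Moby Dick"))        // prints Some(MELVILLE)
    println(shoutedAuthor("The Time Machine")) // prints None
  }
}
```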

No more passing in pointers or references of objects to be modified, or defining silly return structures. There are times you want to return more than one object. To get around this you typically see something like passing in a reference to an object as an argument, even though the argument isn’t an input (it’s an output). Or you go through the overhead of creating a composite data structure whose only purpose in life is to wrap other objects, so they can all be returned as a single object.

Scala neatly solves this with tuples. I found an example on Stack Overflow that illustrates the concept nicely.

// Get the min and max of two integers
def minmax(a: Int, b: Int): (Int, Int) = if (a < b) (a, b) else (b, a)
 
// Call it and assign the result to two variables like this:
val (x, y) = minmax(10, 3)     // x = 3, y = 10

Here we see a function ‘minmax’ that takes as arguments two integers and returns a tuple (of two integers). Pretty cool!

Interfaces can have (optional) implementations. Well, that’s not quite true, but that’s the idea behind traits. In the Java world, classes can have only one parent (single inheritance). A class can implement any number of interfaces, but interfaces don’t have implementations. But, sometimes we do need a class to support multiple abstractions, and some of those abstractions may have boilerplate code that can be implemented in a high level class. That’s very difficult to accomplish in a nice way with Java. C++ supports the notion of multiple inheritance, but that has its own problems (see the diamond problem).

Scala solves this cleverly with Traits. Traits give us the ability to push that boilerplate (reusable) code up. You can think of a Trait as a partial implementation. Look for a separate post on this topic soon.
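In the meantime, here is a minimal sketch of the idea. The trait and class names are hypothetical and chosen only for illustration. Each trait carries some implementation, and a single class mixes in both.

```scala
// Hypothetical example object; Greeter, Logger and Employee are made-up names.
object TraitDemo {
  trait Greeter {
    def name: String                     // abstract: the class must supply it
    def greet: String = "Hello, " + name // concrete: inherited as is
  }

  trait Logger {
    def log(msg: String): String = "[log] " + msg
  }

  // One class, two traits; the constructor's val implements 'name'.
  class Employee(val name: String) extends Greeter with Logger

  def main(args: Array[String]): Unit = {
    val e = new Employee("Ada")
    println(e.greet)          // prints Hello, Ada
    println(e.log("started")) // prints [log] started
  }
}
```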

That just scratches the surface of what Scala is about, but the more I learn, the more I like it (give me the red pill please). I will likely spend several more months delving deeper into the world of Scala, and as I do I’ll write up some of the things I learn.