How’s that Post-A-Week Thing Workin’ Out For Ya?

Well, we are in week 7 of the post-a-week challenge and so far I’ve posted 4 times (including this one). So, so far, not so good. Super jammed up at work, so I have lots of ideas for interesting (or at least difficult) problems I’ve solved, but no time to write them down. There will be one tomorrow, I promise. And hopefully some more makeup posts till I’m caught up to an average of a post per week, if not a post every single week.


Take the Time to Go Fast

This week, I have been working my tail off and making what feels like slow progress, but I have been keeping in mind what Uncle Bob calls The Primal Conundrum.

Programmers face a conundrum of basic values. All developers with more than a few years experience know that previous messes slow them down. And yet all developers feel the pressure to make messes in order to meet deadlines. In short, they don’t take the time to go fast!

True professionals know that the second part of the conundrum is wrong. You will not make the deadline by making the mess. Indeed, the mess will slow you down instantly, and will force you to miss the deadline. The only way to make the deadline—the only way to go fast—is to keep the code as clean as possible at all times.

What I have been working on is relatively complex. It is taking way more time than I estimated (I’m horrible at estimating), yet I’m still taking the time to write my tests (and keep the old tests passing). But are my tests slowing me down?

The answer is not always no, but in this case, I would say it is an emphatic no! As I said, I am writing some relatively complex stuff – security-related stuff. Would I want to write it and just run a couple of manual functional tests and assume I have it covered? That sounds like a bad idea. What if I ran a bunch of manual functional tests, not just a few? That might be sufficient, but now we are not saving any time over the automated unit tests, and I don’t get to keep the tests for all the ways the code will develop in the future. Speaking of which, all of those tests I’ve already written are really saving me a bunch of time. Every one of those tests that stops passing when I add security means that I have to do something. Those tests aren’t getting in my way. They are reminding me, “Don’t forget to take me into account, too!”

Furthermore, Mr. Martin is not really talking about tests per se, but clean code. If you’re writing something complex, what better way to force yourself to break it into smaller, less complex chunks than by writing the tests first?

Anyway, just wanted to dash this off before I get back to the salt mine. I’m still under pressure and I have more tests to write.

CONCAT Gotcha between MySQL and HSQLDB

Hibernate abstracts most of the data access layer so you can switch out database platforms without changing any code (unless you count configuration as code). This allows me to use HSQLDB for my tests and MySQL for production. I love HSQLDB for integration tests. Running embedded in memory it’s super fast and you don’t have to configure a server or worry about the existing state of the database when you start your test suite (it’s empty!).
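As a rough sketch of what "switching platforms through configuration" can look like, here are two sets of Hibernate connection properties. The class name HibernateSettings, the database names, the host, and the user are all made up for illustration; the property keys are standard Hibernate settings, but treat the exact driver and dialect values as assumptions to check against the versions you actually use.

```java
import java.util.Properties;

/** Hypothetical helper that builds Hibernate connection settings per environment. */
final class HibernateSettings {

    /** Embedded, in-memory HSQLDB for the test suite. */
    static Properties forTests() {
        Properties p = new Properties();
        p.setProperty("hibernate.connection.driver_class", "org.hsqldb.jdbcDriver");
        // "mem:" keeps the database inside the JVM; "testdb" is a placeholder name.
        p.setProperty("hibernate.connection.url", "jdbc:hsqldb:mem:testdb");
        p.setProperty("hibernate.connection.username", "sa");
        p.setProperty("hibernate.connection.password", "");
        p.setProperty("hibernate.dialect", "org.hibernate.dialect.HSQLDialect");
        // Build the schema from the mappings at startup and drop it at shutdown,
        // so every test run starts from an empty database.
        p.setProperty("hibernate.hbm2ddl.auto", "create-drop");
        return p;
    }

    /** MySQL for production; host, schema, and user are placeholders. */
    static Properties forProduction() {
        Properties p = new Properties();
        p.setProperty("hibernate.connection.driver_class", "com.mysql.jdbc.Driver");
        p.setProperty("hibernate.connection.url", "jdbc:mysql://localhost:3306/appdb");
        p.setProperty("hibernate.connection.username", "app_user");
        p.setProperty("hibernate.dialect", "org.hibernate.dialect.MySQLDialect");
        return p;
    }
}
```

Only the dialect and connection settings differ between the two; the entity classes stay the same. That is exactly why dialect-specific SQL tucked inside a mapping can slip through unnoticed.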

There are, of course, perils to testing with it when your production database is something else (and it almost certainly is). I have a Person class with a formula name field:

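import javax.persistence.Entity;
import org.hibernate.annotations.Formula;

@Entity
public class Person extends Party {
  private String lastName = "";
  private String firstName = "";
  // Computed by the database on load; also what queries on "name" compare against.
  @Formula("ltrim(concat(concat(first_name, ' '), last_name))")
  private String name;

  // Recomputed in memory so it also works before the Person has been persisted.
  @Override
  public String getName() {
    this.name = (firstName + ' ' + lastName).trim();
    return name;
  }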
...
}

It took me a while to arrive at that formula, though.  The first one I tried was concat(first_name, ' ', last_name) but HSQLDB complained that it didn’t know the concat formula.  I knew it did, so I found that strange, but tried hunting around for alternatives.  I then tried first_name || ' ' || last_name.  That worked and all of my integration tests were passing so I thought everything was great.  I deployed to MySQL and didn’t get any errors, so everything’s working like a charm, right?

Nope.  But you knew that – why would I bother to write about that? Anyway, MySQL doesn’t complain about syntax because it is treating the || as a logical ‘OR’ and is somehow able to come up with 'Jason' OR ' ' OR 'Erickson' = 0.  Why 0? Well, if you have a string where MySQL expects a number, it will try to convert the string to a number.  A string that cannot be parsed to a number is not an error.  It’s 0.  So 0 || 0 || 0 = 0.
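To make that coercion concrete, here is a small Java simulation of the behavior described above. It is a simplified model, not MySQL's actual implementation: the class and method names are invented, it only handles plain decimal prefixes, and it ignores NULL handling. It also assumes MySQL's default mode, where || means OR (I believe the PIPES_AS_CONCAT SQL mode changes that, but I wasn't using it).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Simplified, hypothetical model of how MySQL evaluates 'a' || 'b' || 'c'. */
final class MySqlOrSketch {
    // Optional leading whitespace, then a plain decimal number at the start.
    private static final Pattern LEADING_NUMBER =
            Pattern.compile("^\\s*([+-]?(\\d+(\\.\\d*)?|\\.\\d+))");

    /** A string with no leading number is not an error; it just becomes 0. */
    static double toNumber(String s) {
        Matcher m = LEADING_NUMBER.matcher(s);
        return m.find() ? Double.parseDouble(m.group(1)) : 0.0;
    }

    /** Logical OR over the coerced operands: 1 if any is non-zero, else 0. */
    static int or(String... operands) {
        for (String operand : operands) {
            if (toNumber(operand) != 0.0) {
                return 1;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(or("Jason", " ", "Erickson"));  // prints 0
        System.out.println(or("1 fish", " ", "Erickson")); // prints 1
    }
}
```

So every ordinary name collapses to 0, and the only way to get a 1 is a name that happens to start with a non-zero number.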

Anyway, now I don’t have any errors, but everybody’s name now evaluates to 0.  Except that it doesn’t.  See, that getter for name? That was there because I wanted it to work whether the Person had already been persisted or not.  That means that it looked like it was working in all of the code that accessed the bean except… find by name.  The query,  “FROM Party WHERE name = ?” was never finding my people by name.  This was quite a puzzle for me.
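Here is a stripped-down sketch of why the bug stayed hidden. There is no Hibernate in it: the nameFromDatabase argument just stands in for whatever the formula produced on the database side, and queryMatches is a made-up stand-in for the WHERE clause. All names are for illustration only.

```java
/** Hypothetical illustration of a getter masking a bad database-side value. */
final class MaskedFormulaSketch {

    static final class Person {
        final String firstName;
        final String lastName;
        String name; // stands in for the value the formula produced on load

        Person(String firstName, String lastName, String nameFromDatabase) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.name = nameFromDatabase;
        }

        /** Recomputes the name in memory, overwriting whatever was loaded. */
        String getName() {
            name = (firstName + ' ' + lastName).trim();
            return name;
        }
    }

    /** Mimics "WHERE name = ?", which only ever sees the database-side value. */
    static boolean queryMatches(String nameFromDatabase, String searchedName) {
        return nameFromDatabase.equals(searchedName);
    }

    public static void main(String[] args) {
        Person p = new Person("Jason", "Erickson", "0");
        System.out.println(p.getName());                          // prints Jason Erickson
        System.out.println(queryMatches("0", "Jason Erickson"));  // prints false
    }
}
```

Anything that goes through the bean looks fine; only code that asks the database to do the comparison sees the 0.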

Well, anyway, I finally figured out what the problem was, and a little “aha” flash told me that the HSQLDB version of concat did not support an arbitrary number of parameters, so concat(first_name, ' ', last_name) wouldn’t work, but concat(concat(first_name, ' '), last_name) would.  Then I added an ltrim to take care of the case with no first name and I was set.
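If you have more than three pieces to join, the nesting gets tedious to write by hand. The following is a minimal sketch of a helper that folds any number of SQL fragments into nested two-argument concat calls. FormulaBuilder and its methods are hypothetical names, not part of Hibernate. Since an annotation value has to be a compile-time constant, you would use something like this to generate or double-check the string rather than call it from @Formula directly.

```java
/** Hypothetical helper for building nested two-argument concat expressions. */
final class FormulaBuilder {

    /** Folds the parts left to right: concat(concat(a, b), c), and so on. */
    static String nestedConcat(String first, String second, String... rest) {
        String expr = "concat(" + first + ", " + second + ")";
        for (String part : rest) {
            expr = "concat(" + expr + ", " + part + ")";
        }
        return expr;
    }

    /** Rebuilds the name formula from the Person mapping above. */
    static String nameFormula() {
        return "ltrim(" + nestedConcat("first_name", "' '", "last_name") + ")";
    }

    public static void main(String[] args) {
        // prints ltrim(concat(concat(first_name, ' '), last_name))
        System.out.println(nameFormula());
    }
}
```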

I’m Posting Every Week in 2011

I’ve decided I want to blog more. Looking at my site’s stats, they are quite modest, but I had assumed when I started writing this thing that I would have close to zero hits per day. But it looks like it’s much more than zero (which isn’t saying much, but still… it’s all about expectations, right?). With that in mind, I think this blog needs a little more attention so there’s something for people to read besides that I thought that Spring Roo Sucks (which is too harsh). Therefore, my goal is to have a new post on this blog at least once per week for all of 2011.

I know it won’t be easy, but it might be fun, inspiring, awesome and wonderful. Therefore I’m promising to make use of The DailyPost, and the community of other bloggers with similar goals, to help me along the way, including asking for help when I need it and encouraging others when I can.

If you already read my blog, I hope you’ll encourage me with comments and likes, and good will along the way.

Re: What are you rewarding

DocOnDev doesn’t like bonuses.   Here’s the lede:

It is nearing the end of the year and we at LeanDog are wrapping up our fiscal year. We’re looking at the potential tax benefits of spending some of our reserve and we’re mulling over other ideas related to the spend of money. We are not, however, discussing our bonus objectives. We aren’t discussing them because we don’t have them. I, for one, am happy that we don’t.

He points out the problem with bonus programs he’s seen, spills some ink (properly) criticizing the metrics that were used and how those incentivized dysfunctional behavior.  He then closes with:

My advice to managers, directors, and executives who want to pay bonuses? Don’t. If you must do so, then break it down based on each employee’s income. If you have a top performer, don’t give them a bigger bonus, give them a raise. Think of a raise as a bonus that lasts forever.

This seems absurd to me.  His title got it right, but the conclusion misses the mark.  It’s not the form of the reward, it’s what you are rewarding.  The problem isn’t with the bonuses (although those can exist, too), it’s with the metrics.  If you gave raises based on the same metrics he described, you would get the same dysfunctions.  You might get the same dysfunctions without bonuses or raises, simply by posting the metrics publicly.  People do what you incentivize them to do.  You are left with a choice between not measuring anything (with respect to incentive) or being very careful about what you are going to incentivize.

Now, I’m no expert on incentives, either, but here are some things I’ve observed.

Value, not behavior

You should incentivize the value you hope to achieve, not the behavior that you theorize will get you there. Measuring behavior can have great value, but should not be used to reward or punish (within the scope of normal expected behavior). I’m not talking about not punishing somebody for stealing. I’m talking about things like the classic KLOC’s. This is a classic bogeyman metric because everyone knows that more KLOC’s may mean more productivity at first, but it’s very easy to inflate your lines of code to the detriment of readability and quality. You may want to measure KLOC’s along with other things (you may want to see it going down in a particularly spaghettified module), but you need to be careful not to incentivize it.

Sphere of Influence

I’m a developer, so I’ll keep with the developer examples. Say Joe Developer is on a team of 6 developers in a division with 12 teams in a company with 800 people. Tying some of Joe’s compensation to company profitability is certainly fair, but it doesn’t do much to incentivize him because it is very difficult for Joe to see how his individual contribution affected the bottom line. However, if you have some metric that measures the throughput or quality of the output of Joe’s team of 6 developers, now Joe can see how he affects things (to the benefit or detriment of the team). I don’t want to minimize the challenge of finding reliable measures at this level. (I can’t think of any that I really like for incentives.) But the point is that if Joe can’t see how he can affect it, Joe won’t do anything differently.

Avoid Individual Performance

I also believe firmly that any incentive should be tied to team metrics rather than individual metrics. You want this team working together, not competing with each other for their share of the pie.

Difficult to Game

A classic problematic metric is bug fix rate, measured for the same individuals who can introduce bugs in the first place. If you try to fix that by subtracting the bugs each individual is responsible for introducing, get ready to spend a lot of energy tracking down the ‘real’ responsible person and arguing over whose fault something is. And when a new bug is found, was it just introduced or just discovered? A better example might be a mature legacy application in maintenance mode and a team that is responsible for that maintenance. Bug fix rate might make sense there, but again, you would want to incentivize the team, not the individuals, lest you get a bunch of people that don’t have time to help each other because they need to make their own numbers.

The incentives themselves

So far I have talked about metrics and started off pooh-poohing criticism of bonuses. But that doesn’t mean incentives can’t be poorly designed. The problem, in my view, is that any metric can be problematic, given enough incentive. People are ingenious animals and will find very creative ways to game the system, given the ‘right’ incentive.

When people talk about deterrence of behavior, you will often see the phrase “swift, sure and severe”. I prefer “swift, sure and substantial” because that allows for carrots as well as sticks, to misuse a metaphor. What they mean is that the most effective way to change behavior is if the consequence of that behavior is swift – happens immediately – sure – always happens – and substantial – you will really care that it happened. Imagine if every time you went over the speed limit, your car sent a message to the highway department and you were issued a ticket that printed out of your dashboard. Suddenly, the speed limit would be an actual limit. You would almost never exceed it and almost surely drive below the speed limit to avoid accidentally exceeding it. (I’m not advocating such a police state, just pointing out that such a police state would be effective at controlling your behavior.)

Another case in point is smoking. The consequences of smoking can be severe (premature death) but they are neither swift nor sure, so those consequences aren’t enough to make most people quit. On the other hand, for people that enjoy smoking, the reward is much less substantial (the enjoyment of the act of smoking) but it is still enough when combined with swift (immediate gratification) and sure (works every time). That’s good enough to sell a lot of cigarettes for a very long time.

So what does this mean for bonuses and other incentives? First, if you want to change behavior, keep those things in mind. If you want to cause people to really get creative gaming the system, keep those things in mind. If you pick metrics that make it very easy for someone to see for themselves how they are doing, they can count on a bonus (or lack of one) based on those metrics, and you make it very high stakes (for example 50% of base salary), you can bet people will find a way to make their numbers, both in ways you appreciate and probably in ways that you do not.

If you don’t trust your metrics completely, dial back the consequences. Poorly chosen metrics can cause dysfunctional behavior even if the consequences are just that the metrics are displayed on a public screen. However, if the metrics are pretty good, but not perfect, put them up on a screen so people can see how they’re doing. If you really like the metric, consider tying a small bonus (or even a variably substantial one) at the end of the quarter or year based on it.

But if you’re going to give people a 50% bump (that they’re used to getting every year) you better love that metric to death, because people will maximize it one way or another.

Motivation Over Talent and Process

Jurgen Appelo has a new post over at Agile Zone that got me thinking.  The thesis is basically that motivation for increasing one’s competence is more important than raw talent or any process.

It reminds me of a book I read recently: Talent Is Overrated: What Really Separates World-Class Performers from Everybody Else. This book talks a lot about “deliberate practice” (which is what I would call your studying up on how to be a good presenter). Two things about deliberate practice:

  1. It’s not just experience doing the thing. If you’re talking sports, it’s doing drills, lifting weights, studying game film, etc. If it’s doing presentations, it’s reading books and blogs, practicing in front of your wife or a mirror, etc.
  2. It’s work. The author thinks it’s not fun and that’s part of it, but I think it’s the motivation you are talking about that actually brings a certain amount of pleasure (if not fun) in doing that hard work.

Where does that motivation come from? How do you cultivate it in others? Or do you just have to find it in your recruiting process and build a team out of intrinsically motivated people?

I’m not sure, but I’m inclined to think some processes (or at least environments) can stifle the motivation to improve and others can nurture it.  Processes that have tight feedback loops are more motivating to improve than processes that delay feedback.  Processes that are rigidly prescriptive demotivate people from changing their own behavior except possibly to comply.  Financial incentives tied to individual performance can improve individuals (sometimes at the expense of the team).  Financial incentives tied to team performance can improve individuals and the team.  Incentives tied to the performance of the whole company can do little to motivate because they are seen as outside the individual’s sphere of influence.

The Web on OSGi

Just watched this very interesting presentation: The Web on OSGi: Here’s How. It’s something I’ve been thinking about a lot working on my new application – how to build a platform rather than an application. This was both informative and a bit scary (sounds like OSGi is full of pitfalls still).

Another Look at Roo

OK, so I attended the Roo session today at JavaOne.  I got frustrated with it before and I’m more cynical than I started out, but that Ben Alex is just so darned earnest you can’t help but root for the guy.  Not to mention that it’s worth rooting for the tool to be successful just to make my life easier.

I was pretty harsh on Roo before.  I conceded that it still had promise, but I was very disappointed in the experience past the initial euphoria that comes with such a rapid development.  The great thing about the tool (and others like it) is that it basically builds an application for you with very little effort.  The problem can be right after that.  So far in my experience, it happens without fail.

The application that it builds for you is not quite the application you want – just most of it.  That’s great, right? If it builds 80% of the application you only need to build the other 20%.  Unfortunately, it doesn’t quite work that way.  The 80% that’s built is full of stuff you don’t understand (because it was auto-generated). When I say hard to understand, I don’t just mean that the code is hard to read (it often is) or that it uses a bunch of techniques that are unfamiliar to mortal developers (that’s usually true, too).  It’s also that it is unclear how to extend it.

So Roo generates all this persistence stuff that I like, and I like the controller scaffolding, but not the jspx files.  Can I delete them? Will the tool complain? If the tool doesn’t complain, will something else complain when I try to start the server and they aren’t there? What files are safe to edit, what files should not be touched and what files can be edited with care? What about all those AspectJ files? It doesn’t look like the autogenerated test does anything, but I can see that it’s running 9 tests. What is it doing, exactly? I dunno – can’t see the code.

If you can’t figure that stuff out, you’re stuck with 80% of an application that can’t be finished so you’re pretty close to 0% of the way to a finished application.

I left with more hope than when I entered.  I got some clarification on some of those questions in my session today.  It is also perfectly valid to use Roo to build 80% of your application and then remove Roo.  Now you can edit whatever you like – it’s just Java.  You can’t remove Roo automatically, but you can do it manually without too much effort.  Now I’ve got a huge leg up on building my application.

It’s too late for my current project, but my next project will get going soon and I’m going to give Roo another day in court (the GWT stuff still might not be ready for prime time).  It’s cheap to try.

Groovy: To Infinity and Beyond

Attended a JavaOne session on Groovy today.  I was looking for an intro to Groovy but it was really more aimed at experienced Groovy developers – what’s new with 1.7, what’s coming in 1.8, etc.

Groovy is pretty cool, but it also has some of the same problems that all dynamic languages have.  It looks to me as if the whole point of Groovy is brevity.  This has great potential benefits to both efficiency and clarity.  However, the benefit to clarity is based on eliminating boilerplate stuff from Java that doesn’t actually add any clarity – removing clutter.  There are a lot of things in Groovy (it seems to me) that are really an enemy of clarity.

e.g.

def divide = {a, b -> a/b }
def halver = divide.rcurry(2)
assert halver(8) == 4

Clearly.


def plus2 = { it + 2 }
def times3 = { it * 3 }
def composed1 = plus2 << times3
assert composed1(3) == 11
assert composed1(4) == plus2(times3(4))

The result of these asserts should be intuitively obvious to the most casual observer.
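For comparison, here is roughly what those two snippets are doing, spelled out in Java. This is only a sketch and it assumes java.util.function (Java 8 or later); the class and constant names are invented. rcurry(2) pins the right-most argument of divide to 2, and plus2 << times3 runs times3 first and then plus2.

```java
import java.util.function.BiFunction;
import java.util.function.Function;

/** Hypothetical Java spelling of the Groovy rcurry and << examples. */
final class ClosureSketch {
    static final BiFunction<Integer, Integer, Integer> DIVIDE = (a, b) -> a / b;

    // divide.rcurry(2): fix the right-most argument (the divisor) at 2.
    static final Function<Integer, Integer> HALVER = a -> DIVIDE.apply(a, 2);

    static final Function<Integer, Integer> PLUS2 = x -> x + 2;
    static final Function<Integer, Integer> TIMES3 = x -> x * 3;

    // plus2 << times3: apply times3 first, then plus2.
    static final Function<Integer, Integer> COMPOSED1 = PLUS2.compose(TIMES3);

    public static void main(String[] args) {
        System.out.println(HALVER.apply(8));    // prints 4
        System.out.println(COMPOSED1.apply(3)); // prints 11
        System.out.println(COMPOSED1.apply(4)); // prints 14
    }
}
```

It is longer, but which argument gets fixed and which function runs first are both visible on the page.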

Java in the Cloud

The first breakout session I attended at the JavaOne conference (or Unconference for this part) was a discussion on cloud computing.  Now this was just an interest group, not a speaker-led presentation.  The idea is that in the group of interested people, some would have experience in the topic and could learn from others with experience while those with little or no experience could benefit from the discussion by listening in or asking questions of the “experts”.

Unfortunately, there wasn’t much expertise on hand.  It seems that no one is doing much Java in the cloud – at least not among the 30 or so people dedicated enough to show up on Sunday for the discussion.  So was it fruitful anyway? Yes, and here’s why.

First was a discussion of what it even means.  We didn’t arrive at a formal consensus, but as the discussion went on, it was clear that we meant something like Amazon EC2 more than we meant something like SaaS.  There was a lot of talk on “private clouds.”  I have no problem with the concept of private clouds, but I think you are really talking about elastic virtual computing in general, and not necessarily some nebulous resources without boundaries out in the world.  It turns out the distinction is important to most people.  It may not be important whether you call it a “cloud” or not, but it matters a lot to people where these virtual computing resources live.  Curiously, our discussion of what constituted a cloud aligned very well with Ellison’s.  (This discussion was before Ellison’s welcome keynote and his sales pitch for the Exalogic “cloud in a box”.)

The main fruit of this fruitful conversation, though, was a discussion of why no one seemed to be doing the thing that everyone was talking about.  The answer was data.  There are two issues with data – performance and security/privacy.

Performance

Several people suggested that their DBA’s would never put their databases on a VM at all, no matter where it lived.  These guys want total control over which piece of such-and-such table lives on which disk to tweak performance.  That’s nonsense if you have a VM with a virtual disk.  What’s the point of putting temporary tables on their own disk when all the drive space is shared anyway?

Personally, I suspect that this is an overblown concern.  Not all storage arrays are the same, but if you are using a high performance SAN, you are already abstracting away the disk.  A large SAN (with more than just a few disks) is not a viable solution for most non-shared infrastructure folks because it’s very expensive.  However for a shared infrastructure (including a virtual one) the case for a SAN is obvious.  It provides great I/O and everything is, in a way, already on its own disk, so there’s no point in segmenting your data that way.

So, I suspect DBA’s can get crusty and cling to what they know just like everybody else, and they are saying a VM would never work because they won’t be able to sleep at night without that total tweaking control.  I suspect that, but I’m not a DBA, so I have to admit that they know a lot more about it than I do.  I can suspect prejudice, but I am not really qualified to prove it.

The other issue is that I’ve used a shared SAN before.  It worked great for us in terms of I/O, but it was also supposed to provide super high availability because it was so massively redundant.  However, the particular shared infrastructure vendor I was using had multiple outages of the whole SAN, each lasting several hours.  Again, I suspect incompetence, but I’m not qualified to say whether it was incompetence or whether SANs are inherently unstable.  Another possibility is that it was incompetence of a kind, but that SANs are so complicated that only super geniuses can implement them correctly.  That would be a weakness of SANs closely related to “inherently unstable” in my book.

Security/Privacy

The other main concern was handling of sensitive data.  There seemed to be some specific concerns about things like Social Security Numbers – that you couldn’t really store them in the cloud and be SOX compliant.  I can’t speak to that, but I don’t know why that concern would be different if you used a non-virtual database but had it provisioned by RackSpace or some other data-center service provider.

Tying it all Together

So the VM issue is an issue whether it’s public or private and the security issue is an issue whether it’s virtual or not.  Whether the objections are valid or paranoid, these are the perceptions people have.  Well, if you can’t put your data in the cloud, what’s the point?  There are very few interesting applications that don’t require a database.  There are some (a performance lab is a clear use case, as is a marketing web site without much “application” there, or some super-processing that is done in periodic batches) but until you can confidently put your data in the cloud, of course you’re not going to see a lot of serious applications in the cloud.

But somehow Amazon and Google do it.  There are a couple of important things that make them special.  EC2 can be thought of as a public cloud, but it’s private to Amazon.  Amazon can put its sensitive data somewhere else and no one but Amazon employees can put their hands on it.  Amazon and Google both have tons of data that is not the least bit sensitive.  (What would it even mean to “steal” Google’s data?)  However, they aren’t using RDBMS’s for that stuff.  They are using various flavors of NoSQL.  That proves that it can work, but it’s useful to remind yourself: You’re not Google.