Java in the Cloud

The first breakout session I attended at the JavaOne conference (or Unconference for this part) was a discussion on cloud computing.  Now this was just an interest group, not a speaker-led presentation.  The idea is that in the group of interested people, some would have experience in the topic and could learn from others with experience while those with little or no experience could benefit from the discussion by listening in or asking questions of the “experts”.

Unfortunately, there wasn’t much expertise on hand.  It seems that no one is doing much Java in the cloud – at least not among the 30 or so people dedicated enough to show up on Sunday for the discussion.  So was it fruitful anyway? Yes, and here’s why.

First was a discussion of what it even means.  We didn’t arrive at a formal consensus, but as the discussion went on, it was clear that we meant something like Amazon EC2 more than we meant something like SaaS.  There was a lot of talk on “private clouds.”  I have no problem with the concept of private clouds, but I think you are really talking about elastic virtual computing in general, and not necessarily some nebulous resources without boundaries out in the world.  It turns out the distinction is important to most people.  It may not be important whether you call it a “cloud” or not, but it matters a lot to people where these virtual computing resources live.  Curiously, our discussion of what constituted a cloud aligned very well with Ellison’s.  (This discussion was before Ellison’s welcome keynote and his sales pitch for the Exalogic “cloud in a box”.)

The main fruit of this fruitful conversation, though, was a discussion of why no one seemed to be doing the thing that everyone was talking about.  The answer was data.  There are two issues with data – performance and security/privacy.


Several people suggested that their DBA’s would never put their databases on a VM at all, no matter where it lived.  These guys want total control over which piece of such-and-such table lives on which disk to tweak performance.  That’s nonsense if you have a VM with a virtual disk.  What’s the point of putting temporary tables on their own disk when all the drive space is shared anyway?

Personally, I suspect that this is an overblown concern.  Not all storage arrays are the same, but if you are using a high performance SAN, you are already abstracting away the disk.  A large SAN (with more than just a few disks) is not a viable solution for most non-shared infrastructure folks because it’s very expensive.  However for a shared infrastructure (including a virtual one) the case for a SAN is obvious.  It provides great I/O and everything is, in a way, already on it’s own disk, so there’s no point in segmenting your data that way.

So, I suspect DBA’s can get crusty and cling to what they know just like everybody else, and they are saying a VM would never work because they won’t be able to sleep at night without that total tweaking control.  I suspect that, but I’m not a DBA, so I have to admit that they know a lot more about it than I do.  I can suspect prejudice, but I am not really qualified to prove it.

The other issue is that I’ve used a shared SAN before.  It worked great for us in terms of I/O, but it was also supposed to provide super high availability because it was so massively redundant.  However, the particular shared infrastructure vendor I was using had multiple outages of several hours of the whole SAN.  Again, I suspect incompetence, but I’m not qualified to say whether it was incompetence or SAN’s are inherently unstable.  Another possibility is that it was incompetence of a kind, but that it’s so complicated that only super geniuses can implement them correctly.  That would be a weakness of SAN closely related to “inherently unstable” in my book.


The other main concern was handling of sensitive data.  There seemed to be some specific concerns about things like Social Security Numbers – that you couldn’t really store it in the cloud and be SOX compliant.  I can’t speak to that, but I don’t know why that concern would be different if you used a non-virtual database but had it provisioned by RackSpace or some other data-center service provider.

Tying it all Together

So the VM issue is an issue whether it’s public or private and the security issue is an issue whether its virtual or not.  Whether the objections are valid or paranoid, these are the perceptions people have.  Well, if you can’t put your data in the cloud, what’s the point?  There are very few interesting applications that don’t require a database.  There are some (performance lab is a clear use case, a marketing web site without much “application” there, some super-processing that is done periodic batches) but until you can confidently put your data in the cloud, of course you’re not going to see a lot of serious applications in the cloud.

But somehow Amazon and Google do it.  There are a couple of important things that make them special.  EC2 can be thought of as a public cloud, but it’s private to Amazon.  Amazon can put it’s sensitive data somewhere else and no one but Amazon employees can put their hands on it.  Amazon and Google both have tons of data that is not the least bit sensitive.  (What would it even mean to “steal” Google’s data?)  However, they aren’t using RDBMS’s for that stuff.  They are using various flavors of NoSQL.  That proves that it can work, but it’s useful to remind yourself: You’re not Google.

Post a comment or leave a trackback: Trackback URL.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: