Friday, November 26, 2010

Moved to Posterous

I'm merging this blog with my Posterous blog: Time Travel Toaster. No more posts here.

Tuesday, May 25, 2010

Keeping your skeletons in the closet when open sourcing code in git

In my day job (DNC Innovation Lab), my team and I received approval to open-source some code that was started well before I arrived there. It was all stored on an internal git server, so no one thought it would be that hard to do. You get cocky after slinging code around with git for a while. That's what good tools do.

Unfortunately there were already passwords, API keys, and other things we couldn't release publicly in our git commit history.

No problem, we thought, we'll just take a snapshot of the code, remove the bits we can't open source, and then upload that to github as a new repo, devoid of all that messy history. Easy peasy.

Er, not so much. The problem was that we wanted to maintain our internal branch, complete with git history, but open up development on the open source version (which comprised >99% of all the code) to github and thus outside collaborators. So we were going to be pushing and pulling to/from github, as well as merging into our internal branch. We couldn't let non-open-source code leak into github, but we also needed to merge the open-source changes into the internal version.

Git took a look at these two branches and decided they didn't have anything to do with each other because they had no common ancestry in the commit history. This was the appropriate response from git because we had purposefully removed the commit history of the github version.

I tried various methods of merging the two codebases, but git always generated conflicts left and right because it was attempting to merge two almost-but-not-quite identical codebases with no common ancestor commits.

Here's how I solved it (I hope). Let's say the internal branch is called "internal" and the open source branch is called "opensource". The commit history of the open source branch is one über-commit (X) followed by a couple small changes (Y & Z), which we'll represent as X <-- Y <-- Z. The commit history of the internal branch is pretty long, so we'll just abbreviate it as the three most recent commits, A <-- B <-- C. So here's how our two branches start out:

internal:   A <-- B <-- C
opensource: X <-- Y <-- Z


I decided to try merging the opensource branch with the internal branch using the "ours" merge strategy on the X commit. This merge strategy just discards the changes in the other branch and considers the two branches merged anyway. So I ran:


git checkout internal
git merge -s ours <sha1 of the X commit>   # records a merge with X but keeps internal's tree exactly as it was


Then my commit history looked like this:


internal:   A <-- B <-- C <-- D
                             /
opensource:                 X <-- Y <-- Z


The new D commit was the "merge" between C and X, but a couple quick git diffs showed that my working tree was exactly the same as the C commit, and thus had discarded the changes from X. This is what I wanted because X represented almost the exact same codebase, except for the minor changes required to open source it.
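For reference, those checks were along these lines, run on the internal branch where HEAD is now the new D merge commit, HEAD^ is C (its first parent), and HEAD^2 is X (its second parent):

git diff HEAD^ HEAD    # C vs. D: no output, so the merge kept internal's tree untouched
git diff HEAD^2 HEAD   # X vs. D: shows only the bits that had been stripped out for open sourcing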


Now I was in a position to merge new commits to the opensource branch into the internal branch. I ran this (still on the internal branch):

git merge opensource


Then my commit history looked like this:





internal:   A <-- B <-- C <-- D    <--    E
                             /           /
opensource:                 X <-- Y <-- Z

So now I can merge new commits to the opensource branch (coming from github or others on my internal team) into the internal branch, and changes that are internal only can be made directly to that branch.
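In practice the recurring cycle looks roughly like this (a sketch; the remote name "github" and its branch name are placeholders for whatever your setup actually uses):

git checkout opensource
git pull github master     # pick up outside contributions from the public repo
git checkout internal
git merge opensource       # fold them into the internal branch
# internal-only changes get committed directly on internal and never pushed to github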

I'll update this post if I run into any breakage due to this.

Thursday, February 4, 2010

Setting and getting custom field values in CiviCRM hooks

CiviCRM isn't always the most predictable codebase. Recently I needed to get and set some custom field values in a hook I was writing. The hook's job was to calculate some custom field values and create some contact references when a contribution was created or updated. As always, dlobo was a huge help (he's the CiviCRM guru, find him in #civicrm on Freenode). Here's what I did to set a couple of custom fields in my _pre hook:


function modulename_civicrm_pre($op, $objectName, $objectId, &$objectRef) {
  // only act when a contribution is being created or edited
  if ($objectName != 'Contribution' || ($op != 'edit' && $op != 'create')) {
    return;
  }
  // map readable names to the custom_N keys CiviCRM expects
  $custom_fields = array('foo' => 'custom_1', 'bar' => 'custom_2');
  $contribution_id = $objectId;
  require_once 'CRM/Core/BAO/CustomValueTable.php';
  $my_foo = 'blah';
  $my_bar = 'baz';
  // entityID tells CiviCRM which contribution to attach the custom values to
  $set_params = array('entityID' => $contribution_id,
    $custom_fields['foo'] => $my_foo, $custom_fields['bar'] => $my_bar);
  CRM_Core_BAO_CustomValueTable::setValues($set_params);
}


And here's an example for retrieving some custom field values from the contact object in the same hook:


function modulename_civicrm_pre($op, $objectName, $objectId, &$objectRef) {
  // only act when a contribution is being created or edited
  if ($objectName != 'Contribution' || ($op != 'edit' && $op != 'create')) {
    return;
  }
  // map readable names to the custom_N keys for the contact's custom fields
  $custom_fields = array('contact_foo' => 'custom_3', 'contact_bar' => 'custom_4');
  // set each field we want returned to 1, keyed off the contribution's contact
  $get_params = array('entityID' => $objectRef['contact_id'],
    $custom_fields['contact_foo'] => 1, $custom_fields['contact_bar'] => 1);
  require_once 'CRM/Core/BAO/CustomValueTable.php';
  $values = CRM_Core_BAO_CustomValueTable::getValues($get_params);
  $my_cfoo = $values[$custom_fields['contact_foo']];
  $my_cbar = $values[$custom_fields['contact_bar']];
}


So it's not ideal that you have to hard-code the custom field IDs; there should be a way to look them up (maybe there is). But it's not the worst thing in the world unless you're in the habit of destroying and recreating your custom fields from time to time, and you're probably not doing that on a production system.
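For what it's worth, the numeric IDs do live in CiviCRM's civicrm_custom_field table, so one way to look them up is a quick query against the database (assuming it's named civicrm and you have MySQL shell access):

mysql civicrm -e "SELECT id, label, column_name FROM civicrm_custom_field;"
# the id column is the N in the custom_N keys used in the hooks above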

Tuesday, January 19, 2010

A list of things technology vendors must never screw up

  1. Communication
That's it. Everything else is forgivable, as long as you tell me about it--see what I just did there? All of our communication should specify precisely what, when, how, and why. What are you doing? When are you doing it (or when are you updating me next)? How are you going to do it? Why are we doing this in the first place? I should be able to expect this from you, and you should be demanding it from me.

Here are some examples:
  1. If you're going to shut down my servers, you need to either be on the phone with me saying, "OK, I'm shutting this down now. Is that alright?" or you need to be staring intently and gravely at a ticket that says, "Yes, you may shut down that server on [date] at [time]," and have a calendar and clock nearby that are set correctly and correspond to [date] and [time]. If any of the above criteria aren't met, don't shut down my effing servers! I'm looking at you, Rackspace.
  2. If I can't access my servers but they're not *down* per se, that's still an emergency. I'm opening tickets, calling, and e-mailing you and you're taking your sweet time to get back to me, and even then you just say, "We're working on a new release of our software, so all our techs are busy. But they'll get to your issue as soon as they can." Nope, that doesn't work. But lucky for you, Rightscale, it's an easy fix. You should have said, "Hey, sorry this is taking so long. We're in the middle of a release of our software. I'll go and find out exactly when someone can get to your issue and call you back right away." You don't even have to do anything faster than you would have, just freaking tell me!
This is pretty simple, but if I had to give the tech industry a grade on it right now, it would be an F---. Fail. Whale.

Here's the most frustrating part: My clients would say that I'm the pot calling the kettle black. They want better communication from me when something is down or inaccessible. But usually the reason I can't give them that is I'm spending all of my time wrestling with you, the vendor, to get some kind of real information. If you were communicating with me, I could be passing that along to my clients and we could all get back to actually fixing the problem. Wouldn't that be nice?

Thursday, January 14, 2010

RESTful web services for value objects

OK, this one's gonna be pretty nerdy, even for me. Yeah.

In the world of object-oriented software design, we talk about things we're modeling as falling into 1 of 2 categories: Entities and Value Objects. Entities have identity separate from their constituent attributes, while Value Objects do not. So I, for example, am an Entity, but the color magenta is a Value Object. Even if you list my attributes (first name: Wes, last name: Morgan, hair color: blond, eye color: hazel, height: 5'11"), that doesn't fully encompass who I am because there is probably at least one other guy out there matching that same description. Freaky Friday. But if you were trying to find me, that other guy just won't do. Magenta, however, can be fully described by, for example, its red, green, and blue values. Magenta's magenta. This is nice because you can create an immutable, throw-away magenta instance of a Color class to represent it in your code. Decide you like chartreuse better? Throw away your magenta instance and instantiate a new Color object for chartreuse. This has several handy side-effects for your code. Entities require much more care and feeding. I should know, I am one. And so is that jerk imposter whom I will crush.

Anyway! REST works very well on Entities. You have a URL like http://server.org/people/34 and you call various HTTP methods on person #34 (GET, PUT, POST, DELETE). How rude of you. But it all works. Wham, bam, thank ya ma'am.

But what about Value Objects? If you wanted to write a web service that converted colors from red, green, and blue values to their equivalent HTML hex value, how would you make that RESTful? What's the resource? What operations do you map the HTTP verbs to? Here's a real-world example I ran into this week that highlights the conundrum and how I solved it:

I have a piece of software that converts ZIP codes into City, State pairs. This is how we in the U.S. of A. do postal codes. It can also take full street addresses and sanitize them and find out their 9-digit ZIP code. For those outside the U.S., the 9-digit ZIP code is extra special because it can tell you exactly where that address is geographically, but almost no one knows the final four digits of their code. So having software that can derive that code from their street address (which they do know) is super handy. My mission was to wrap this software in a Ruby on Rails layer to create RESTful web services out of these functions.

Step 1. ZIP -> City, State : ZIP codes are a great example of a Value Object. The ZIP code of my office in downtown Denver, CO is 80202. 80202 is the same as 80202, even though they're at different places on this page. You don't care which one I typed first, they're interchangeable. So it would be silly to create a RESTful web service that did something like this: GET /zips/89 which would return the ZIP code w/ id 89 and the associated city and state. Let's say it's 73120 in Oklahoma City, OK. Why assign ZIP codes an id number? Why not just do this: GET /zips/73120 and have that return Oklahoma City, OK? So that's what I did. The value *is* the id. That works great, moving on.
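In curl terms, by the way, the whole interaction is just this (the hostname is made up, and the response body is whatever your Rails controller decides to render):

curl http://zipservice.example.com/zips/73120
# returns the city/state for 73120, i.e. Oklahoma City, OK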

Step 2. Address -> Better Address (incl. the magical 9-digit ZIP code) : Now things get trickier. An address is still a value object. 123 Main St., Missoula, MT is the same as 123 Main St., Missoula, MT. We don't need to assign id's to these objects, they're just the sum of their parts. But unlike ZIP codes, more than one attribute is required before you can say you have an Address object. You need 1-2 lines of house number, street, apartment number, etc. information, then a city, state, and 5-9 digit ZIP code. That's a bit harder to put into a URL. Ideally we want to be able to label the different attributes so we can easily keep track of what's what. Something like this:
GET /addresses/123%20Main%20St.%20Missoula,%20MT%2059801 ...could work, but it's kind of a hassle because you then have to write a parsing routine on the server side. It would be nicer if you could just label the components of the address when you pass it in. Query parameters to the rescue! Here's what I did:
GET /addresses/1?addr1=123%20Main%20St.&addr2=&city=Missoula&state=MT&zip=59801
Now I can very easily pass this address into my sanitizer and get back the corrected version w/ the 9-digit ZIP. The "1" in the URL is a pseudo-id placeholder. If it bothers you, you could just as easily put "address" or "thisone" or "foo" there. But don't put "get" or "lookup" or any other verb there! That's not RESTful. Only the HTTP verb should define the action being taken on the resource.
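Here's the same request as a quick curl sketch (hostname made up again); the -G flag turns each --data-urlencode parameter into a query string entry so you don't have to hand-encode anything:

curl -G http://zipservice.example.com/addresses/1 \
  --data-urlencode "addr1=123 Main St." \
  --data-urlencode "addr2=" \
  --data-urlencode "city=Missoula" \
  --data-urlencode "state=MT" \
  --data-urlencode "zip=59801"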

So yeah, that's how I did it. I think it's still pretty RESTful. It only uses GET, but that's all you really need for read-only Value Objects. It would probably be better to do something like GET /zips/90210/city or GET /zips/80218/state, or maybe even GET /zips/73120/city_state to avoid the double call in the common case of wanting the city and state. I still don't really know of a cleaner way to do the address sanitizer piece. It works, but it feels a bit hackish.

I would like it if smarter types chimed in and critiqued my approach. Because this is all just, like, my opinion, man.

Sunday, November 22, 2009

IPv6's Killer App (Finally): Cloud Computing

I just submitted this to IT World. Let's see if they deem it worth publishing.


IPv6 has been an approved standard for over a decade now. But its adoption has suffered from lack of a compelling reason to deploy it. Since the future scalability of the Internet depends on our adopting it soon, we should be on the lookout for so-called “killer applications” where its benefits over IPv4 can help us solve real world problems today. That will help get IPv6 rolled out before we run out of IPv4 addresses and experience all the problems that will cause. It is similar to incentivizing the use of renewable energy before cheap fossil fuels run out so the transition is less painful for all involved. This article discusses one such killer app for IPv6: Infrastructure-as-a-service cloud computing systems such as Amazon’s EC2.

Cloud computing solves many of the problems of maintaining and growing a complex server infrastructure. But it does so in a way that introduces a few new ones. One of the biggest challenges is IP addressing. IPv4, the underlying protocol powering the Internet as we know it today, as well as just about every other computer network we use on a daily basis, was not designed to accommodate the elastic nature of the cloud. Server instances need a way to find each other on an IP network, but it is not practical to assign every client account a static, publically routable IPv4 subnet, especially as they become scarcer. Depending on the cloud provider’s architecture, it is also sometimes advantageous not to use the public IPv4 address even when there is one available (due to performance and/or bandwidth cost considerations). In order to demonstrate how IPv6 can help with this, let’s start with an example scenario of how it works now on EC2.

Amazon’s Elastic Compute Cloud (EC2) service is undeniably the leader in infrastructure-as-a-service cloud computing options. When you launch an instance, it is dynamically assigned an IPv4 address in the 10.0.0.0/8 private subnet. If Amazon is actually using a subset of that class A block, they’re not saying. So system administrators have to assume the entire 10.0.0.0/8 subnet is off limits for anything else (such as VPN subnets). A routable, public IPv4 address is also assigned so you can access the instance from outside Amazon’s network in a 1:1 NAT configuration. You can optionally request that this come from a pool of static addresses reserved for your use only. Amazon calls these Elastic IPs and charges for reserving them (though not while they’re actively in use). It may be tempting to think assigning an Elastic IP to each of your instances is a good solution, but there’s a downside.

When you access an EC2 instance from another instance (for example, when your PHP-enabled web server wants to contact your MySQL database server), it’s a good idea to use the 10.0.0.0/8 address whenever possible. This is because throughput will be higher (it’s a more direct route) and Amazon doesn’t charge for the bandwidth you use if the two instances are in the same data center (Amazon calls these Availability Zones). So Elastic IPs don’t help here, because they only apply to the public-facing side. However, it’s very difficult to coordinate which instance has what private IP since they are dynamically assigned at boot. A robust cloud infrastructure should always be designed so that individual instances can be thrown away (or lost) with little to no downtime of the application(s) being served. This means instances will be coming up and down all the time, so your application configurations cannot have hard-coded DNS entries or IPv4 addresses in them.

There are 3rd party services that can help with this. Rightscale, a cloud computing control panel service, recommends the use of DNS Made Easy. This service allows you to register dynamic DNS entries and then have your instances register their IPv4 addresses with those DNS entries when they boot up. This is a workable, if clumsy, solution. It gets trickier when you need to access these servers from outside Amazon’s network, since you cannot route packets to the 10.0.0.0/8 address over the public Internet or even a private VPN (unless you’re willing to route the entire 10.0.0.0/8 subnet over the VPN, but that will almost certainly cause you problems down the road when trying to connect to other private networks using that subnet, and there are many).

Wouldn’t it be nice if we had a networking protocol that was designed to work in this kind of environment? And wouldn’t it be even better if that protocol were the one poised to run the entire Internet in a few years anyway? Oh hi, IPv6. How long were you standing there? This is awkward…

The main reason Amazon has to charge for reserved Elastic IP addresses is that we’re running out of IPv4 addresses. Current estimates say that we will start experiencing the effects of IPv4 address exhaustion in 2010, and will almost certainly be entirely out of addresses by 2012. IPv6, on the other hand, has more addresses than we could ever hope to use up. This is because IPv6 addresses are 128 bits long, as opposed to 32 bits for IPv4 addresses. 32 bits gives you around 4 billion addresses. 128 bits gets you somewhere in the neighborhood of 4.5 x 10^15 addresses for every observable star in the known universe. That’s a lot.

But the beauty of IPv6 isn’t just that there are more addresses, it’s also in how they are assigned. Current recommendations say that each subscriber to an IPv6 network should be assigned an entire /48 prefix. This means you have 16 bits to use for further subnetting and then 64 bits still left over for assigning addresses to your hosts. Every single one of the 65,000 or so subnets you’re given can hold the square of the entire existing 32-bit addressable IPv4 Internet. Whoa. And if you move your entire deployment to a new cloud hosting provider, only the 48-bit assigned prefix changes. Your subnets and host address assignments stay the same, and the IPv6 routing protocol is designed to inform everyone of the new home of your network so everything just keeps working.
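To make that concrete, here's a sketch of how a /48 assignment could be carved up (2001:db8::/32 is the prefix reserved for documentation, so the numbers are purely illustrative):

2001:db8:1234::/48      the prefix your provider assigns you
2001:db8:1234:1::/64    one of your ~65,000 possible /64 subnets (say, app servers)
2001:db8:1234:1::10     a host address you pick inside that subnet
2001:db8:beef:1::10     the same host after moving to a provider that assigns you 2001:db8:beef::/48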

Here’s how this could play out for EC2 if Amazon decided to roll out IPv6:

  • Each EC2 client gets a /48 prefix.
  •  When you launch an instance, you can either allow dynamic autoconfiguration of the IPv6 address or you can specify the subnet and host address you want to use. This allows you to use the same static IP addresses for instances that are fulfilling the same role in your deployment. For example, if you are deploying a new database server instance, you can just assign its static version 6 IP to the new instance. (Ideally there would also be an option to assign / migrate IPv6 addresses after instances are running. This would allow you to sync up your new database to the old one before pointing your app servers at it, for example.)
  •  Any of these addresses are publically routable and accessible on the IPv6 Internet (but of course you can limit this w/ firewalling, presumably via an IPv6-capable version of Amazon’s security groups).
  •  You can then use regular old static DNS because your IP addresses are under your control once again. Just assign AAAA records for the IPv6 addresses you’re using (see the sketch after this list).
  •  When administering your instances, just connect to their static IPv6 addresses.
  •  Configure your software in the cloud to connect to the other instances via IPv6 using the AAAA DNS hostnames.
  • There would be no private IPs vs. public IPs. You use the same addresses and DNS hostnames everywhere, and Amazon would waive the bandwidth charges if you stay inside your 48-bit prefix. You would no longer have to jump through hoops to get your software to use 10.0.0.0/8 addresses sometimes and publicly-routable addresses at other times.
  •  Amazon could still assign dynamic IPv4 addresses the same way they do now for legacy software that doesn’t yet support IPv6. This would also allow you to phase in IPv6.
  •  Amazon could still provide Elastic IPs for servers that need public IPv4 accessibility. In a typical web app configuration, this would just be the front-end load balancers.
  •  As the IPv6 Internet gets rolled out, your web app is already future-proofed because it has AAAA records in DNS and routable IPv6 addresses.
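For the AAAA-record bullet above, here's a sketch of what that looks like in practice (the hostname is made up and the address reuses the documentation prefix):

# an AAAA record in your zone file:
#   db1.example.com.    IN AAAA    2001:db8:1234:2::10
# which your app servers (and you, when administering) resolve like any other hostname:
dig db1.example.com AAAA +short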


Cloud computing can benefit from IPv6 today, and since we need to move to IPv6 very soon anyway, it’s a win-win situation. Luckily Amazon has noticed this too (see the “Why don’t you use IPV6 addresses?” question on this page: http://docs.amazonwebservices.com/AWSEC2/2009-08-15/DeveloperGuide/index.html?IP_Information.html), and they say they are investigating it. Here’s hoping they have something to announce in the next few months. Time’s a-ticking.

Tuesday, October 27, 2009

Google Chrome's top tabs don't save any vertical screen real estate

An oft-cited benefit of tabs-at-the-top a la Google Chrome and beta builds of Safari 4 is that they save vertical screen real estate. They certainly feel like they do. After all, the tabs are up in the title bar and the address and bookmark bars are the only things between that and the web content you're browsing. However, I noticed something interesting today when I had Chrome open on top of Safari 4. See for yourself below.



It's the exact same size vertically, at least on Mac. Whoddathunkit?