Friday, November 26, 2010

Moved to Posterous

I'm merging this blog with my Posterous blog: Time Travel Toaster. No more posts here.

Tuesday, May 25, 2010

Keeping your skeletons in the closet when open sourcing code in git

In my day job (DNC Innovation Lab), my team and I received approval to open-source some code that was started well before I arrived there. It was all stored on an internal git server, so no one thought it would be that hard to do. You get cocky after slinging code around with git for awhile. That's what good tools do.

Unfortunately there were already passwords, API keys, and other things we couldn't release publicly in our git commit history.

No problem, we thought, we'll just take a snapshot of the code, remove the bits we can't open source, and then upload that to github as a new repo, devoid of all that messy history. Easy peasy.

Er, not so much. The problem is we wanted to maintain our internal branch, complete with git history, but open up development on the open source version (which comprised >99% of all the code) to github and thus outside collaborators. So we were going to be pushing and pulling to/from github, as well as merging into our internal branch. We couldn't let non-open-source code leak into github, but we also needed to merge the open sources changes into the internal version.

Git took a look at these two branches and decided they didn't have anything to do with each other because they had no common ancestry in the commit history. This was the appropriate response from git because we had purposefully removed the commit history of the github version.

I tried various methods of merging the two codebases, but git always generated conflicts left and right because it was attempting to merge two almost-but-not-quite identical codebases with no common ancestor commits.

Here's how I solved it (I hope). Let's say the internal branch is called "internal" and the open source branch is called "opensource". The commit history of the open source branch is one über-commit (X) followed by a couple small changes (Y & Z), which we'll represent as X <-- Y <-- Z. The commit history of the internal branch is pretty long, so we'll just abbreviate it as the three most recent commits, A <-- B <-- C. So here's how our two branches start out:

internal:   A <-- B <-- C
opensource: X <-- Y <-- Z


I decided to try merging the opensource branch with the internal branch using the "ours" merge strategy on the X commit. This merge strategy just discards the changes in the other branch and considers the two branches merged anyway. So I ran:


git checkout internal
git merge -s ours (sha1 of the X commit)


Then my commit history looked like this:


internal:   A <-- B <-- C <-- D
                             /
opensource:                 X <-- Y <-- Z


The new D commit was the "merge" between C and X, but a couple quick git diffs showed that my working tree was exactly the same as the C commit, and thus had discarded the changes from X. This is what I wanted because X represented almost the exact same codebase, except for the minor changes required to open source it.


Now I was in a position to merge new commits to the opensource branch into the internal branch. I ran this (still on the internal branch):

git merge opensource


Then my commit history looked like this:





internal:   A <-- B <-- C <-- D    <--    E
                             /           /
opensource:                 X <-- Y <-- Z

So now I can merge new commits to the opensource branch (coming from github or others on my internal team) into the internal branch, and changes that are internal only can be made directly to that branch.

I'll update this post if I run into any breakage due to this.

Thursday, February 4, 2010

Setting and getting custom field values in CiviCRM hooks

CiviCRM isn't always the most predictable codebase. Recently I needed to get and set some custom field values in a hook I was writing. The hook's job was to calculate some custom field values and create some contact references when a contribution was created or updated. As always, dlobo was a huge help (he's the CiviCRM guru, find him in #civicrm on Freenode). Here's what I did to set a couple of custom fields in my _pre hook:


$custom_fields = array('foo' => 'custom_1', 'bar' => 'custom_2');
function modulename_civicrm_pre ($op, objectName, $objectId, &$objectRef) {
  if ($objectName != 'Contribution' || ($op != 'edit' && $op != 'create')) {
    return;
  }
  $contribution_id = $objectId;
  require_once 'CRM/Core/BAO/CustomValueTable.php';
  $my_foo = 'blah';
  $my_bar = 'baz';
  $set_params = array('entityID' => $contribution_id,
    $custom_fields['foo'] => $my_foo, $custom_fields['bar'] => $my_bar);
  CRM_Core_BAO_CustomValueTable::setValues($set_params);
}


And here's an example for retrieving some custom field values from the contact object in the same hook:


$custom_fields = array('contact_foo' => 'custom_3', 'contact_bar' => 'custom_4');
function modulename_civicrm_pre ($op, objectName, $objectId, &$objectRef) {
  if ($objectName != 'Contribution' || ($op != 'edit' && $op != 'create')) {
    return;
  }
  // set the field names to 1 that we want to get back
  $get_params = array('entityID' => $objectRef['contact_id'],
    $custom_fields['contact_foo'] => 1, $custom_fields['contact_bar'] => 1);
  require_once 'CRM/Core/BAO/CustomValueTable.php';
  $values = CRM_Core_BAO_CustomValueTable::getValues($get_params);
  $my_cfoo = $values[$custom_fields['contact_foo']];
  $my_cbar = $values[$custom_fields['contact_bar']];
}


So it's not ideal that you have to hard-code the custom field IDs; there should be a way to look them up (maybe there is). But it's not the worst thing in the world unless you're in the habit of destroying and recreating your custom fields from time to time. Probably you're not on a production system.

Tuesday, January 19, 2010

A list of things technology vendors must never screw up

  1. Communication
That's it. Everything else is forgivable, as long as you tell me about it--see what I just did there? All of our communication should specify precisely what, when, how, and why. What are you doing? When are you doing it (or when are you updating me next)? How are you going to do it? Why are we doing this in the first place? I should be able to expect this from you, and you should be demanding it from me.

Here are some examples:
  1. If you're going to shut down my servers, you need to either be on the phone with me saying, "OK, I'm shutting this down now. Is that alright?" or you need to be staring intently and gravely at a ticket that says, "Yes, you may shut down that server on [date] at [time]," and have a calendar and clock nearby that are set correctly and correspond to [date] and [time]. If any of the above criteria aren't met, don't shut down my effing servers! I'm looking at you, Rackspace.
  2. If I can't access my servers but they're not *down* per se, that's still an emergency. I'm opening tickets, calling, and e-mailing you and you're taking your sweet time to get back to me, and even then you just say, "We're working on a new release of our software, so all our techs are busy. But they'll get to your issue as soon as they can." Nope, that doesn't work. But lucky for you, Rightscale, it's an easy fix. You should have said, "Hey, sorry this is taking so long. We're in the middle of a release of our software. I'll go and find out exactly when someone can get to your issue and call you back right away." You don't even have to do anything faster than you would have, just freaking tell me!
This is pretty simple, but if I had to give the tech industry a grade on it right now, it would be a F---. Fail. Whale.

Here's the most frustrating part: My clients would say that I'm the pot calling the kettle black. They want better communication from me when something is down or inaccessible. But usually the reason I can't give them that is I'm spending all of my time wrestling with you, the vendor, to get some kind of real information. If you were communicating with me, I could be passing that along to my clients and we could all get back to actually fixing the problem. Wouldn't that be nice?

Thursday, January 14, 2010

RESTful web services for value objects

OK, this one's gonna be pretty nerdy, even for me. Yeah.

In the world of object-oriented software design, we talk about things we're modeling as falling into 1 of 2 categories: Entities and Value Objects. Entities have identity separate from their constituent attributes, while Value Objects do not. So I, for example, am an Entity, but the color magenta is a Value Object. Even if you list my attributes (first name: Wes, last name: Morgan, hair color: blond, eye color: hazel, height: 5'11"), that doesn't fully encompass who I am because there is probably at least one other guy out there matching that same description. Freaky Friday. But if you were trying to find me, that other guy just won't do. Magenta, however, is a color that can be represented by 3 red, green, and blue values, for example. Magenta's magenta. This is nice because you can create an immutable, throw-away magenta instance of a Color class to represent it in your code. Decide you like chartreuse better? Throw away your magenta instance and instantiate a new Color object for chartreuse. This has several handy side-effects for your code. Entities require much more care and feeding. I should know, I am one. And so is that jerk imposter whom I will crush.

Anyway! REST works very well on Entities. You have a URL like http://server.org/people/34 and you call various HTTP methods on person #34 (GET, PUT, POST, DELETE). How rude of you. But it all works. Wam, bam, thank ya ma'am.

But what about Value Objects? If you wanted to write an RGB to HTML hex value web service that converted colors from red, green, and blue values to their equivalent HTML hex value, how would make that RESTful? What's the resource? What operations do you map the HTTP verbs to? Here's a real-world example I ran into this week that highlights the conundrum and how I solved it:

I have a piece of software that converts ZIP codes into City, State pairs. This is how we in the U.S. of A. do postal codes. It can also take full street addresses and sanitize them and find out their 9-digit ZIP code. For those outside the U.S., the 9-digit ZIP code is extra special because it can tell you exactly where that address is geographically, but almost no one knows the final four digits of their code. So having software that can derive that code from their street address (which they do know) is super handy. My mission was to wrap this software in a Ruby on Rails layer to create RESTful web services out of these functions.

Step 1. ZIP -> City, State : ZIP codes are a great example of a Value Object. The ZIP code of my office in downtown Denver, CO is 80202. 80202 is the same as 80202, even though they're at different places on this page. You don't care which one I typed first, they're interchangeable. So it would be silly to create a RESTful web service that did something like this: GET /zips/89 which would return the ZIP code w/ id 89 and the associated city and state. Let's say it's 73120 in Oklahoma City, OK. Why assign ZIP codes an id number? Why not just do this: GET /zips/73120 and have that return Oklahoma City, OK? So that's what I did. The value *is* the id. That works great, moving on.

Step 2. Address -> Better Address (incl. the magical 9-digit ZIP code) : Now things get trickier. An address is still a value object. 123 Main St., Missoula, MT is the same as 123 Main St., Missoula, MT. We don't need to assign id's to these objects, they're just the sum of their parts. But unlike ZIP codes, more than one attribute is required before you can say you have an Address object. You need 1-2 lines of house number, street, apartment number, etc. information, then a city, state, and 5-9 digit ZIP code. That's a bit harder to put into a URL. Ideally we want to be able to label the different attributes so we can easily keep track of what's what. Something like this:
GET /addresses/123%20Main%20St.%20Missoula,%20MT%2059801 ...could work, but it's kind of a hassle because you then have to write a parsing routine on the server side. It would be nicer if you could just label the components of the address when you pass it in. Query parameters to the rescue! Here's what I did:
GET /addresses/1?addr1=123%20Main%20St.&addr2=&city=Missoula&state=MT&zip=59801
Now I can very easily pass this address into my sanitizer and get back the corrected version w/ the 9-digit ZIP. The "1" in the URL is a pseudo-id placeholder. If it bothers you, you could just as easily put "address" or "thisone" or "foo" there. But don't put "get" or "lookup" or any other verb there! That's not RESTful. Only the HTTP verb should define the action being taken on the resource.

So yeah, that's how I did it. I think it's still pretty RESTful. It only uses GET, but that's all you really need for read-only Value Objects. It would probably be better to do something like GET /zips/90210/city or GET /zips/80218/state, or maybe even GET /zips/73120/city_state to avoid the double call in the common case of wanting the city and state. I still don't really know of a cleaner way to do the address sanitizer piece. It works, but it feels a bit hackish.

I would like it if smarter types chimed in and critiqued my approach. Because this is all just, like, my opinion, man.