Mon, 31 Aug 2009

Dear Apple (My Digital Video Wish List)

Thank you for supporting the AVCHD format commonly used by many HD video cameras in iMovie.

However, I should point out that this support could be vastly improved if you offered native AVCHD support. You see, transcoding one hour of AVCHD into Apple Intermediate Codec results in roughly 40GB of clip files, which are handily stored in some hard-to-find location on the hard drive. And, it's not clear whether or not these files need to be archived, or are intermediate files that can be purged after final editing.

My MacBook Pro, with its original 100GB drive, would have choked on a single hour of video (what with the OS and my photo and iTunes libraries consuming well over half of the drive). But while I'm thankful I upgraded my Mac to 250GB, even that begins to fill up quite quickly at 40GB/hour. So much so that I had to archive a bunch of files to my AirDisk just so I could install Snow Leopard.

So what do you say, Apple, can we get native AVCHD support? Please?




Grand Unified Theory of Moore's and Murphy's Laws

Intel co-founder Gordon Moore predicted in 1965 that the number of transistors on a chip would double every year; in 1975 he revised that to about every two years.

Aerospace engineer Edward Murphy was quoted as saying "Anything that can go wrong will go wrong."

It's not exactly groundbreaking news, but I'm today proposing a grand unification of these laws:
"The likelihood that something will go wrong doubles every two years."

You (probably) heard it here first. ;-)





Thu, 20 Aug 2009

How and Why to Pay Your (Technical) Debts

Disclaimer: While I relate stories about my work experiences, my views are my own and do not necessarily reflect those of my employer.

Recently, my company made a decision to shift from a deliberative, sequential and often secretive (work is done without visibility to the stakeholders until the stuff is ready to go out the door) software development process (called "waterfall") to an iterative, adaptive (agile), and transparent process called Scrum, where business owners are a key part of the team and participate in story (work) definition and acceptance.

Technical Debt
Scrum describes a concept called "technical debt". Think of "technical debt" as the remnant work left undone because the finished working product was defined loosely enough that a stamp of approval was all it took to count as "good enough".

In other words, technical debt is stuff that you're going to have to contend with eventually, but expediency, procrastination or deadlines result in putting off dealing with it.

If your house is a mess, when your friend calls you on the phone saying "hey, we're going out, wanna come?" you either say "no, I really need to do some chores around the house" or you go out and don't have a good time because all the while you're thinking of the mess you left behind at home, or you go out, but realize that when the weekend comes, you've got a huge mess on your hands.

Essentially, your ability to quickly respond to the needs of the circumstances (a friend wants you to drop everything and come with him) and your ability to respond appropriately (nobody wants to hang out with someone stressed out about the house chores they have to do) without suffering undue difficulty (the weekend is shot if you don't do the dishes and cleaning as the needs arise) is a function of how far you let things slide in the normal course of things.

So in this way, Agile encourages the conversations between engineering and the business to negotiate what work should be handled now, and what can be put off, by expressing things as relative priorities. For example, if your house is on fire, you don't tell the firemen "I'll be right out, I need to finish vacuuming". At the same time you don't install hardwood floors (customer feature request) over rotting wood (infrastructure or foundation) either.

Sins of the Past
Organizations that have historically done "waterfall" work also tend to lack clear "definitions of done" (another Scrum term that suggests teams establish standards that must be met to assert that the product is complete and ready to ship), so the quality of what ships depends not on whether the product has evolved to an appropriate level of maturity, but rather on how well the team was able to predict exactly how long it would take to build a quality shippable product. That is, poor estimation (an endemic problem) equals poor quality. So, definitions of done tend to nip this in the bud by describing the properties of a completed product as the standard by which something is declared "ready to go" rather than some deadline that comes whizzing by.

This also means that teams moving to Scrum from Waterfall have to contend with a lot of historical debt that was incurred previously. The code they are working on may be bug-ridden (or difficult to prove is not bug-ridden due to lack of automation or repeatable functional tests), poorly documented and hard to extend and maintain. (By contrast, definitions of done in Scrum tend to aim for software that is well documented and well tested, using industry best practices to identify and address problems early and often.)

But Engineering teams tend to want to build the software that way to start, yet typically have their hands tied because the business expects a shippable product at the deadline and doesn't notice or care that the software lacks automation testing (throw more QA people at it) or proper documentation, for example.

The Scrum model says "ah, but they should care, and the rationale justifying the work should be conveyed to them." For example, convince them that poorly documented code is harder to fix later. For these reasons, Agile processes assert that teams can increase their velocity as they pay off their technical debts and become unencumbered by the sins of the past.

Healthy Skepticism
So Scrum was introduced in my organization and I was initially skeptical. The "sins of the past" seemed like they were largely incurred by an overexuberant business who wanted the product ASAP, and didn't know or care about the relationship between available time, definitions of done, and product quality.

The "transparency" aspect of Scrum suggests that you "lay down your cards", meaning that if engineering management wants the teams to spend some time backfilling automation tests of individual units of code (repeatable assertions that the code is doing what you expect such that when you change some behavior of the code, everything else continues to work as you expect), you write a story for that work, express the benefit to the organization, and let the business prioritize that story amongst all of their stories.

"Surely, this can't work", I assured myself. "An organization that expressed priorities as 'I want it ASAP' couldn't shift culturally fast enough to say 'sure, take the time to do it right'. I must 'hold back' a certain amount of my teams' time to ensure we do things the right way."

Our Scrum coaches argued against me, and I wouldn't budge... at first.

Articulation of Value
I began to come around when the business showed a willingness to accommodate technical stories. I knew, deep down, that the reason why I insisted on "doing it right" was because there was, in fact, a benefit to the business to doing it that way, I just hadn't spent the time to articulate it in a way that allowed prioritization against other stories in the queue.

As engineering was able to explain the "why this ought to be done", the business began to concede that "yes, we'd like to pay off a little of that debt to gain a little bit of velocity".

To help my teams translate the value of technical stories into business value, I prepared a "so that" wiki article. You see, each user story (a unit of work where the desired outcome is described in a form like "as a business owner, I want to add blinkers to my car so that drivers behind and ahead of me can know that I intend to turn right or left") has an optional "so that" clause that expresses the value of the work to be done. Oftentimes, the "I want" part of the phrase is easy to write, but the benefit (the "so that") is much harder to articulate.

Transparency, Incentives and Planning
One of the patterns in Scrum involves maintaining an up-to-date "wall" of user stories and their progress for transparency. "Burn down charts" depict how much work is remaining in the 1-3 week window of a "sprint". And given daily reminders of their progress and "velocity" of getting work done against the goal, the team begins to take pride in their numbers, and strive to resolve issues and increase the number of points they can complete in a sprint.

Obviously, if I "held back" time for backfilling automation testing or documentation, for example, this would be work the teams would be mandated to do, but not given credit for in their velocity calculations. And, given that they are tracked on the other stories in the sprint, technical debt stories not tracked through Scrum would be constantly pushed out or delayed.

In other words, this lack of transparency would also dis-incentivize employees to do my technical debt work.

Sure, there was an opportunity to create my own "backlog", try to run it as well as the business backlog, but the time commitment both from me and the team would have been non-trivial.

It also helped to have a backlog of technical stories "ready to go" in case a last minute business priority decision or some dependency (such as approvals for creatives, for example) delay meant lost productivity on the team. These stories would be added to the sprint, meaning that the estimated "points" to complete them would be counted in the team's velocity.

Finally, we added objectives that partially grade teams on their velocity. Thus, any story that was "planned work" would count for velocity, whereas unplanned work wouldn't. So I was reminded of the adage, "your failure to plan does not constitute an emergency on my part".

Overcoming Challenges
That's not to say that this was entirely a panacea. Crucial conversations with team members, business owners, and stakeholders in both IT and the business had to take place. And I had to accept some measure of compromise when the business needed functionality done ASAP, in exchange for greater capacity in future sprints. We had to ensure that we had a process for handling high urgency unplanned work (P1/P2 bugs for example), as these could quickly derail sprints in progress.

So, in summary, if you can’t trust the business to prioritize your technical debt stories, then you need to become a better story teller, or your business has more significant problems because a key stakeholder’s properly articulated concerns are not being heard. I assert that “if it’s worth doing, it’s worth covering in either your Definition of Done, in your Acceptance Criteria, or prioritized in the Product Backlog.”

Scrum promises not only to speed up the course of software lifecycle management; it also promises to increase the capacity and problem-solving abilities of your teams, and, via transparency, bring the technology and business units into greater alignment than can ordinarily be seen in a non-Scrum environment.





Wed, 12 Aug 2009

(More) Fun With Unix Commands

I decided to see what the boring cloud would look like (no stopwords, no character limits). Here's that list.

like my a as was would are of you up
from when it we I or that an on what
their is which will if but can all with they
have about and for who be by so more
our out one to this not at your the in

How I generated this list:

cat ~/*.txt | tr '[A-Z]' '[a-z]' | sed 's/ /\n/g' | \
egrep -v '[\><"#{}1234567890/|?;_.)(!&-+=:@-]'| sed 's/[,.]//' | sort | uniq -c | \
sort -n| sed 's/$/<\/font>/' | sed 's/[0-9][0-9] /">/'| sed 's/\([0-9]\)/<font size="\1/' | \
sed 's/>i</>I</' | tail -50 | shuf -n 50 > ~/wordcloud2.txt
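
Stripped of the HTML decoration, the heart of that pipeline is the classic tr | sort | uniq -c | sort -n idiom. Here is a minimal sketch of just that part. The function name top_words is made up for illustration, and the filter is simplified to "purely alphabetic words" rather than reproducing the exact egrep character list, so treat it as an approximation of the command above rather than a drop-in replacement.

```shell
#!/bin/sh
# top_words N: read text on stdin and print the N most frequent words,
# least frequent first, as "count word" lines.
# (top_words is a hypothetical helper name, used only for illustration.)
top_words() {
    tr 'A-Z' 'a-z' |            # fold everything to lower case
    tr -s ' ' '\n' |            # put each word on its own line
    grep -E '^[a-z]+$' |        # simplified filter: alphabetic words only
    sort | uniq -c | sort -n |  # count duplicates, then order by count
    tail -n "$1"                # keep the N most frequent
}

printf 'the cat and the dog and the bird\n' | top_words 2
```

Because sort -n puts the largest counts last, tail picks off the most frequent words, which is the same reason the original command ends with tail before shuffling.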




Fun With Unix Commands

Just for fun I decided to see if I could write a long Unix command line to produce a top-25 word cloud from my blog contents.

Here's what I came up with.

should would google those voted
know needs never server technology
seems thought people three products
country speed shuttle looks couple
through problem pretty could every


And here's how I generated it from the Unix command line:
cat *.txt | tr '[A-Z]' '[a-z]' | sed 's/ i / I /g' | sed 's/ /\n/g' | grep ..... | \
egrep -v '[\><"#1234567890/|_)(!&-+=:@-]'| sed 's/[{}?;,.]//' | sort | uniq -c | \
sort -n | grep -v -f ~/stopwords.txt | sed 's/$/<\/font>/' | sed 's/[0-9] /">/'| \
sed 's/\([0-9]\)/<font size="\1/' | tail -25 | shuf -n 25 > ~/wordcloud.txt

In English, here is what those commands do:

  • Concatenate all the text files
  • Translate all the words into lower case
  • Restore the word i back to I
  • Replace every space with a newline (now every word is on its own line)
  • Apply my own "stop word" filter-- namely, only keep words with 5 characters or more (not, me, us, she,... boring words for a cloud)
  • Remove lines containing non-alpha characters
  • Strip , and . and other punctuation from words
  • Sort the resulting list
  • Collapse duplicate lines, prefixing each with its count
  • Sort the counted list numerically
  • Filter out the words found in a list of stopwords from Google
  • Add a closing </font> tag to the word
  • Replace the last digit of the count, and the space after it, with a "> to close the font size tag
  • Prefix the remaining leading digit(s) of the count with <font size="
  • Take the 25 last (most frequent) lines
  • Randomize the list of 25
  • Write the <font> tags into a file called wordcloud.txt
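
The three font-tag steps are easier to follow on a single line of uniq -c output. This is a small sketch that wraps the same three sed expressions in a function; the name tag_line and the sample input are made up for illustration.

```shell
#!/bin/sh
# tag_line: turn "count word" lines from `uniq -c` into <font> tags.
# (tag_line is a hypothetical name; the sed expressions come from the
# command line above.)
tag_line() {
    sed 's/$/<\/font>/' |              # append the closing tag
    sed 's/[0-9] /">/' |               # last digit + space becomes ">
    sed 's/\([0-9]\)/<font size="\1/'  # prefix the first remaining digit
}

echo '     47 people' | tag_line
# prints:      <font size="4">people</font>
```

Dropping the last digit means the font size is roughly the count divided by ten. One consequence worth knowing: a word with a count below 10 would lose its only digit and end up with a malformed tag, which is probably harmless here since only the most frequent words survive the tail.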

Another day, another (geeky) project. ;)





Mon, 10 Aug 2009

Flickr Activity Posts

Facebook has been getting some bad press over the past few months over their rather onerous terms of service. I'm a proponent of open source, and open licenses, and have made my photographs available on Flickr with a specific-- and permissive-- licensing model.

Posting photos on Facebook, even if those photos were displayed via an automated feed from Flickr, would seem to give Facebook the ability to sub-license and transfer rights to those photos that might be more liberal than the ones I applied to them originally. (This is my interpretation anyway, and the language is vague enough to allow this to happen).

For that reason, I severed my Flickr feed with Facebook, and decided that I already have the power to publish my content to those who have any interest in following me (you're reading it!).

Which brings me to this post. :) I've just completed my latest programming project, a simple Perl script that processes my Flickr feed and posts the results here to my blog.

It runs hourly and submits any Flickr uploads I've provided in that span of time to my blog. The entry right before this one used the same script (but expanded to pull from a greater amount of time). For those of you who run a blosxom blog (or one which accepts plain text posts) the script is available for you to use as well. It's in Perl and requires some module installations.
#!/usr/bin/perl
# Author: Khan Klatt
# Released under the GNU GPL. (http://www.gnu.org/copyleft/gpl.html)
# http://www.khan.org/blog/cronflickr

use strict;
use HTTP::Request;
use LWP::UserAgent;
use PHP::Serialization;

# Flickr Constants
my $flickr_id     = '43546914@N00';  # REPLACE ME WITH YOUR OWN FLICKR ID
my $flickr_format = 'php_serial';

# Date Constants
# my $global_date = time() - 36000000;  # (to kickstart first post when no photos recently posted)
my $global_date = time() - 3600;        # One Hour Ago

# Post Constants
my $path = '/PATH/TO/YOUR/BLOG/DIRECTORY';
my $post = qq[My Recent Flickr Activity\n<div class="flickrphotos">\n];
my $photosposted = 0;
my $filename = $path . "/flickr" . $global_date . ".txt";

# Get the feed from Flickr
my $uri = "http://api.flickr.com/services/feeds/photos_public.gne?format=$flickr_format&id=" . $flickr_id;
my $request = HTTP::Request->new(GET => $uri);
my $ua = LWP::UserAgent->new;
my $response = $ua->request($request);
die "Couldn't fetch $uri: " . $response->status_line unless $response->is_success;
my $data = $response->content;

# Process the feed
my $feed = PHP::Serialization::unserialize($data);
foreach my $item (@{ $feed->{items} }) {
    next if $$item{date} <= $global_date;  # Don't post photos older than an hour ago
    # The alt text assumes each feed item carries a "title" key
    $post .= <<EOF;
<span class="flickrphoto">
<a href="$$item{l_url}" target="_new"><img src="$$item{thumb_url}" alt="$$item{title}" /></a>
</span>
EOF
    $photosposted++;
}
$post .= "</div>\n";

# Only write a post if there was something new
if ($photosposted) {
    umask(002);
    open(my $outfile, '>', $filename) or die "Couldn't open $filename for writing: $!";
    print $outfile $post;
    close($outfile);
}
The CSS classes of flickrphotos and flickrphoto are included to help with visual presentation.

Finally, you'll need to automate this script every hour (or on whatever interval you prefer). Here's my crontab:
5 * * * * /path/to/your/flickrcron.pl # Run 5 after the hour every hour of every day
There might be a bug lurking in the logic that compares the feed's photo timestamps with the local time of the server you're on. If these are in different timezones, then photos may be missed in the pathological case (where the timestamp on the photo is much older than the timestamp on the unix box, which would be true generally if you take pictures in Hawaii and have your hosting in Japan, for example). Adding timezone support is left as an exercise for the reader. ;)
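
One hedged observation on that exercise: if both timestamps are Unix epoch seconds (an assumption about the feed's date field that is worth verifying against a real feed), the comparison does not depend on timezones at all, because epoch time counts seconds since 1970 UTC no matter where the server sits. Here is a minimal shell sketch of the same "newer than an hour ago" test. The helper name is_recent is made up, and date +%s is a widely supported extension rather than strict POSIX.

```shell
#!/bin/sh
# is_recent PHOTO_EPOCH NOW_EPOCH [WINDOW_SECONDS]
# Succeeds when the photo is newer than NOW minus the window
# (one hour by default), mirroring the script's "next if" test.
# (is_recent is a hypothetical helper name, used only for illustration.)
is_recent() {
    photo=$1
    now=$2
    window=${3:-3600}
    [ "$photo" -gt $(( now - window )) ]
}

now=$(date +%s)   # epoch seconds; the value is the same in any timezone
if is_recent $(( now - 600 )) "$now"; then
    echo "post it"
else
    echo "skip it"
fi
```

If the feed turns out to deliver a formatted local time instead, it would need converting to epoch seconds before a comparison like this is meaningful.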




Khan Klatt
