keenerview – open source zoomable flash image viewer

March 2nd, 2010

I’ve recently been working on an image viewer to display local restaurant menus. I looked around for an open source flash image viewer to modify for the development and came up empty handed… The flash community is a little odd in terms of open source. There are a lot of components out there for sale/license without an open source parallel. This seems very odd after running with the python and jquery crowd, where sharing code is a cultural norm. It is, of course, not all that way with flash – the game developers and graphics tweakers are very active in releasing code, and there are a number of fantastic 2d/3d and physics libraries which are extremely impressive. Hopefully the culture will shift with them.

In any case, with the development stable it seems only right to open up some code for the next developer. I’ve profited immensely over the years from open source, and it feels good to give back – even if it’s just a tiny contribution like a zoomable image viewer. Take it, modify it, do whatever you need to do to make it work for you.

The source is available at google code, and I hope to have some documentation up before the week is out. In the meantime, here’s a simple example.

keenerview.swf?image_url=[uri or url with cross-domain privileges]
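Since image_url is itself a url, it should be percent-encoded when the viewer link is assembled, otherwise any query string on the image gets mixed up with the viewer's own. A minimal sketch in python, assuming made-up host and image locations (only the image_url parameter name comes from the viewer):

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2


def viewer_url(swf_url, image_url):
    """Build a keenerview link with image_url safely percent-encoded."""
    return swf_url + '?' + urlencode({'image_url': image_url})


# hypothetical locations, for illustration only
link = viewer_url('http://example.com/keenerview.swf',
                  'http://example.com/menu.php?id=7&size=large')
```

Without the encoding step, the `&size=large` part would be read as a second parameter to the viewer rather than as part of the image address.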

Apologies to the squeamish for the anatomy page, it was the first interesting public domain image I came across.

development as the golden hammer

February 3rd, 2010

I’ve been squirreled away for the past couple months developing, and I’ve perhaps fallen into the trap of zoning into development to the point of neglecting the marketing side. As anyone who has tried to build a moderately sized public site can attest, there’s a chicken and egg situation early on. To be successful, you need content and presentation – but you also need users. Users won’t visit until there’s content, nor will users come if they don’t know you exist. The reasonable conclusion seems to be that the site needs to be built out to the point where the handful that find it see it as useful enough to tell others, and others tell others and so forth. There’s a subjective call that needs to be made at some point – it’s time to build the audience and relegate development to the back burner. It can be a difficult move to make – and I’ve been on both sides of the fence. I’ve worked on projects upholding the highest standards possible only to find they could not gather a subscriber-base. On the other hand I’ve released demo code only to see it become an overnight sensation, creating a maintenance headache in its wake. Projects often fail due to a fear of commitment; humans faced with a (potentially bad) decision will naturally stall. It’s not lazy, and it’s not always a bad thing to give ideas some time to breathe – but applied generally it’s not effective either. I consider development second nature; it’s far easier than building an audience. Most anyone reading this would likely agree – but it is important to be mindful that marketing a site is half the battle.

Note to self – set a milestone at which to put the programming down. Stick to it.

ipad fever

January 29th, 2010

The iPad seems to be all over the tech news, lots of excitement. While I can’t say I’m down on the hardware or interface, I’m extremely frustrated by the closed nature of these devices. And it’s not just that everyone seems to want an iPhone/iPad, it’s that every hardware company seems to want to be Apple these days. If computing follows this vector it will not be long until we find ourselves locked in the gaming console model, where in order to develop on hardware that we own we must plead for permission and enter into some contract to do so. If the devices being produced by Apple are a step forward, the cultural baggage that comes with them is 5 steps back. It’s a shame that this concern is so difficult to get across to a non-developer, as it’s likely to have lasting implications on the future of innovation in computing. The equity gained in the past 10 years of open source culture is far more tenuous than I imagined only a few years ago.

Peter Kirn makes a far more persuasive argument here.

django dumpdata/loaddata import errors on recursive model

January 16th, 2010

A quick note on recursive models – when creating a relation such as a category nest, it’s really easy to create a scenario where loaddata fails to load. What happens is that the dependency instance may get listed after the child in the dumpdata output. With certain database engines that relation must be resolved at point of mention during import. The solution seems obvious when you hear it, but may not seem so obvious when you hit it – in the model do not specify the order as anything other than “created” or “id”, and create the recursion in drilldown order (parents before their children). Given these two conventions, data should be able to load regardless of the database. I ran into trouble using a title order to get an alpha sort in the admin and just didn’t consider that the meta order would be represented in the dump as well, though upon further thought it makes perfect sense.

class Category(KeenerElementsBase):
    """
    Category model.
    A nesting container, can be recursive using parent
    """
    weight = models.IntegerField('weight', default=50)
    parent = models.ForeignKey('self', blank=True, null=True)

    class Meta:
        verbose_name = 'category'
        verbose_name_plural = 'categories'
        # dump/loaddata recursive relationship needs to be maintained
        ordering = ('id',)
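If a dump has already been written with children ahead of their parents, the records can be reordered before loading. A rough sketch, assuming the fixture is the usual JSON list of model/pk/fields records for a single model and that the self-referencing field is named parent as in the model above; the helper name and the app.category label are made up for illustration:

```python
import json


def parents_first(records, parent_field='parent'):
    """Return fixture records ordered so every parent precedes its children."""
    by_pk = dict((r['pk'], r) for r in records)
    ordered, seen = [], set()

    def visit(record):
        if record['pk'] in seen:
            return
        seen.add(record['pk'])  # mark first so a bad cycle cannot loop forever
        parent_pk = record['fields'].get(parent_field)
        if parent_pk in by_pk:
            visit(by_pk[parent_pk])  # emit the parent before the child
        ordered.append(record)

    for record in records:
        visit(record)
    return ordered


# a dump written in the "wrong" order: deepest child first
dump = json.loads("""[
  {"model": "app.category", "pk": 3, "fields": {"parent": 2}},
  {"model": "app.category", "pk": 2, "fields": {"parent": 1}},
  {"model": "app.category", "pk": 1, "fields": {"parent": null}}
]""")
fixed = parents_first(dump)
```

This only patches up an existing file; keeping the meta ordering on id remains the simpler fix going forward.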

gdata calendar api is a mess

December 21st, 2009

I just kind of figured it’d be relatively painless to extract a few calendar entries. That was my first and second mistake. Want an “id”? Get ready for fun – depending on your context an id can mean a url, a uri, or a 26 character string – and I’m talking about a reference to the same entry. Is consistent id representation not api design 101, day one material? Anyone else needlessly sinking time into this?

noodling with python gdata calendar and iCalendar data

December 20th, 2009

I’ve started looking into means of pulling in external google calendars, and found it took longer than anticipated just to get the appropriate tools in place to get started. Documentation on importing google calendars, which I had assumed would be plentiful, is really lacking. The goal was to not only capture occurrences of events but to capture the recurrence rrules and datetimes that define them. I downloaded the python gdata package (gdata + atom) and began hacking at a public calendar. It turns out there’s no apparent mechanism in gdata to parse the iCalendar data itself. After considering parsing out the ical data with regular expressions, I thought better of it and started looking around. There’s a vobject package that seems to do the trick.

So.. here’s some basic noodling with the two libraries. I’ve found that there’s plenty of documentation on connecting to and collecting google calendars, but little on the actual import of the data – so here’s the missing link. It requires the gdata and vobject python packages mentioned above.

from gdata.calendar import service
import vobject

cs = service.CalendarService()
cs.email = ''
cs.password = 'google_password'

calendar_uri = '/calendar/feeds/'
feed = cs.GetCalendarEventFeed(uri=calendar_uri)

print feed.entry
# [<gdata.calendar.CalendarEventEntry object at 0x03390550>,
#    <gdata.calendar.CalendarEventEntry object at 0x03390590>,
#    <gdata.calendar.CalendarEventEntry object at 0x033909B0>]

recurrence_text = feed.entry[0].recurrence.text
recurrence = vobject.readOne(recurrence_text)
recurrence.prettyPrint()

#    DTEND: 20091213T210000
#    params for  DTEND:
#       TZID [u'America/New_York']
#    DTSTART: 20091213T200000
#    params for  DTSTART:
#       TZID [u'America/New_York']
#       TZID: America/New_York
#       DAYLIGHT
#          DTSTART: 19700308T020000
#          TZOFFSETFROM: -0500
#          TZNAME: EDT
#          TZOFFSETTO: -0400
#       STANDARD
#          DTSTART: 19701101T020000
#          TZOFFSETFROM: -0400
#          TZNAME: EST
#          TZOFFSETTO: -0500
#       X-LIC-LOCATION: America/New_York

dt_start = recurrence.contents['dtstart'][0]
dt_start.prettyPrint()  # prettyPrint() prints directly and returns None
# DTSTART: 20091213T200000
# params for  DTSTART:
#    TZID [u'America/New_York']
print dt_start.params
# {u'TZID': [u'America/New_York']}
print dt_start.params['TZID'][0]
# u'America/New_York'

print recurrence.contents['rrule'][0].value
print recurrence.contents['dtstart'][0].value
# u'20091213T200000'
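For a sense of what vobject is untangling, here is a rough, standard-library-only sketch of splitting one iCalendar content line into the name, params and value seen above. It ignores quoted parameters and line folding, so it's no substitute for vobject, and the function name is made up:

```python
from datetime import datetime


def parse_content_line(line):
    """Split 'NAME;PARAM=VAL:value' into (name, params, value)."""
    head, value = line.split(':', 1)
    parts = head.split(';')
    name = parts[0].upper()
    params = {}
    for part in parts[1:]:
        key, val = part.split('=', 1)
        params[key.upper()] = val.split(',')  # params may hold several values
    return name, params, value


name, params, value = parse_content_line(
    'DTSTART;TZID=America/New_York:20091213T200000')
# naive datetime; the TZID param is not applied here
start = datetime.strptime(value, '%Y%m%dT%H%M%S')
```

The resulting params dict has the same shape as dt_start.params in the listing above: a parameter name mapped to a list of values.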

optional kwargs in the django url dispatcher

December 11th, 2009

I was hitting a wall setting up a few urls where I was using optional kwargs in the url. I imagine this is a common scenario where content can live in more than one place, especially if the url itself is holding down some context to maintain an anonymous view. Here’s a quick example:

(r'^blotter/(?P<filter_name>(categories|arrests|personnel))/$', render_blotter, {}, 'blotter'),
(r'^blotter/(?P<filter_name>(categories|arrests|personnel))/(?P<filter_value>([\w-]+))/$', render_blotter, {}, 'blotter'),

The two routes are dispatched to the same view under the shared name ‘blotter’, but have a unique signature depending on what arguments are passed. A more common example might be a YYYY/, YYYY/MM/, YYYY/MM/DD/ situation where they should share a common render mapping. The url dispatcher can handle this fine, and for the most part it can be handled in the application logic without much of a headache by removing kwargs equal to None before reverse() is called to retrieve the url. The crux of the issue is really the url tag in the templates, which is not happy unless it gets an exact match of arguments.
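On the application side, the None-stripping step can be as small as the following sketch. The helper name is made up, and the reverse() call is shown only as a comment so the snippet runs without django:

```python
def without_none(kwargs):
    """Return a copy of kwargs with the None-valued entries dropped."""
    return dict((k, v) for k, v in kwargs.items() if v is not None)


# in a view this would feed django's reverse(), e.g.
#   reverse('blotter', kwargs=without_none(url_kwargs))
url_kwargs = {'filter_name': 'arrests', 'filter_value': None}
cleaned = without_none(url_kwargs)
```

With filter_value dropped, the remaining kwargs match the first of the two routes; with it present they match the second.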

A better explanation is available here. In any case, I’ve put a templatetag together to handle optional kwargs. Hopefully someone will find it useful.

"""
exactly the same as django.template.defaulttags.url EXCEPT kwargs equal to None are removed
this allows a bit more flexibility than the use of {% url %} where nesting is rested on optional
base kw arguments.
"""

from django import template
from django.template.defaulttags import URLNode, url

register = template.Library()


class URLNodeOptional(URLNode):
    """
    identical to django.template.defaulttags.URLNode
    but removes kwargs equal to None before resolving the url
    """
    def render(self, context):
        # iterate over a copy so entries can be dropped while looping
        for k, v in list(self.kwargs.items()):
            if v.resolve(context) is None:
                self.kwargs.pop(k)
        return super(URLNodeOptional, self).render(context)


def url_optional(parser, token):
    """
    creates the default URLNode, then routes it to the Optional resolver with the same properties
    by first creating the URLNode, the parsing stays in django core where it belongs.
    """
    urlnode = url(parser, token)
    return URLNodeOptional(urlnode.view_name, urlnode.args, urlnode.kwargs, urlnode.asvar)

url_optional = register.tag(url_optional)

google flash maps, more marker headroom

December 11th, 2009

A quick note on google flash maps. Somewhere in the vicinity of a few thousand markers, initialization time starts to slow down perceptibly; this is not unexpected mind you.. In the maps application I was working on, a  minority of pages passed that mark – enough to look into options, but not enough to rethink “the metaphor”. It turns out getting more headroom in flash maps is really easy. The first thing to do is to create a custom marker to replace the google pin (a movie clip asset will do) using the MarkerOptions icon property. Now the important move – cache the clip as a bitmap (marker_icon.cacheAsBitmap = true;), and pass it into the overlay as the icon. Finally make sure shadow rendering is off (MarkerOptions hasShadow property).

Caching a custom marker as a bitmap and disabling shadow isn’t going to solve all your issues if you’re looking to put too many more markers in the map, but it can at least triple the marker threshold before the map becomes unusable, which if you’re dealing with a few wild fringe cases can come in handy..

django app level caching

December 11th, 2009

Every developer I know loves cron jobs; I can think of no development more rewarding than creating an application that can feed itself. On the flip side, processing a cron does create some prickly issues. Over the past couple months I’ve hit the same issue twice. The first time was during the development of weathertronic, where the pair of task queues that govern the forecast data import required a naming convention to key into the memcache so that the rendering side never had to worry about stale data. I hit the same issue again while putting together a police blotter app for the city of Keene, NH, only this time I really wanted to preserve caching at the template output level. The cache decorator provided is great, but there isn’t any built-in mechanism to “call it off” when you know the underlying data has changed – it was a bit too blunt for my purposes. After googling a bit, I found some snippets on clearing the django cache, but again that was a bit too blunt an instrument, as there are other entries in the cache that had no need to update. Ultimately, I ended up wrapping the django cache with a bit of record keeping. I figured I’d put it out there for anyone looking to solve a similar issue..

The way to set it up is to add the script below somewhere in the python path (I created an appcache module), then in the rendering function called from the url dispatcher load it up with the app name as the namespace and you’re off.

from appcache import AppCacheManager
blotter_cache = AppCacheManager('blotter')
cache_response = blotter_cache.get([args/kwargs from dispatcher])
if cache_response is None:
    # cache miss: intensive processing and/or http activity here
    # to build cache_response, then store it under the same arguments
    blotter_cache.set(cache_response, [args/kwargs from dispatcher])

In the cron all that’s left is to clear the namespace:

blotter_cache = AppCacheManager('blotter')
blotter_cache.delete()

That’s it…

from django.core.cache import cache


class AppCacheManager(object):
    """
    this is a simple keyword driven cache, it can be managed and cleared on
    a per app/namespace basis. Created for the situation where a cron should flush all
    http responses in cache and start over.
    """
    def __init__(self, namespace, ttl=3600):
        self.namespace = namespace
        self.ttl = ttl

    def _get_cache_list(self):
        cm = cache.get(self.namespace)
        return [] if cm is None else cm

    def _add_cache_list_key(self, key):
        cl = self.cache_list
        cl.append(key)
        # use set to dedupe
        cache.set(self.namespace, list(set(cl)))

    cache_list = property(_get_cache_list)

    def _key(self, *args, **kwargs):
        # sort kwargs by name so the same arguments always build the same key
        items = sorted(kwargs.items(), key=lambda k: k[0])
        bits = ['appcache', self.namespace]
        bits.extend([str(a) for a in args])
        for k, v in items:
            # the exact format is not important, it only has to be deterministic
            bits.append('%s=%s' % (k, v))
        return '_'.join(bits)

    def delete(self):
        # drop every key recorded for this namespace, then reset the record
        for c in self.cache_list:
            cache.delete(c)
        cache.set(self.namespace, [])

    def get(self, *args, **kwargs):
        key = self._key(*args, **kwargs)
        return cache.get(key)

    def set(self, data, *args, **kwargs):
        key = self._key(*args, **kwargs)
        cache.set(key, data, self.ttl)
        # record the key so delete() can find it later
        self._add_cache_list_key(key)
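To see the record keeping in action without a django install, here is a stripped-down sketch of the same idea with a plain dict standing in for django's cache. The class name and the dict backend are stand-ins for illustration, and ttl handling is left out:

```python
class NamespacedCache(object):
    """Track the keys written under one namespace so they can be cleared together."""

    def __init__(self, backend, namespace):
        self.backend = backend  # any dict-like store; stands in for django's cache
        self.namespace = namespace

    def _key(self, *args, **kwargs):
        bits = ['appcache', self.namespace] + [str(a) for a in args]
        bits += ['%s=%s' % (k, v) for k, v in sorted(kwargs.items())]
        return '_'.join(bits)

    def get(self, *args, **kwargs):
        return self.backend.get(self._key(*args, **kwargs))

    def set(self, data, *args, **kwargs):
        key = self._key(*args, **kwargs)
        self.backend[key] = data
        # record keeping: remember which keys belong to this namespace
        self.backend.setdefault(self.namespace, set()).add(key)

    def delete(self):
        # remove only the keys recorded for this namespace
        for key in self.backend.pop(self.namespace, set()):
            self.backend.pop(key, None)


store = {'unrelated': 'keep me'}
blotter = NamespacedCache(store, 'blotter')
blotter.set('<html>...</html>', 'arrests', page=2)
hit = blotter.get('arrests', page=2)
blotter.delete()
miss = blotter.get('arrests', page=2)
```

After delete() the blotter entry is gone while the unrelated entry in the shared store is untouched, which is the whole point of clearing per namespace rather than flushing the cache.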

google app engine – a retrospective

November 7th, 2009

The past few weeks I’ve immersed myself in the Google App Engine (GAE), porting over a local weather app into a scalable directory of hundreds of forecasts. It’s always a bit frustrating to leave your comfort zone, but generally it’s rewarding to stick it out and pocket the experience – working with app engine was no different in that respect. There’s a lot to love about app engine. The cost of entry is nil and an overnight sensation should theoretically scale gracefully; furthermore the cost of resources above and beyond the quotas is reasonable. But the benefits of the platform do little to conceal a series of hard limitations in place. Many times it feels like the framework is fighting against you, and unlike the world of vps or quality commodity hosting there simply are no alternatives, no configuration files available for override or libraries to be installed to take care of a particular problem. GAE is a highly managed platform and the expectation of wiggle room against some of the more draconian limitations is foolishness.

The django compatibility along with the app patch was what drew me in initially. Had I been starting the development from scratch, I would never have gotten cornered by the framework as often as was the case – but my goal was porting over an existing django app. The promise of GAE is that you get django with a different models api, which for the most part holds. My ported app relied on heavy use of SQL relationships without a direct GQL analogue and sizable updates based on a cron job that walks the locales. The GAE datastore isn’t a slouch, but side by side with a SQL database, it feels like slow motion. Over time I’d expect the datastore may get more highly tuned, but it’s a rude awakening when you’ve become accustomed to running 300 updates in sub-second speed over LAMP. On the GAE this particular logic was my first introduction to the DeadlineError. Without exception, no request/task/cron can take more than 30 seconds to complete and against the once speedy SQL update I hit the deadline wall in a fiery crash. I looked at the problem from every angle and came to the conclusion that there was simply no way the amount of data being used could be managed in the datastore with the deadline in place. As I would learn later looking at the quota overview, this one particular script was eating cpu like it was Fat Tuesday. Just deleting 10,000 entities after I decided to back out of the implementation basically broke the bank in terms of quota supplied CPU. If you take anything away from this post, remember that the GAE datastore is not a SQL database; if you go into GAE development thinking otherwise you will get burned. Once you accept the datastore for what it is, the healing can begin. In my case, that healing would take the form of memcache, which in GAE is provided in abundance.

No critique would be complete without a mention of the other menacing limitation, the DownloadError. When your app is connected to a web service you should be prepared to encounter this. The stock fetch (a urllib2 wrapper) allows for 5 seconds to get in, out and onward with your http request – pass the threshold and you have nothing. Thankfully, that deadline can be pushed to 10 seconds using the deadline argument, but I still found that I was hitting this limit occasionally when a slowdown on the service side occurred. As was the case with my datastore trouble, this was part of a background process. A 10 second limit on an http request seems tasteful when a user is on hold at the other end of the request, but in the context of a background process it’s nothing more than a cruel mistress. I understand Google’s need to implement some level of control here, but with the DeadlineError in place I’m confused why the http timeout could not sit neatly under the standard request timeout. The lesson is clear: if your app includes a dependence on a slow or occasionally hammered service, GAE may not be the right move. In the world of commodity hosting, this is a common and easily reconciled issue – but there is simply no real solution within the GAE framework; the 10 second wall is immovable.

The timeouts wouldn’t have been so painful if my port was not based on code that took an approach to background processing where a fair amount of data was collected and stored at once. In my short time with Google App Engine, I’ve learned how to accommodate the limitations to some degree. With the use of task queues, cron, and breaking larger processes into smaller ones, most of the troubles seem to dissolve. It’s a good deal more work to break background processes out in this manner and, other than satisfying the GAE limitations, it has no benefit whatsoever. In a more general sense, this is the common path of least resistance within the framework: no matter which limitation you may face, the same concept can usually be applied – break it down into smaller processes and/or requests. In any case, let’s hope Google takes another look at background processing in future updates to GAE.
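The "break it down" idea can be sketched in plain python. This is not GAE api code: the function name, batch size and 25 second budget are assumptions picked to stay under the 30 second limit mentioned earlier, and handing the leftover items to a follow-up task is left to the caller:

```python
import time


def process_in_batches(items, handle, batch_size=50, budget_seconds=25.0,
                       clock=time.time):
    """Handle items batch by batch; return whatever did not fit in the budget."""
    started = clock()
    for start in range(0, len(items), batch_size):
        if clock() - started >= budget_seconds:
            return items[start:]  # hand the remainder to a follow-up task
        for item in items[start:start + batch_size]:
            handle(item)
    return []


# stand-in work: "updating" a locale just records it
updated = []
leftover = process_in_batches(list(range(120)), updated.append, batch_size=50)
```

The time check happens only between batches, so a single batch still has to be small enough to finish comfortably inside the remaining window.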

There is one last caveat I should mention – though unlike the timeouts, it’s less likely to affect you. What Google has done is to wrap specific image processing functionality a django developer would normally use PIL to accomplish. I would guess that it was developed with image thumbnail galleries in mind because that is one of the only suitable uses I can imagine. If your source material is photographic JPEG, you may be in decent shape. I, however, found myself in the unfortunate position of trying to use PNG output in dealing with png/gif inputs. The input images would come in with a small adaptive palette and very small file size, but once transformed within the api they became PNG32 monstrosities. It was disheartening to see the same image come out of my development server as a 15kb PNG8 while the app engine would blow it up to 200kb+. The available transformations aren’t all that limiting, but the lack of output control combined with poor default behaviors (such as not preserving the input palette) is maddening. I expect I’m an outlier on this one, so I can only hope more use cases are considered for the images api. In my particular case, this was an insurmountable issue and I found myself moving image processing off the app engine and standing up the service off the cloud. I mention this not because you’ll hit this particular wall – but because you will likely hit a similar situation. Perhaps you’ll need to prefetch elsewhere to work around a timeout, or transform available data larger than the 1MB download threshold into smaller files and reassemble them in the cloud, or present a result set count greater than 1000.. I can’t predict where or when it might happen to you – but the smart move would be to expect to do a dance or two for any moderately complex application. There are so many restrictions in play that the laws of probability are bound to kick in somewhere. I think of it as the cloud tax.

No matter the platform, some amount of frustration is inevitable. There wasn’t a whole lot I couldn’t make happen on GAE with a little more time or creativity. Coming out of the other end with a site in place on GAE is where much of the appreciation of GAE begins. The logging and dashboard are well thought through. I found it was easy to isolate the CPU intensive code based on the quotas and target optimizations effectively. The regex search of logs means looking up an error in a task queue of hundreds is relatively easy. For anyone used to grepping server logs, you’ll not want to go back to the stone age. In django, I’ve always thought such a system would be a godsend – it’s really appealing to get such a rich overview of the site without lifting a finger. Another major advantage over commodity hosting is that outgoing bandwidth is fixed at an extremely reasonable cost ($.12 per GB) past the 1GB per day free quota.

Google App Engine is not a platform you will want to adopt blindly. I would highly encourage anyone considering such a move to opt for a rapid prototype to smoke out any showstopping limitations that happen to be built into the platform. They are certainly not the limits you’ll want to encounter a month down the road where you have virtually no control over them beyond a feature request. If you find you can work within the constraints of the platform, having the GAE team deal with the scaling and ops of your site more than offsets the less than attractive aspects of developing on the platform.