optional kwargs in the django url dispatcher

December 11th, 2009

I was hitting a wall setting up a few urls where I was using optional kwargs in the url. I imagine this is a common scenario where content can live in more than one place, especially if the url itself is holding down some context to maintain an anonymous view. Here’s a quick example:

(r'^blotter/(?P<filter_name>(categories|arrests|personnel))/$', render_blotter, {}, 'blotter'),
(r'^blotter/(?P<filter_name>(categories|arrests|personnel))/(?P<filter_value>([\w-]+))/$', render_blotter, {}, 'blotter'),

The two routes dispatch to the same view, render_blotter, under the same name ‘blotter’, but each has a different signature depending on which arguments are passed. A more common example might be a YYYY/, YYYY/MM/, YYYY/MM/DD/ situation where all three should share a common render mapping. The url dispatcher handles this fine, and for the most part it can be dealt with in the application logic without much of a headache by removing kwargs equal to None before reverse() is called to retrieve the url. The crux of the issue is really the url tag in the templates, which is not happy unless it gets an exact match of arguments.
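
The "remove kwargs equal to None" step on the application side can be sketched in a few lines. This is only an illustration, not code from the blotter app: drop_none_kwargs is a hypothetical helper name, and the reverse() call is left as a comment because its import path depends on the django version in use.

```python
def drop_none_kwargs(kwargs):
    """Return a copy of kwargs without the entries whose value is None."""
    return dict((k, v) for k, v in kwargs.items() if v is not None)


# Only filter_name survives, so the shorter of the two 'blotter' routes matches.
url_kwargs = drop_none_kwargs({'filter_name': 'arrests', 'filter_value': None})

# In a view the filtered dict would then go to the dispatcher, e.g.:
#   reverse('blotter', kwargs=url_kwargs)
```

Values such as 0 or an empty string are kept on purpose; only None is treated as "not supplied".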

A better explanation is available here. In any case, I’ve put a templatetag together to handle optional kwargs. Hopefully someone will find it useful.

"""
     exactly the same as from django.template.defaulttags.url EXCEPT kwargs equal to None are removed
     this allows a bit more flexibility than the use of {% url %} where nesting is rested on optional
     base kw arguments.

     see http://code.djangoproject.com/ticket/9176
"""

from django import template
from django.template.defaulttags import URLNode, url

register = template.Library()

class URLNodeOptional(URLNode):

    """
    identical to django.template.defaulttags.URLNode
    but removes kwargs equal to None before resolving the url
    """

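    def render(self, context):
        # filter into a temporary dict instead of popping from self.kwargs:
        # the node is reused for every render (e.g. inside a for loop), so a
        # popped kwarg would stay gone even when a later context supplies it
        original = self.kwargs
        self.kwargs = dict((k, v) for k, v in original.items()
                           if v.resolve(context) is not None)
        try:
            return super(URLNodeOptional, self).render(context)
        finally:
            self.kwargs = original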

def url_optional(parser, token): 

    """
     creates the default URLNode, then routes it to the Optional resolver with the same properties
     by first creating the URLNode, the parsing stays in django core where it belongs.
    """ 

    urlnode = url(parser, token)
    return URLNodeOptional(urlnode.view_name, urlnode.args, urlnode.kwargs, urlnode.asvar)

url_optional = register.tag(url_optional)

google flash maps, more marker headroom

December 11th, 2009

A quick note on google flash maps. Somewhere in the vicinity of a few thousand markers, initialization time starts to slow down perceptibly; this is not unexpected, mind you. In the maps application I was working on, a minority of pages passed that mark – enough to look into options, but not enough to rethink “the metaphor”. It turns out getting more headroom in flash maps is really easy. The first thing to do is to create a custom marker to replace the google pin (a movie clip asset will do) using the MarkerOptions icon property. Now the important move – cache the clip as a bitmap (marker_icon.cacheAsBitmap = true;), and pass it into the overlay as the icon. Finally, make sure shadow rendering is off (MarkerOptions hasShadow property).

Caching a custom marker as a bitmap and disabling the shadow isn’t going to solve all your issues if you’re looking to put many more markers on the map, but it can at least triple the marker threshold before the map becomes unusable, which can come in handy if you’re dealing with a few wild fringe cases.

django app level caching

December 11th, 2009

Every developer I know loves cron jobs; I can think of nothing in development more rewarding than creating an application that can feed itself. On the flip side, processing a cron does create some prickly issues. Over the past couple of months I’ve hit the same issue twice. The first time was during the development of weathertronic, where the pair of task queues that govern the forecast data import required a naming convention to key into the memcache so that the rendering side never had to worry about stale data. I hit the same issue again while putting together a police blotter app for the city of Keene, NH, only this time I really wanted to preserve caching at the template output level. The cache decorator provided is great, but there isn’t any built-in mechanism to “call it off” when you know the underlying data has changed – it was a bit too blunt for my purposes. After googling a bit, I found some snippets on clearing the django cache, but that was again too blunt an instrument, as there are other entries in the cache that had no need to update. Ultimately, I ended up wrapping the django cache with a bit of record keeping. I figured I’d put it out there for anyone looking to solve a similar issue.

To set it up, add the script below somewhere on the python path (I created an appcache module). Then, in the rendering function called from the url dispatcher, load it up with the app name as the namespace and you’re off.

from appcache import AppCacheManager
blotter_cache = AppCacheManager('blotter')
cache_response = blotter_cache.get([args/kwargs from dispatcher])
if cache_response is None:
    # cache miss: intensive processing and/or http activity here,
    # leaving the result in cache_response
    # set() takes the data first, then the same args/kwargs used for get()
    blotter_cache.set(cache_response, [args/kwargs from dispatcher])

In the cron all that’s left is to clear the namespace:

blotter_cache = AppCacheManager('blotter')
blotter_cache.delete()

That’s it…

from django.core.cache import cache

#********************************************************************************************************

class AppCacheManager(object):

    """
    this is a simple keyword driven cache, it can be managed and cleared on
    a per app/namespace basis. Created for the situation where a cron should flush all
    http responses in cache and start over.    
    """

    def __init__(self, namespace,ttl=3600):
        self.namespace = namespace
        self.ttl = ttl

    def _get_cache_list(self):
        cm = cache.get(self.namespace)
        return [] if cm is None else cm

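    def _add_cache_list_key(self, key):
        cl = self.cache_list
        cl.append(key)
        # use set to dedupe; store the list with the same ttl as the entries,
        # otherwise it falls back to the cache's default timeout and can expire
        # before the keys it tracks, leaving delete() unable to find them
        cache.set(self.namespace, list(set(cl)), self.ttl)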

    cache_list = property(_get_cache_list)

    def _key(self, *args, **kwargs):
        # sorted() works whether items() returns a list or a view
        items = sorted(kwargs.items(), key=lambda k: k[0])
        bits = ['appcache',self.namespace]
        bits.extend([str(a) for a in args])
        for k, v in items:
            bits.append('='.join([str(k),str(v)]))        
        return '_'.join(bits)

    def delete(self):
        for c in self.cache_list:
            cache.delete(c)
        cache.set(self.namespace, [])

    def get(self, *args, **kwargs):
        key = self._key(*args, **kwargs)
        return cache.get(key)        

    def set(self, data, *args, **kwargs):
        key = self._key(*args, **kwargs)
        cache.set(key, data, self.ttl)        
        self._add_cache_list_key(key)
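
To see what the generated keys look like, here is a standalone copy of the key scheme used by _key above. It is a sketch for illustration only: build_key is a hypothetical free function (the class keeps this logic private), and the argument values are made up.

```python
def build_key(namespace, *args, **kwargs):
    """Mirror of AppCacheManager._key: positional args first, then sorted kwargs."""
    bits = ['appcache', namespace]
    bits.extend(str(a) for a in args)
    # sorting by name makes the key independent of keyword order
    for k, v in sorted(kwargs.items()):
        bits.append('='.join([str(k), str(v)]))
    return '_'.join(bits)


key_a = build_key('blotter', 'arrests', page=2, filter_value='dui')
key_b = build_key('blotter', 'arrests', filter_value='dui', page=2)
# both are 'appcache_blotter_arrests_filter_value=dui_page=2'
```

Because the kwargs are sorted, the same dispatcher arguments always land on the same cache entry, whatever order they were passed in.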

google app engine – a retrospective

November 7th, 2009

The past few weeks I’ve immersed myself in the Google App Engine (GAE), porting over a local weather app into a scalable directory of hundreds of forecasts. It’s always a bit frustrating to leave your comfort zone, but generally it’s rewarding to stick it out and pocket the experience – working with app engine was no different in that respect. There’s a lot to love about app engine. The cost of entry is nil and an overnight sensation should theoretically scale gracefully; furthermore, the cost of resources above and beyond the quotas is reasonable. But the benefits of the platform do little to conceal a series of hard limitations. Many times it feels like the framework is fighting against you, and unlike the world of vps or quality commodity hosting there simply are no alternatives: no configuration files available to override, no libraries to install to take care of a particular problem. GAE is a highly managed platform, and expecting wiggle room against some of the more draconian limitations is foolishness.

The django compatibility along with the app patch was what drew me in initially. Had I been starting the development from scratch, I would never have gotten cornered by the framework as often as was the case – but my goal was porting over existing django code. The promise of GAE is that you get django with a different models api, which for the most part holds. My ported app relied on heavy use of SQL relationships without a direct GQL analogue and on sizable updates from a cron job that walks the locales. The GAE datastore isn’t a slouch, but side by side with a SQL database, it feels like slow motion. Over time I’d expect the datastore to get more highly tuned, but it’s a rude awakening when you’ve become accustomed to running 300 updates in sub-second speed over LAMP. On the GAE this particular logic was my first introduction to the DeadlineError. Without exception, no request/task/cron can take more than 30 seconds to complete, and against the once speedy SQL update I hit the deadline wall in a fiery crash. I looked at the problem from every angle and came to the conclusion that there was simply no way the amount of data being used could be managed in the datastore with the deadline in place. As I would learn later looking at the quota overview, this one particular script was eating cpu like it was Fat Tuesday. Just deleting 10,000 entities after I decided to back out of the implementation basically broke the bank in terms of quota-supplied CPU. If you take anything away from this post, remember that the GAE datastore is not a SQL database; if you go into GAE development thinking otherwise you will get burned. Once you accept the datastore for what it is the healing can begin. In my case, that healing would take the form of memcache, which in GAE is provided in abundance.

No critique would be complete without a mention of the other menacing limitation, the DownloadError. When your app is connected to a web service you should be prepared to encounter this. The stock fetch (a urllib2 wrapper) allows for 5 seconds to get in, out and onward with your http request – pass the threshold and you have nothing. Thankfully, that deadline can be pushed to 10 seconds using the deadline argument, but I still found that I was hitting this limit occasionally when a slowdown on the service side occurred. As was the case with my datastore trouble, this was part of a background process. A 10 second limit on an http request seems tasteful when a user is on hold at the other end of the request, but in the context of a background process it’s nothing more than a cruel mistress. I understand Google’s need to implement some level of control here, but with the DeadlineError in place I’m confused why the http timeout could not sit neatly under the standard request timeout. The lesson is clear: if your app depends on a slow or occasionally hammered service, GAE may not be the right move. In the world of commodity hosting, this is a common and easily reconciled issue – but there is simply no real solution within the GAE framework; the 10 second wall is immovable.

The timeouts wouldn’t have been so painful if my port had not been based on code whose approach to background processing was to collect and store a fair amount of data at once. In my short time with Google App Engine, I’ve learned how to accommodate the limitations to some degree. With the use of task queues, cron, and breaking larger processes into smaller ones, most of the troubles seem to dissolve. It’s a good deal more work to break background processes out in this manner, and other than satisfying the GAE limitations, it has no benefit whatsoever. In a more general sense, this is the common path of least resistance within the framework: no matter which limitation you face, the same concept can usually be applied, which is to break the work down into smaller processes and/or requests. In any case, let’s hope Google takes another look at background processing in future updates to GAE.
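
The "break it down" idea can be sketched in plain python. This is a minimal illustration under assumptions, not code from the ported app: batches is a hypothetical helper, the locale names are stand-in data, and the batch size of 3 is arbitrary. On GAE each batch would be handed to its own task or request so that no single one runs up against the deadline.

```python
def batches(items, size):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# stand-in data; the real job walked a list of forecast locales
locales = ['locale-%d' % n for n in range(7)]

# each inner list is one unit of work small enough to finish on its own
work_units = list(batches(locales, 3))
```

The right batch size is something to find by measuring how long one unit takes against the deadline, not a number to copy from here.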

There is one last caveat I should mention – though unlike the timeouts, it’s less likely to affect you. What Google has done is to wrap specific image processing functionality that a django developer would normally use PIL to accomplish. I would guess that it was developed with image thumbnail galleries in mind because that is one of the only suitable uses I can imagine. If your source material is photographic JPEG, you may be in decent shape. I, however, found myself in the unfortunate position of trying to use PNG output in dealing with png/gif inputs. The input images would come in with a small adaptive palette and a very small file size, but once transformed within the api they became PNG32 monstrosities. It was disheartening to see the same image come out of my development server as a 15kb PNG8 while the app engine would blow it up to 200kb+. The available transformations aren’t all that limiting, but the lack of output control combined with poor default behaviors (such as not preserving the input palette) is maddening. I expect I’m an outlier on this one, so I can only hope more use cases are considered for the images api. In my particular case, this was an insurmountable issue and I found myself moving image processing off the app engine and standing up the service off the cloud. I mention this not because you’ll hit this particular wall – but because you will likely hit a similar situation. Perhaps you’ll need to prefetch elsewhere to work around a timeout, or transform available data larger than the 1MB download threshold into smaller files and reassemble them in the cloud, or present a result set count greater than 1000. I can’t predict where or when it might happen to you – but the smart move would be to expect to do a dance or two for any moderately complex application. There are so many restrictions in play that the laws of probability are bound to kick in somewhere. I think of it as the cloud tax.

No matter the platform, some amount of frustration is inevitable. There wasn’t a whole lot I couldn’t make happen on GAE with a little more time or creativity. Coming out of the other end with a site in place on GAE is where much of the appreciation of GAE begins. The logging and dashboard are well thought through. I found it was easy to isolate the CPU intensive code based on the quotas and target optimizations effectively. The regex search of logs means looking up an error in a task queue of hundreds is relatively easy. Anyone used to grepping server logs will not want to go back to the stone age. In django, I’ve always thought such a system would be a godsend – it’s really appealing to get such a rich overview of the site without lifting a finger. Another major advantage over commodity hosting is that outgoing bandwidth is fixed at an extremely reasonable cost ($0.12 per GB) past the 1GB per day free quota.

Google App Engine is not a platform you will want to adopt blindly. I would highly encourage anyone considering such a move to opt for a rapid prototype to smoke out any showstopping limitations that happen to be built into the platform. They are certainly not the limits you’ll want to encounter a month down the road, when you have virtually no control over them beyond a feature request. If you find you can work within the constraints of the platform, having the GAE team deal with the scaling and ops of your site more than offsets the less than attractive aspects of developing on the platform.