My first million

1 02 2010

The last couple of weeks have been quite crazy: trying to manage work on several projects along with some trips around Europe is both tiring and time-consuming. Still, I don’t want to let this occasion go by without a post for future reference.

I reached my first million! Well, a million pageviews/day anyway. Although I suspect it would be more exciting if I were referring to money, I feel this is also a remarkable milestone in my career!

For the past months, my team and I have been trying to cope with the continuous increase in traffic on our web application, and the high demand peaks were getting out of control. Even though the architecture is horizontally scalable, it isn’t cost-effective to scale up just for those timed peaks, and using something like Amazon EC2 is an option but still some way down the road.

In the end it took a crazy last-minute deployment of a custom Django cache middleware I had been working on for the past few days. I should say that I’m not a fan of these rushed updates under pressure and usually strongly oppose them, but hey, this time it went great, the results were awesome and it allowed us to reach the amazing value of 1,232,158 pageviews in one day! WOW!

For the same period, the processor usage on one of the machines looked something like this:

Notice the difference there around 2:00PM after we deployed that new middleware, impressive right?!

Now, since I think we aren’t done yet and I want to write about the first 10 million/day, I’m going back to work! More on the revised cache system in the following days, after a scheduled enhancement deployment that promises even more impressive results!





Django cache framework: how to write custom decorators

15 12 2009

Anyone developing high-traffic web applications will at some point consider server-side caching to reduce database load and processing time. Django applications are no exception, and Django ships with a very useful cache framework that lets you implement caching the correct way: as an add-on to increase performance that requires as few code changes as possible and that, if disabled, leaves everything working the same way.

While memcached is by far the best-performing backend and should be the primary choice in most cases, the cache framework has an abstract API supporting several cache systems. It offers both a low-level set/get pair based on a key you provide and some nifty helpers that make your job easier by managing this in the background. One example is the cache_page decorator: you only have to apply it to the desired views and voilà, performance increased! You can find out more about this in Django’s cache framework manual.

Now, this will do for most cases, but on particular occasions you have to take into account the logic Django uses to generate its own cache key. From the manual:

By default, Django’s cache system creates its cache keys using the requested path (e.g., "/stories/2005/jun/23/bank_robbed/"). This means every request to that URL will use the same cached version, regardless of user-agent differences such as cookies or language preferences.

In some cases, the requested path may not be enough to identify the request and you may need customized key logic. Yet, before considering this, I recommend first reading the manual thoroughly, as the key_prefix parameter or the vary_on_headers decorator may be enough to solve your problem.

Also in the manual is:

The cache middleware caches every page that doesn’t have GET or POST parameters.

If you want to cache, for example, API requests, they will very likely have GET parameters. Also, if you are building a multi-client application, you might get identical requests that refer to different clients.

I came up with a very simple decorator to handle this in a clean way:

from functools import wraps

from django.conf import settings
from django.core.cache import cache

def cache_request(f, timeout=300):
    """
    Decorator that tries to load the response from the cache and,
    on a miss, executes the function, writes the result to the cache
    and returns it.
    """
    def get_key(*args, **kwargs):
        # This implementation will be explained later...
        raise NotImplementedError

    @wraps(f)
    def wrap(*args, **kwargs):
        # Try to load the response from the cache
        cache_key = get_key(*args, **kwargs)
        response = cache.get(cache_key)
        if response is None:
            # Not matched in cache, execute the function
            response = f(*args, **kwargs)
            # Write to the cache
            cache.set(cache_key, response, timeout)
        return response
    return wrap

It is a very simple wrapper for any function: it tries to fetch the result from the cache and, if it isn’t found, runs the function and stores the result. It takes an optional cache expiry parameter, timeout, which defaults to 300 seconds. Now all I need to do is implement get_key in a way that generates unique per-client keys.
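As written, the timeout can only be changed by calling cache_request(my_view, timeout=...) by hand, since the plain @cache_request syntax passes just the function. The following is a minimal sketch of one way to accept it on the decorator line. The names cache_request_for, key_func and DictCache are made up for this illustration, and the dict-backed cache only stands in for Django's cache object so that the snippet runs on its own.

```python
import time
from functools import wraps


class DictCache:
    """Stand-in for Django's cache object: same get/set shape, kept in a dict."""

    def __init__(self):
        self._store = {}

    def get(self, key, default=None):
        value, expires_at = self._store.get(key, (default, None))
        if expires_at is not None and expires_at < time.time():
            # Entry expired: drop it and report a miss
            del self._store[key]
            return default
        return value

    def set(self, key, value, timeout=300):
        self._store[key] = (value, time.time() + timeout)


cache = DictCache()


def cache_request_for(timeout=300, key_func=None):
    """Decorator factory: returns a caching decorator bound to `timeout`."""
    def decorator(f):
        @wraps(f)
        def wrap(*args, **kwargs):
            if key_func is not None:
                cache_key = key_func(*args, **kwargs)
            else:
                # Hypothetical default key: function name plus its arguments
                cache_key = '%s:%r:%r' % (f.__name__, args, sorted(kwargs.items()))
            response = cache.get(cache_key)
            if response is None:
                response = f(*args, **kwargs)
                cache.set(cache_key, response, timeout)
            return response
        return wrap
    return decorator


calls = []


@cache_request_for(timeout=60)
def slow_square(n):
    calls.append(n)  # Record every real execution
    return n * n
```

Calling slow_square(4) twice runs the body only once; the second call is answered from the stand-in cache.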

In this particular case, I wish to cache two different types of requests. Note that at any point in the request handling I always have a settings.CLIENT_ID, set in the earlier stages by the middleware.

Note: depending on your configuration, django.conf.settings is likely not thread-safe. That is, if you change values there at runtime, such as CLIENT_ID, two or more simultaneous threads may read and write the same memory position, overwriting each other’s values and causing serious issues. Django advises against changing settings at runtime; if you still need to do it, feel free to request assistance.
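One way to sidestep that problem is to keep the per-request client in thread-local storage rather than on settings. The sketch below uses only the standard library; set_client_id and get_client_id are hypothetical helpers, not part of Django, and the three threads merely simulate simultaneous requests.

```python
import threading

# Each thread sees its own attributes on this object
_request_state = threading.local()


def set_client_id(client_id):
    """Would be called from the middleware at the start of each request."""
    _request_state.client_id = client_id


def get_client_id():
    """Returns the client of the request handled by the current thread."""
    return getattr(_request_state, 'client_id', None)


def handle_request(client_id, results, barrier):
    set_client_id(client_id)
    barrier.wait()  # Make sure every thread has written before any reads
    results[client_id] = get_client_id()


results = {}
barrier = threading.Barrier(3)
threads = [
    threading.Thread(target=handle_request, args=(cid, results, barrier))
    for cid in ('client-a', 'client-b', 'client-c')
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Every thread reads back the value it wrote itself, even though all of them wrote before any of them read.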

For my REST interface, which may have GET parameters, I used:

    from django.conf import settings
    from django.utils.http import urlencode

    def get_key(*args, **kwargs):
        """
        Given an API request, returns a key to be used in the cache system
        """
        # args[0] is assumed to be the handler instance, so the request comes second
        request = args[1]
        url = request.META['PATH_INFO']
        params = urlencode(request.GET)
        return 'webtv.%s.api?url=%s&params=%s' % (settings.CLIENT_ID, url, params)

So in this case I get a “urlish” key: a prefix with the application name, a client identifier and the request details. That will do! I intentionally used request.GET and not request.META['QUERY_STRING'], so that the key is built from the parsed parameters and not from the raw string, since the order in which they were written in the QUERY_STRING shouldn’t influence the result. Bear in mind that this relies on how the underlying dictionary orders its keys.
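To make the ordering explicit rather than relying on how the parameters happen to be stored, the pairs can be sorted before encoding. This is a minimal sketch using only the standard library; api_cache_key and its arguments are illustrative names, and the 'webtv' prefix is simply reused from the key above.

```python
from urllib.parse import parse_qsl, urlencode


def api_cache_key(client_id, path, query_string):
    """Builds a cache key that ignores the order of the GET parameters."""
    # keep_blank_values so that "?flag=" and "?" do not collapse into one key
    pairs = parse_qsl(query_string, keep_blank_values=True)
    params = urlencode(sorted(pairs))
    return 'webtv.%s.api?url=%s&params=%s' % (client_id, path, params)


key_a = api_cache_key(7, '/videos/', 'page=2&order=date')
key_b = api_cache_key(7, '/videos/', 'order=date&page=2')
key_other_client = api_cache_key(8, '/videos/', 'order=date&page=2')
```

Here key_a and key_b are identical, while the same request for another client gets its own key.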

For my PyAMF powered AMF interface:

    import re

    from django.conf import settings
    from django.utils.http import urlquote

    def get_key(*args, **kwargs):
        """
        Given an AMF request, returns a key to be used in the cache system
        """
        # f is the decorated function, taken from the enclosing cache_request scope
        module = re.search(r'^\w+\.(?P<module>.*)\.\w+$', f.__module__).group('module')
        # Skip the first argument and escape the rest so the key has no whitespace
        params = urlquote(args[1:])
        return 'myapp.%s.gateway.%s.%s?params=%s' % (settings.CLIENT_ID, module, f.__name__, params)

In this case, since the view I’m caching can be located in several modules, I used a regular expression to retrieve the module name, then appended the function name and the urlquoted string representation of the list of parameters.
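To see what that regular expression extracts, here it is applied to a few made-up module paths. The paths and the module_part helper are assumptions for illustration. Note that a path with fewer than three dotted components does not match at all, so this sketch returns the full path in that case rather than failing on a None match.

```python
import re

MODULE_RE = re.compile(r'^\w+\.(?P<module>.*)\.\w+$')


def module_part(dotted_path):
    """Returns what sits between the first and the last dotted component."""
    match = MODULE_RE.search(dotted_path)
    if match is None:
        # e.g. "myapp.gateway": nothing between the first and last component
        return dotted_path
    return match.group('module')


single = module_part('myapp.videos.gateway')
nested = module_part('myapp.videos.playlists.gateway')
short = module_part('myapp.gateway')
```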

Note: memcached doesn’t allow whitespace in keys. If the string you are using as a key has any, Django will silently hide the error and proceed without caching, so we should always escape it. The urlquote() there does the trick while preserving the key logic.
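Besides whitespace, memcached also limits keys to 250 bytes and rejects control characters. Here is a minimal sketch of a helper that covers both, using only the standard library. The name safe_cache_key is hypothetical, and falling back to an MD5 digest for long keys is one possible choice, not something the code above does.

```python
import hashlib
from urllib.parse import quote

MEMCACHED_MAX_KEY_LENGTH = 250


def safe_cache_key(raw_key):
    """Escapes whitespace and shortens keys memcached would reject."""
    # Percent-encode spaces, control and non-ASCII characters, but keep
    # the "urlish" separators readable
    key = quote(raw_key, safe="/?&=.:,()'")
    if len(key) > MEMCACHED_MAX_KEY_LENGTH:
        # Too long: replace with a fixed-size digest of the escaped key
        digest = hashlib.md5(key.encode('ascii')).hexdigest()
        key = 'hashed.%s' % digest
    return key


short_key = safe_cache_key("myapp.7.gateway.videos.list?params=(1, 'new videos')")
long_key = safe_cache_key('myapp.7.gateway.videos.list?params=' + 'x' * 300)
```

The trade-off of hashing is that long keys are no longer readable when inspecting the cache, so it is only applied when the limit is actually exceeded.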

Using this is as simple as:

@cache_request
def my_view(request):
    # View code...
    pass

Done!

Note: Please bear in mind that, since our key logic is based on client-fed parameters, the cache memory size may at some point reach a huge value, depending on the variation in parameters and the timeout. If this happens, the cache hit rate will likely be low as well and you lose the benefit of caching altogether. I advise you to choose the views you want to cache wisely and probably avoid caching where the response depends on user-typed requests, e.g. search views.
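A rough way to get a feel for this is to count how often a key repeats in the traffic a view receives. The sketch below ignores timeouts and memory limits, and both traffic samples are invented purely for illustration.

```python
from collections import Counter


def hit_rate(requested_keys):
    """Fraction of requests that an unbounded cache would answer from memory."""
    seen = set()
    stats = Counter()
    for key in requested_keys:
        if key in seen:
            stats['hit'] += 1
        else:
            stats['miss'] += 1
            seen.add(key)
    total = stats['hit'] + stats['miss']
    return stats['hit'] / total if total else 0.0


# Made-up traffic: a listing view requested with few distinct parameters...
listing_keys = ['list?page=%d' % (i % 5) for i in range(100)]
# ...and a search view where nearly every user types something different
search_keys = ['search?q=term%d' % i for i in range(100)]

listing_rate = hit_rate(listing_keys)
search_rate = hit_rate(search_keys)
```

With these samples the listing view is served from cache for all but its first five requests, while the search view never gets a single hit and only fills memory.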





Sapo Codebits 2009 & jobGears

11 12 2009

Codebits has been presented for the past 2 years as an event for the Portuguese tech community. The concept is basically 3 days of talks, a 24-hour from-scratch-to-project coding contest, free pizza and a lot of caffeinated drinks. A change occurred this year: Codebits was announced as going international, with all communication therefore done in English. Since I am living abroad, I was quick to sell the idea of going to some of my colleagues.

Fast forwarding a bit, the event took place this past week and the outcome is that some things disappointed me while others were really positive!

First, it was really a joy to be back in Lisbon and see a lot of familiar faces. The setup of the event was basically the same: main stage, food area, book stand, tables to work on and some 4 or 5 Xbox 360s just to relax on short breaks. There were also a lot of the famous Sapo bean-bags. There were, however, 3 more secondary stages and the building was huge, in fact the longest building in Portugal, so they say. This wasn’t actually good: overnight it really turned into “Coldbits” and it took forever to go from one side to the other, so most of the time I just stayed around my table.

Regarding the language, all documentation in the welcome pack (which was very nice, thank you!) came in English. The talk schedule also included the language each talk would be given in, with around 40% in English, which was cool. What I never expected was that Sapo’s CTO @celso would choose to make the closing speech in Portuguese, giving the green light for everyone to do the same from then on. Well, I just had to live with translating everything, but still, he could have shown more consideration. Then again, I found that a high number of people were complaining about the complete opposite, so I understand that this may have been hard to manage.

Now, I wasn’t particularly thrilled about the coding marathon, or at least about the competition side of it. The reason for that came from my disappointment at the 2008 edition, in which I had stayed up all night coding, even after a long work day, only to find a lot of people presenting slides and little or no code, so much for trying… An additional reason is that Sapo’s target users are mostly clients of its parent company, a multimedia and communications company that has a near monopoly of the market. Although I acknowledge they do have some nice products and services, the perception I get is that they don’t really have to “try” that hard, not going much beyond that assured stake. Since I try to brainstorm targeting a global audience, Portugal included, I feel my priorities and goals are quite different. Still, I believe those 24 hours there are really productive, the environment is great and I always have someone to ask for help or an opinion. What better opportunity could I find to start jobGears, then?

The idea originally came from a custom PHP project, developed by getGears for AEIST and live at http://jobshop.ist.utl.pt, which consisted briefly of a job portal for the students of IST with the ability to create and manage a CV through a very usable interface, quite unique on the web. It was always our wish to take this feature one step further and turn it into a project of its own. With a fellow developer from getGears there at Codebits, we set ourselves to take the idea and:

  • Redevelop everything in Python using Django.
  • Make the system open with no registration needed, but with the option to save for later editing via Facebook Connect.
  • Integrate the generated CV with the Facebook profile.
  • Post future updates to the Facebook user stream.
  • Generate permalinks for sharing a non-editable version and allow publishing it directly to Twitter.
  • Generate PDFs.
  • And everything else we had the time for.

Nice, huh?! Still this is only the beginning, we have plenty of ideas on how to take this basis further and turn it into something “you” will see value in.

After some 26 hours of coding, with a lot of caffeine in my system, I finally went up on stage, quite clumsy, obviously exhausted and unprepared given the short 90 seconds I had. That could have gone better. Still, right afterwards I was quick to tweet about it to get people to try it. I never expected such a big user response; in fact, I had hoped it wouldn’t be that big, as my poor development VPS just couldn’t take the sudden load and quickly started swapping, oh well… Still, I’m happy about that: I saw the potential in our project and decided immediately to improve its efficiency and build a real production environment, even while staying on a low budget.

As for the results of the contest, once again I got nothing but disappointment. I congratulate all of the winners though, some had really nice ideas and I hope they manage to go live with them soon! Regardless of that, it had long been my objective to come out of Codebits 2009 with a serious project in hand and that was exactly what I got!

We set the launch deadline for January 7th 2010. Why this particular day? Well, it’s my mother’s birthday, of course! Do check back; a lot of work lies ahead but I’m pretty confident we’re onto something!

António, me and Xavi from Flumotion on the first day






Ajax vs. Flex: making the choice

16 11 2009

Back when I joined Flumotion, my team was given the choice between implementing an HTML/JavaScript or a Flex-powered back office. Most developers I know would immediately turn down the latter for a number of reasons:

  • Platform portability, notably turning down mobile users.
  • Flash plugin sometimes appears to only work properly on Windows machines, raising performance issues.
  • Commitment to a proprietary technology.
  • Smaller choice of developers and community resources.
  • What is it with Adobe and their beloved RIA buzzword anyway? I don’t see anyone else using it!
  • Obviously not as trendy as Ajax.

among others that spiced up the discussion.

But there were some particular reasons to strongly consider it in our specific case. For one, we would be working with media management, editing and playback of video, a technology closely tied to Flash at the moment, so it was really about going with Ajax plus some Flex components or just Flex alone. Truth is, I had already had my share of pain working on projects with a heavy Ajax interface, and from our Product Manager’s expectations of the product I could see those memories coming back to life. Furthermore, and from a more personal perspective, I felt tempted to give it a try and only afterward draw my conclusions. Based on that, I gave my vote to Flex.

A few months later, I have a better-formed opinion: I had some nice surprises, as well as some really disappointing ones!

I need to give credit to PyAMF for making Python and Django integration as easy as possible. On very few occasions did I have to step out of my backend bubble and worry about frontend issues. The AMF protocol itself helped here; a small but nice example feature I recall is that datetimes are always passed in UTC, with the Flash plug-in being responsible for handling the offset based on the client’s system timezone setting. Not having to worry about details such as these was a joy and made the development process faster, allowing me to focus on backend performance. In the end I was developing a Python request handler, very far from your regular Django application, using little more than the ORM and URL handlers.
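The convention described here, where the server only ever deals with UTC and the client applies its own offset, can be sketched in plain Python. The fixed +01:00 offset stands in for whatever the client's system timezone is, and both helper names are assumptions; this is not PyAMF code.

```python
from datetime import datetime, timedelta, timezone


def to_wire(dt):
    """Server side: normalise any aware datetime to UTC before sending."""
    return dt.astimezone(timezone.utc)


def to_client_local(utc_dt, client_offset):
    """Client side: apply the client's own offset for display."""
    return utc_dt.astimezone(timezone(client_offset))


# A timestamp created on a server running at UTC+2
server_time = datetime(2009, 11, 16, 14, 30, tzinfo=timezone(timedelta(hours=2)))
sent = to_wire(server_time)
shown = to_client_local(sent, timedelta(hours=1))
```

The three values represent the same instant; only the wall-clock reading changes, which is why the backend never needs to know where the client is.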

But there were a few downsides along the way. First, the mere fact that the code was compiled made me feel out of place a lot of the time. Every small fix implies a full build and an update of the production code. I believe that hot fixing should be avoided at all costs, but it is still needed sometimes and this only made it harder, heavier and even more dangerous. Then of course there were the problems of the Flash plugin on Linux: it is really frustrating for a guy who spends 80% of his time on a console to have to deal with Firefox taking away all system resources the minute you open something in Flash, making it virtually impossible to test without booting up a virtual machine. As for AMF itself, I never quite understood the need for the transferred data to come “encrypted” in a weird format; it certainly didn’t make it feel safer or smaller. Also, the fact that each message carried a message ID and time made it harder to cache responses smartly. Although I came up with an interesting cache decorator for AMF requests that I will post in the future, I would have preferred to do it at a lower level so that I wouldn’t have to waste time decoding the AMF requests (yes, as fast as it is, it’s always unneeded overhead).
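To illustrate why the per-message ID and time get in the way of caching, here is a sketch that derives a cache key from an already decoded message while ignoring those fields. The message is modelled as a plain dict and the field names (messageId, timestamp, operation, body) are assumptions for this example, not taken from the project's code.

```python
import hashlib
import json

# Fields assumed to change on every message and so be useless for caching
VOLATILE_FIELDS = ('messageId', 'timestamp')


def message_cache_key(message):
    """Derives a cache key from the parts of a decoded message that matter."""
    stable = {k: v for k, v in message.items() if k not in VOLATILE_FIELDS}
    # sort_keys gives a canonical form, so dict ordering does not matter
    canonical = json.dumps(stable, sort_keys=True)
    return hashlib.sha1(canonical.encode('utf-8')).hexdigest()


first = {'messageId': 'A1', 'timestamp': 1258381800,
         'operation': 'videos.list', 'body': [7, 'recent']}
second = {'messageId': 'B2', 'timestamp': 1258381805,
          'operation': 'videos.list', 'body': [7, 'recent']}
different = {'messageId': 'C3', 'timestamp': 1258381810,
             'operation': 'videos.list', 'body': [8, 'recent']}
```

The first two messages map to the same key despite their different IDs and times, but the message still has to be decoded before the key can be computed, which is the overhead complained about above.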

But the one thing that really annoyed me was the cross-domain sandbox issue. As the project architect, it was very frustrating being forced to spread several copies of a crossdomain.xml file that makes very little sense to me. To date, no one has yet succeeded in properly justifying it and proving me wrong, but I will get into this issue in a following post, don’t worry.

As a conclusion, in the context of this specific project, I believe we made the right choice. Still, I very much doubt that we will see Adobe’s “Rich Internet Applications” taking over Ajax anytime soon. The new media support on HTML5 is here to confirm that. Rare examples like Google Wave make me believe that given the right developers (I presume), it is possible to make very nice JavaScript powered applications.

That said, on a future decision, my vote would again depend on the project details and target audience, but in most cases I can think of at the moment I would have to go with Ajax.

If you have had a similar experience with a different outcome, I would very much like to hear from you!





It is about time!

10 11 2009

Well, I figured it is about time for me to start my very own technical blog.
Over the last couple of months, due to the nature of the projects I have been working on, I found myself going through development blogs and participating in discussions a lot more often than before. The dedication that “random” developers are willing to put into helping you with the problems you come across amazes me!
That said, and since I come across a lot of interesting issues every week, I aim to share my own experiences and guides here, which will hopefully be of help to other people in need.
Let’s see how I keep it up!