Anyone developing high-traffic web applications will at some point consider server-side caching to reduce both database load and processing time. Django applications are no exception: Django ships a very useful cache framework that lets you implement caching the correct way, as an add-on to increase performance. It should require the minimum possible code changes, and if it is disabled everything should still work in the same way.
While memcached is by far the best performing backend and should be the primary choice in most cases, the cache framework has an abstract API supporting several cache systems. It offers both a low-level set/get pair based on a key you provide and some nifty helpers that make your job easier by managing the keys in the background. One example is the cache_page decorator: you only have to apply it to the desired views and voilà, performance increased! You can find out more about this in Django’s cache framework manual.
Now, this will do for most cases, but on particular occasions you have to take into account the logic Django uses to generate its own cache key. From the manual:
By default, Django’s cache system creates its cache keys using the requested path (e.g., "/stories/2005/jun/23/bank_robbed/"). This means every request to that URL will use the same cached version, regardless of user-agent differences such as cookies or language preferences.
In some cases, the requested path may not be enough to identify the request and you may need customized key logic. Yet, before considering this, I recommend first reading the manual thoroughly, as the key_prefix parameter or the vary_on_headers decorator may be enough to solve your problem.
The manual also states:
The cache middleware caches every page that doesn’t have GET or POST parameters.
If you want to cache API requests, for example, they will very likely have GET parameters. Also, if you are building a multi-client application, you might get identical requests that refer to different clients.
I came up with a very simple decorator to handle this in a clean way:
from django.conf import settings
from django.core.cache import cache
def cache_request(f, timeout=300):
    """
    Decorator that tries to load the response from the cache and,
    if it is not there, executes the function, writes the result
    to the cache and returns it.
    """
    def get_key(*args, **kwargs):
        # The implementation is explained later on...
        raise NotImplementedError
    def wrap(*args, **kwargs):
        # Try to load the response from the cache
        cache_key = get_key(*args, **kwargs)
        response = cache.get(cache_key)
        if response is None:
            # Not matched in cache, execute the function
            response = f(*args, **kwargs)
            # Write to the cache
            cache.set(cache_key, response, timeout)
        return response
    return wrap
This is a very simple wrapper for any function: it tries to fetch the result from the cache and, if it is not found, runs the function and stores the result. It takes an optional timeout parameter for the cache expiry, defaulting to 300 seconds. Now all I need to do is implement get_key so that it generates unique per-client keys.
In this particular case, I want to cache two different types of requests. Note that at any point in the request handling I always have a settings.CLIENT_ID, set in the earlier stages by the middleware.
Note: depending on your configuration, django.conf.settings is likely not thread safe. That is, if you change values there at runtime, such as CLIENT_ID, two or more simultaneous threads may read and write the same memory location, overwriting each other’s values and causing serious issues. Django advises against changing settings at runtime; if you still need to do it, feel free to request assistance.
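One possible refinement of the wrapper above, as a sketch rather than the original implementation: a decorator factory so the timeout can be set per view, functools.wraps to keep the wrapped function's name, and a sentinel so that a cached None is not mistaken for a miss. The names cache_request_with, key_func and DictCache are hypothetical. DictCache is a tiny in-memory stand-in (it ignores the timeout) used only so the example runs without Django; any object with compatible get/set methods that returns the given default on a miss, such as django.core.cache.cache, could be passed instead.

```python
import functools

_MISS = object()  # sentinel: distinguishes "not cached" from a cached None


class DictCache:
    """Hypothetical in-memory stand-in exposing get/set like Django's cache."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value, timeout=None):
        # The timeout is accepted but ignored in this stand-in.
        self._data[key] = value


def cache_request_with(cache, key_func, timeout=300):
    """Build a caching decorator; key_func maps the call arguments to a key."""
    def decorator(f):
        @functools.wraps(f)
        def wrap(*args, **kwargs):
            cache_key = key_func(*args, **kwargs)
            response = cache.get(cache_key, _MISS)
            if response is _MISS:
                # Not in cache: run the function and store its result
                response = f(*args, **kwargs)
                cache.set(cache_key, response, timeout)
            return response
        return wrap
    return decorator


# Usage: record how often the wrapped function body really runs.
calls = []
demo_cache = DictCache()


@cache_request_with(demo_cache, key_func=lambda n: "demo.square.%s" % n, timeout=60)
def square(n):
    calls.append(n)
    return n * n


first = square(4)
second = square(4)  # served from demo_cache; the body of square() is not run again
```

Passing the key function in as an argument keeps the decorator itself independent of how keys are built, which is the part that differs between the two interfaces discussed next.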
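One common way to avoid writing per-request values into shared settings is to keep them in thread-local storage. The following is a minimal sketch of that idea, not what the setup described here uses; set_current_client and get_current_client are hypothetical helper names, and a middleware would be expected to call the setter at the start of each request.

```python
import threading

_request_state = threading.local()  # each thread sees its own attributes


def set_current_client(client_id):
    """Store the client id for the current thread only."""
    _request_state.client_id = client_id


def get_current_client(default=None):
    """Return the client id set in this thread, or default if none was set."""
    return getattr(_request_state, "client_id", default)


# Demonstration: values set in one thread do not leak into another.
set_current_client("client-a")
seen_in_worker = []


def worker():
    seen_in_worker.append(get_current_client())  # nothing set in this thread yet
    set_current_client("client-b")
    seen_in_worker.append(get_current_client())


t = threading.Thread(target=worker)
t.start()
t.join()
```

After the worker finishes, the main thread still sees its own value, because each thread reads and writes a separate copy.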
For my REST interface, that may have GET parameters, I used:
from django.utils.http import urlencode
from django.conf import settings
def get_key(*args, **kwargs):
    """
    Given an API request, returns a key to be used in the cache system
    """
    # The request is the second positional argument of the decorated callable
    request = args[1]
    url = request.META['PATH_INFO']
    params = urlencode(request.GET)
    return 'webtv.%s.api?url=%s&params=%s' % (settings.CLIENT_ID, url, params)
So in this case I get a “urlish” key: an application name prefix, a client identifier and the request details. That will do! I intentionally used request.GET and not request.META['QUERY_STRING'] so that I always get the parameters in the same order, as the order in which they were written in the QUERY_STRING shouldn’t influence the result.
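If you would rather not depend on how the parsed parameters happen to be ordered, one option is to sort them explicitly before encoding. This sketch uses only the standard library and works from the raw query string; build_api_key is a hypothetical helper, and the sample path and parameters are made up. The key layout simply mirrors the one above.

```python
from urllib.parse import parse_qsl, urlencode


def build_api_key(client_id, path, query_string):
    """Return a cache key that ignores the order of the query parameters."""
    # Parse into (name, value) pairs, keeping parameters with empty values
    pairs = parse_qsl(query_string, keep_blank_values=True)
    # Sorting the pairs makes the encoded string independent of request order
    params = urlencode(sorted(pairs))
    return 'webtv.%s.api?url=%s&params=%s' % (client_id, path, params)


key_a = build_api_key(7, '/videos/', 'page=2&tag=news')
key_b = build_api_key(7, '/videos/', 'tag=news&page=2')
key_other_client = build_api_key(8, '/videos/', 'page=2&tag=news')
```

Here key_a and key_b are identical, while key_other_client differs because the client identifier is part of the key.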
For my PyAMF powered AMF interface:
from django.utils.http import urlquote
from django.conf import settings
import re
def get_key(*args, **kwargs):
    """
    Given an AMF request, returns a key to be used in the cache system
    """
    # f is the decorated function, taken from the enclosing cache_request scope
    module = re.search(r'^\w+\.(?P<module>.*)\.\w+$', f.__module__).group('module')
    params = urlquote(args[1:])
    return 'myapp.%s.gateway.%s.%s?params=%s' % (settings.CLIENT_ID, module, f.__name__, params)
In this case, since the view I’m caching can be located in several modules, I used a regular expression to retrieve the module name, then appended the function name and the urlquoted string representation of the parameter list.
Note: memcached doesn’t allow whitespace in keys. If the string you are using as a key contains any, Django will silently hide the error and proceed without caching, so we should always escape it. The urlquote() call there does the trick while maintaining the key logic.
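To make the regular expression concrete, here is what it extracts from a couple of made-up module paths. The module names are hypothetical, and module_part is a helper introduced only for this illustration. It also falls back to the full name when the pattern does not match, which the one-liner above would not do: re.search returns None in that case and calling .group on it raises AttributeError.

```python
import re

MODULE_RE = re.compile(r'^\w+\.(?P<module>.*)\.\w+$')


def module_part(dotted_name):
    """Strip the first and last components of a dotted module path."""
    match = MODULE_RE.search(dotted_name)
    # Fall back to the full name when there are fewer than three components
    return match.group('module') if match else dotted_name


nested = module_part('myproject.video.channels.gateway')  # 'video.channels'
short = module_part('myproject.gateway')  # no match, full name returned
```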
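Escaping handles whitespace, but memcached also limits keys to 250 bytes and rejects control characters. A defensive option, sketched here with the standard library only, is to hash any key that would break those rules. safe_key is a hypothetical helper and the sample keys are made up; the trade-off is that hashed keys are no longer human-readable.

```python
import hashlib

MAX_KEY_LENGTH = 250  # memcached's key length limit, in bytes


def safe_key(key, prefix='myapp'):
    """Return key unchanged if memcached accepts it, else a hashed substitute."""
    raw = key.encode('utf-8')
    # Bytes <= 32 cover the space and control characters; 127 is DEL
    has_bad_chars = any(b <= 32 or b == 127 for b in raw)
    if len(raw) <= MAX_KEY_LENGTH and not has_bad_chars:
        return key
    # The same input always hashes to the same substitute key
    digest = hashlib.sha256(raw).hexdigest()
    return '%s.hashed.%s' % (prefix, digest)


plain = safe_key('myapp.7.gateway.videos.list?params=1')
hashed = safe_key('myapp.7.gateway.videos.list?params=hello world')
too_long = safe_key('x' * 300)
```

Note that if a key prefix or version is added by the cache framework on top of this, the final key is longer than what safe_key sees, so a lower limit may be the safer choice.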
Using this is as simple as:
@cache_request
def my_view(request):
    # View code...
    pass
Done!
Note: Please bear in mind that, since our key logic is based on client-fed parameters, the cache memory size may at some point grow very large, depending on the variation in parameters and the timeout. If this happens, the cache hit rate will likely also be low and you lose the benefit of caching altogether. I advise you to choose the views you want to cache wisely and to probably avoid caching where the response depends on user-typed requests, e.g. search views.