I’ve been thinking a lot lately about REST, especially in the context of API design. So I wanted to write this up to just collect my thoughts, and to share some things I’ve learned about working with HTTP and Django.
The Different HTTP Methods
I think we usually just think about GET and POST when it comes to web applications. That’s all I usually consider: is this <form> going to send a GET request or a POST? And the answer to that is really the answer to:
“Do I want the parameters to be in the URL or not?”
But lately I’ve started making use of the other HTTP methods, which is the name the HTTP standard (RFC 2616) gives to things like GET and POST. There are actually a number of these methods.
As far as building an API is concerned, PUT and DELETE are very useful in conjunction with GET and POST. But to value them I had to take a different perspective on GET and POST.
The Value of Idempotence and Safety
According to the standard, GET, PUT, and DELETE are all idempotent. That means that sending the same GET, PUT, or DELETE request once will have the same effect as sending it ten or a hundred times. I’m not in the habit of thinking about web requests this way. For instance, if I send a DELETE request asking for a resource to be removed, and it has already been removed, my first instinct would be to return some type of message to that effect, or even an error. But HTTP is not meant to work this way.
In a large network, where I have many clients that can send these requests to a server, it quickly becomes impossible for me to impose an order on the sequence of requests. On an internal project, I have multiple programs talking to different affiliate services, and two of them could request that a resource be DELETE-ed at close to the same time. Why should one receive an error just because the other beat it to the punch?
If one of those clients wants to DELETE something, and it’s already gone, then it got what it wanted, right? That request should be considered successful. Idempotence helps make this possible.
And then there is the idea of ‘safe methods’. Of the three mentioned above, only GET is considered safe. A safe method is one which “…should not have the significance of taking an action other than retrieval.” In other words, no side-effects. This is something I have seen broken a lot, and a mistake I have made myself: GET requests which alter something on the server. The HTTP standard addresses the topic very nicely (emphasis mine):
Naturally, it is not possible to ensure that the server does not generate side-effects as a result of performing a GET request; in fact, some dynamic resources consider that a feature. The important distinction here is that the user did not request the side-effects, so therefore cannot be held accountable for them.
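To make the idempotence and safety points concrete, here is a minimal sketch in plain Python rather than Django. The in-memory AFFILIATES dictionary, the function names, and the choice of 204 as the success status for DELETE are all assumptions made for illustration; they are not taken from the project described here.

```python
# Hypothetical in-memory store standing in for a database table.
AFFILIATES = {2: {"name": "Example Affiliate"}}


def get_affiliate(affiliate_id):
    """Safe: only reads, never changes AFFILIATES."""
    entity = AFFILIATES.get(affiliate_id)
    if entity is None:
        return 404, None
    # Return a copy so callers cannot mutate the store through the result.
    return 200, dict(entity)


def delete_affiliate(affiliate_id):
    """Idempotent: the store ends up the same after one call or many."""
    # pop() with a default does not raise if the key is already gone,
    # so a client that lost the race still gets a success status.
    AFFILIATES.pop(affiliate_id, None)
    return 204


first = delete_affiliate(2)
second = delete_affiliate(2)
print(first, second)
```

Both calls report success, and the store is in the same state after the second call as after the first, which is all that idempotence asks for.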
Something I had to address on my current project was how to accept data for an arbitrary affiliate and save it in the database. POST came to mind first, naturally. But in trying to keep with the spirit and design of HTTP, it did not really fit what I was trying to do.
I had resources defined for getting affiliate data, like this:
This returns the data associated with the affiliate who has the ID two. The first idea for modifying that data is this:
However, “The POST method is used to request that the origin server accept the entity enclosed in the request as a new subordinate of the resource identified by the Request-URI….” This definition makes POST semantically inappropriate for what I was trying to do. If I have some updated affiliate data, then that data—the ‘entity’—is not subordinate to the resource, it is the resource.
Enter PUT, which “…requests that the enclosed entity be stored under the supplied Request-URI.” Perfect. I can structure the API such that the request
will store the new data under that resource. How is this really that different from using POST? The standard explains:
The fundamental difference between the POST and PUT requests is reflected in the different meaning of the Request-URI. The URI in a POST request identifies the resource that will handle the enclosed entity. That resource might be a data-accepting process, a gateway to some other protocol, or a separate entity that accepts annotations. In contrast, the URI in a PUT request identifies the entity enclosed with the request -- the user agent knows what URI is intended and the server MUST NOT attempt to apply the request to some other resource.
Which is well and good in theory. But how do we handle PUT requests in practice? Or specifically in Django?
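Before getting to Django, the PUT versus POST distinction can be sketched in plain Python. Everything here is hypothetical: the RESOURCES dictionary, the put and post helpers, and the paths "/affiliates/2" and "/notes" are made up for illustration and are not the URLs from my project.

```python
import itertools

# Hypothetical store mapping a URI path to the entity stored there.
RESOURCES = {}
_next_id = itertools.count(1)


def put(uri, entity):
    """PUT: the URI names the entity itself, so store it exactly there."""
    RESOURCES[uri] = entity
    return uri


def post(collection_uri, entity):
    """POST: the URI names a handler; the entity becomes a new subordinate."""
    new_uri = "%s/%d" % (collection_uri.rstrip("/"), next(_next_id))
    RESOURCES[new_uri] = entity
    return new_uri


# Repeating a PUT leaves one resource; repeating a POST creates another.
put("/affiliates/2", {"name": "Acme"})
put("/affiliates/2", {"name": "Acme"})
a = post("/notes", {"text": "hello"})
b = post("/notes", {"text": "hello"})
print(sorted(RESOURCES))
```

The client picks the URI in the PUT case, while in the POST case the server decides where the new entity lives. That is also why repeating the PUT is harmless and repeating the POST is not.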
The Nuts and Bolts of HttpRequest Objects
Every view function in Django gets handed at least one argument, which will be an `HttpRequest` object. It contains two useful properties, dictionaries for getting parameters out of requests:
So if we get a request like:
we can use HttpRequest.GET['bar'] to get the value baz. This is great, except you don’t get anything like this for PUT. So we have to deal with raw request data directly.
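For comparison, the same kind of query-string lookup can be sketched with nothing but the standard library. This is only an analogy for what the GET dictionary gives you, not how Django implements it, and the "/foo" path is a made-up placeholder.

```python
from urllib.parse import parse_qs, urlsplit

# "/foo" is a made-up path; bar=baz matches the names used in the text.
url = "/foo?bar=baz"
params = parse_qs(urlsplit(url).query)

# parse_qs maps each name to a list of values, since a name can repeat.
value = params["bar"][0]
print(value)
```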
The HttpRequest argument in view functions is conventionally called request, so I’m going to run with that. The first thing we need to look at is request.method, which will be the string 'PUT' if we got a PUT request. If not, we may want to return an error code, but I’ll get to that later.
The next thing we should do is verify the Content-Type. In a PUT request we could receive anything. Checking the Content-Type is not a guarantee, since clients can lie to us, but it is our first chance to decide whether or not our request is malformed. In Django you can find that out like so:
if request.META['CONTENT_TYPE'] != 'application/json':
    # We wanted JSON and didn't get it.
    return HttpResponse(status=501)
We wanted JSON and didn’t get it, so we give back a 501 response, which is the status code for Not Implemented. The standard requires that this is the error we return in this situation, because we received a request to PUT a Content-Type which we don’t support, and thus which is not implemented.
Earlier I mentioned an error for unsupported methods. It might seem to make sense to return a 501 here as well if we only support PUT and we receive a POST, but this is considered a different type of error. In that case we would return a 405: Method Not Allowed. But the standard says we have to do more than that.
The response MUST include an Allow header containing a list of valid methods for the requested resource. Django lets us put arbitrary headers in our response like so:
response = HttpResponse(status=405)
response['Allow'] = 'PUT'
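Putting the method check and the Content-Type check together, here is a framework-free sketch. The check_request function and its return shape are my own invention for illustration. One detail it handles that a strict string comparison does not: a Content-Type header may carry parameters such as a charset, so only the media type is compared.

```python
ALLOWED_METHODS = ("PUT",)  # assumption: this resource only accepts PUT
SUPPORTED_TYPE = "application/json"


def check_request(method, content_type):
    """Return (status, headers) for a rejected request, or None to proceed."""
    if method not in ALLOWED_METHODS:
        # A 405 response must carry an Allow header listing valid methods.
        return 405, {"Allow": ", ".join(ALLOWED_METHODS)}
    # Strip parameters such as "; charset=utf-8" before comparing.
    media_type = (content_type or "").split(";", 1)[0].strip().lower()
    if media_type != SUPPORTED_TYPE:
        # 501 is the status this post settles on for unsupported types.
        return 501, {}
    return None


print(check_request("POST", "application/json"))
print(check_request("PUT", "application/json; charset=utf-8"))
```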
Ok. In the spirit of the Django book we have at the office, I have shown you the long, tedious way of doing things, only to immediately blast that from existence with a shortcut:
Back to our JSON example. We have a valid Content-Type, so we want to get the actual entity from the request. In Django this is request.raw_post_data (renamed to request.body as of Django 1.4). Assuming we have a string of JSON, we can do this:
entity = json.loads(request.raw_post_data)
Not the best idea, however, since it’s possible our JSON is not valid (as I found out over the quoting issue). A better approach would be:
try:
    entity = json.loads(request.raw_post_data)
except ValueError:
    # The JSON module raises ValueError when you try
    # to parse malformed strings of JSON.
    return HttpResponse(status=500, content='Malformed JSON')
Note that the most appropriate response here is the good old 500 which I know we’ve all seen many times: Internal Server Error. That may not sound right at first, because it makes it sound like it is the fault of the server, when instead you may think we should return a 4xx code telling the client they screwed up. But the definition for 500, “The server encountered an unexpected condition which prevented it from fulfilling the request,” describes exactly the problem that we run into.
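The parsing step can also be tried on its own, outside of a view. This is a small sketch using only the standard library; parse_entity and its (entity, error) return shape are hypothetical names chosen for illustration, and the 500 simply follows the reasoning above.

```python
import json


def parse_entity(raw_body):
    """Return (entity, None) on success or (None, (status, message)) on failure."""
    try:
        return json.loads(raw_body), None
    except ValueError:
        # json.JSONDecodeError is a subclass of ValueError, so this
        # catches malformed JSON such as single-quoted strings.
        return None, (500, "Malformed JSON")


print(parse_entity('{"name": "Acme"}'))
print(parse_entity("{'name': 'Acme'}"))
```

The second call fails because JSON requires double quotes around strings, which is an easy mistake to make when building a request body by hand.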
More About Response Codes
I can’t remember ever caring about the different HTTP response codes for indicating success. But they exist. And they are useful for designing an API which provides semantically appropriate information. For instance, in our JSON example above we could return the usual 200 (OK) to indicate that we successfully saved the data. But we can do better.
If our PUT request resulted in the creation of a new resource (maybe that affiliate was never there in the first place), then we can return 201 (Created), along with the most specific URI for that resource. We can even go a step further and return a list of “resource characteristics and location(s)” if that makes sense for our application. The format of those characteristics is determined by the Content-Type, which effectively means we design our own format.
Or what often comes up with PUT requests is a situation where we have nothing to return. This happens when we update existing resources. The standard 200 response is supposed to include some response information. If we successfully PUT some new data, all we may really care about is simply telling the client that everything succeeded, with no response content. In that case we should return a 204 (No Content). A 204 response has no message body, so it’s convenient for when we have nothing of value to say back to the client, other than the thumbs up.
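Here is how the choice between 201 and 204 might look for a PUT, again as a framework-free sketch. The STORE dictionary, the put_affiliate function, and the "/affiliates/7" path are hypothetical; the Location header is where the standard says the most specific URI for a newly created resource goes.

```python
STORE = {}  # hypothetical stand-in for the affiliate table


def put_affiliate(uri, entity):
    """Store entity at uri and return (status, headers, body)."""
    created = uri not in STORE
    STORE[uri] = entity
    if created:
        # 201 Created: point the client at the new resource.
        return 201, {"Location": uri}, b""
    # 204 No Content: the update worked and there is nothing more to say.
    return 204, {}, b""


print(put_affiliate("/affiliates/7", {"name": "Acme"}))
print(put_affiliate("/affiliates/7", {"name": "Acme Inc."}))
```

The first call creates the resource and the second updates it, so the client can tell the two outcomes apart from the status code alone.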
I didn’t really have a specific point to this email when I set out to write it, other than just doing a brain-dump of the things I’ve been thinking about REST lately, and HTTP in general. Reading through things like RFC 2616 has really opened my eyes to how expressive HTTP is, and how under-utilized it seems to be. It defines a rich collection of ways to request information and respond with that information, and yet when we write web applications we so rarely take advantage of the vocabulary that HTTP gives us.
In the end, I don’t know what practical value there is going to be in designing an application that tries to ‘speak HTTP’ in the clearest, most appropriate way possible. But I have the gut feeling that it is worthwhile.