Empty Thoughts

Feed

Living on the Edge

Entries tagged “orm”

Replacing Django's ORM with SQLAlchemy

written by Michael Trier, on Jul 23, 2008 9:41:00 PM.

Will Larson just created a post titled Replacing Django’s ORM with SQLAlchemy that covers using SQLAlchemy with a Django project. It’s a great write-up, and generally what Django folks are referring to when they say, “of course you can use SQLAlchemy with Django.” I think Will does a good job of stepping you through the important bits.

One thing I appreciate is that the post hints at some of the negatives of doing something like this. You lose things like generic views, all contrib apps including the admin, most of the management commands, and some smaller pieces that are embedded in various parts of the framework. Tread this way cautiously.

Another item in the post that got my attention was when Will mentioned the following in reference to the possibility of doing a full drop in replacement for the Django ORM, essentially allowing you to use fully the Django bits alongside of the SQLAlchemy pieces:

This tutorial won’t build that Frankenstein, since it’s only interested in exploring the loose coupling aspect, but it wouldn’t be prohibitively difficult to do so. Although, it would be awkward in some regards.

Well I, and others, have been building “that Frankenstein” for some time now with a project we call Django-SQLAlchemy. There’s also a somewhat defunct project called that is a hybrid between what Will demonstrates and what we’re trying to do with django-sqlalchemy.

I’m not sure whether or not these projects are “awkward” or “Frankensteins,” but I can assure you thus far they have been prohibitively difficult, as we see no full implementation for a drop-in replacement still today. Getting the initial stuff in place is pretty darn easy, but as you get into it there are lots of little edge cases that complicate things significantly.

I want to thank Will for the great post. I think he does a great job of talking about the strengths of SQLAlchemy, and hopefully people will be inclined to experiment with the idea of using SQLAlchemy in addition to, or instead of, Django’s ORM.

Django's Queryset-Refactor Merged!

written by Michael Trier, on Apr 27, 2008 9:14:00 AM.

Just as Malcolm Tredinnick stated on This Week in Django 19, the Queryset-Refactor branch has been merged into trunk. Changeset 7477 makes that happen.

I’m sure there will be plenty of future blog posts talking about some of enhancements / changes that were made, but some of the highlights:

Ordering has been made consistent with the double-underscore syntax in filtering.
New update method that can update multiple records database side.
Addition of reverse method.
New values_list method that returns a tuple of values. This is a really nice addition.

Plus, tons of bug fixes, lots of reorganization that makes future enhancements much simpler, and lots of other goodness. See the Queryset Refactor Branch page on the wiki for more information.

For a lot of open source projects this type of thing would not have a huge impact, since with most projects the developers are running their code off of released versions. With Django, most developers have been accustomed to running their code, (yes, even production code), off of the trunk. With two very large branches, NewForms-Admin and Queryset-Refactor, it was often a difficult situation in deciding which branch would be the primary code base. For most of my stuff, I benefited more from the changes in Queryset-Refactor, and so I often settled on using that as my “trunk”. With the merge of Queryset Refactor I can shift to using NewForms-Admin (as soon as it merges to trunk), and be able to continue to enjoy all of the good stuff in Queryset Refactor.

An incredible amount thanks goes to Malcolm Tredinnick for all his work on this branch. He has done a tremendous job with the refactoring and enhancing of Django’s querysets.

Django ORM Caching Project

written by Michael Trier, on Jan 21, 2008 9:37:00 PM.

David Cramer just posted some interesting information on a caching layer for Django’s ORM that he has been developing. David is a lead developer for Curse, so he has some experience with scaling applications. It is early work, but looks very promising. The code is available for review on .

Retrieving Selective Fields with Django

written by Michael Trier, on Dec 21, 2007 12:45:00 AM.

Today there was a question on the #django IRC channel from a user name “Joe” about how to go about retrieving selective fields from a table. Joe had a model structure that looked something like:


class Category(models.Model):
    name = models.CharField(max_length=50)
    description = models.CharField(max_length=200)

class Link(models.Model):
    title = models.CharField(max_length=100)
    slug = models.SlugField(prepopulate_from=("title",))
    url = models.URLField()
    category = models.ForeignKey(Category, null=True, blank=True)
    big_ass_notes = models.TextField(blank=True)
    updated = models.DateTimeField(editable=False)
    user = models.ForeignKey(User)

The issue is that Joe wants to be able to get all Links that are tied to a specific Category but only retrieve select fields. You might want to do something like this if you needed to retrieve a subset of information and you wanted to keep from pulling particularly large fields like BLOBs or Text fields.

At first I had suggested that Joe use the extra() method to retrieve just the fields that he wanted from the Link model without having to pull in all the fields from the model. We tried something like this:


>>> from delinkuent.models import Category, Link
>>> Category.objects.get(pk=1).extra(select={'url': 'delinkuent_link.url'})
Traceback (most recent call last):
  File "", line 1, in 
AttributeError: 'Category' object has no attribute 'extra'

Well obviously that does not work. The problem is that extra() is a method on the QuerySet. With get() we are only getting a single model instance object. So we tried the following:


>>> from delinkuent.models import Category, Link
>>> Category.objects.get(pk=1).link_set.extra(select={'url': 'delinkuent_link.url'})
[, ]

Well that looks better, but when we check The SQL That Django is Generating we discover the following:


>>> from django.db import connection
>>> connection.queries[-1]['sql']
u'SELECT "delinkuent_link"."id","delinkuent_link"."title","delinkuent_link"."slug","delinkuent_link"."url","delinkuent_link"."category_id","delinkuent_link"."big_ass_notes","delinkuent_link"."updated","delinkuent_link"."user_id",(delinkuent_link.url) AS "url" FROM "delinkuent_link" WHERE ("delinkuent_link"."category_id" = 1) ORDER BY "delinkuent_link"."updated" DESC'

Unfortunately that didn’t work. Notice that all we did was get an extra url item in addition to the url item that was part of the regular SELECT clause. Hence the name extra(). Unfortunately there’s no only().

After working on it a bit, and reviewing my post on Finding Lookup Items That Are Not Used, I realized that I could reverse the logic and approach the problem from a different angle.

Instead of trying to get all Links from the Category, we could get all Links where the Category is the one we want. At first you might think that getting all Links will present a problem since the field we’re trying to ignore is in the Link model. You’re right, except that Django has this cool QuerySet method called values(). The values() method will return a ValuesQuerySet, essentially a QuerySet that evaluates to a list of dictionaries. The values() method accepts a list of field names that you want to include in the ValuesQuerySet dictionary.

Now that we know about values() and reversing the logic, Joe can get what he needs by doing the following:


>>> from delinkuent.models import Category, Link
>>> Link.objects.filter(category__pk=1).values('url')
[{'url': u'http://www.djangoproject.com/'}, {'url': u'http://www.b-list.org/'}]

I always like to take a look at the SQL that Django is generating to make sure I’m getting what I want:


>>> from django.db import connection
>>> connection.queries[-1]['sql']
u'SELECT "delinkuent_link"."url" FROM "delinkuent_link" WHERE ("delinkuent_link"."category_id" = 1) ORDER BY "delinkuent_link"."updated" DESC'

Notice that the big_ass_notes field, the one we are trying to avoid, is not being selected. Perfect!!!

But There’s More…

After working through this post it occurred to me that we didn’t really need to reverse the logic. This is where looking at the SQL that is being generated is so helpful. I decided to try it using the FOO_set approach we tried earlier, but with a twist:


>>> from delinkuent.models import Category, Link
>>> Category.objects.get(pk=1).link_set.values('url')
[{'url': u'http://www.djangoproject.com/'}, {'url': u'http://www.b-list.org/'}]

If you notice that when we use FOO_set, if we add on the values() method before it gets evaluated, then we will only get the fields specified in the values method. Here’s the resulting SQL:


>>> from django.db import connection
>>> connection.queries[-1]['sql']
u'SELECT "delinkuent_link"."url" FROM "delinkuent_link" WHERE ("delinkuent_link"."category_id" = 1) ORDER BY "delinkuent_link"."updated" DESC'

Still perfect!!!

Sometimes it takes a bit of work to get where you need to be with the Django ORM. Although, in the end you find that there is a ton of flexibility built into it.

Finding Lookup Items that Are Not Used

written by Michael Trier, on Aug 12, 2007 6:17:00 PM.

The Django ORM is quite flexible and provides a lot of ways to get at the data you need without having to write raw SQL, although that option is always available. Every once in a while someone on the #django IRC channel will ask about LEFT JOIN functionality. Although, the ORM doesn’t support LEFT JOINs “out of the box” (as best as I can tell), there are creative approaches to the problem.

Setting Up

Say we have a hypothetical problem of trying to find all items in a lookup table that are not being used in the master table. Why don’t you join me (excuse the pun) and we can have some shell fun. Start with the following model structure:


class Section(models.Model):
    name = models.CharField(maxlength=50)
    slug = models.SlugField(prepopulate_from=("name",),)
    position = models.IntegerField(default=1)

class Post(models.Model):
    pub_date = models.DateTimeField()
    title = models.CharField(maxlength=255)
    slug = models.SlugField(prepopulate_from=("title",), unique_for_date='pub_date')
    summary = models.TextField(help_text='Use regular non formatted text.')
    body = models.TextField()
    section  = models.ForeignKey(Section)

I’ve removed all the noise so we can keep things clean. The key items here is that there is a ForeignKey relationship between Posts and Sections. In Rails terminology we would say that each Post belongs to a Section and each Section has many Posts. You get the idea.

Let’s continue by putting in some data so we have something to work with:


>>> import datetime
>>> from blog.models import Section, Post
>>> s = Section.objects.create(name='Django', slug='django', position=1)
>>> Section.objects.create(name='Javascript', slug='javascript', position=2)

>>> Section.objects.create(name='Rails', slug='rails', position=3)

>>> Section.objects.all()
[, , ]

>>> p = Post(pub_date=datetime.datetime.now(), title='Django Rocks!', slug='django-rocks', summary='Article in which I explain the rockingness of Django.', body='Enough said', section=s)
>>> p.save()
>>> p

At this point we should have three Sections: Django, Javascript, and Rails, and one post that is assigned to the Django Section. Now that we have everything set up and we’re all on the same page, let’s play around a bit.

Common Foreign Key Usage

It’s pretty easy to find the Section for a Post:


>>> p.section

If a model has a ForeignKey we can access the related object through the model attribute.

Likewise if we have a particular section we can find all Posts that tied to it:


>>> s.post_set.all()
[]

Django automatically interjects a manager to the foreign master table called FOO_set, by default, where FOO is the master table name in lowercase. You can override this default using the related_name parameter. The Django documentation explains this quite well.

If our ForeignKey field (in this case section) allows nulls, we can easily find all Posts where Section is not specified. Note we didn’t set up our model or data to handle this scenario, but it is here just for completeness.


>>> Post.objects.filter(section__isnull=True)
[]

How it Looks in SQL

So back to our original problem. Things get a bit more difficult when we want to find “orphaned” records. What if we want to find all Sections that do not have any Posts?

Sometimes it is easiest to think about how you would get what you need in SQL and then work backwards from there. Normally in SQL I would use one of the following queries:


SELECT * 
FROM blog_section
LEFT OUTER JOIN blog_post
  ON blog_section.id = blog_post.section_id
WHERE blog_post.id IS NULL


SELECT * 
FROM blog_section
WHERE blog_section.id NOT IN (SELECT DISTINCT blog_post.section_id FROM blog_post)

My personal preference is the second approach, but both will get you the same results. I’ve kept any kind of optimizations out of the equation to keep things simple.

Throughout I will be viewing the generated SQL using the helper method I posted about previously. If you’re following along, be sure to import it.

Using Q Objects

So back to the Django ORM. Regarding the first approach, although using LEFT JOINs are not possible with the standard ORM, they can be accomplished using a custom Q object. For instance, some folks have created custom Left Outer Join Q objects here and here. When get_sql() is called on a Q object, it returns a tuple of joins, where, and params. The joins argument is a dictionary that describes the join condition for the generated sql clause. The join condition can be modified to do what we want. Using one of these snippets I can do the following:


>>> from django.db.models.query import Q, QAnd, QOr
>>> from utils import QLeftOuterJoins
>>> q = Section.objects.filter(QLeftOuterJoins(Q(post__id__isnull=True)))
>>> q
[, ]
>>> show_sql(q)
'SELECT "blog_section"."id", "blog_section"."name", "blog_section"."slug", 
"blog_section"."position"  
FROM "blog_section" 
LEFT OUTER JOIN "blog_post" AS "blog_section__post" 
  ON "blog_section"."id" = "blog_section__post"."section_id" 
WHERE ("blog_section__post"."id" IS NULL) 
ORDER BY "blog_section"."position" ASC'

Looks like we nailed it with a little bit of outside help.

Using Custom SQL Or Multiple Queries

The only two methods I’ve come up with for duplicating the second (NOT IN) approach is to:

1. Write custom sql for the sub select. There are lots of approaches here, but here’s just one.


>>> q = Section.objects.extra(where=["id NOT IN (SELECT DISTINCT section_id FROM blog_post)"])
>>> q 
[, ]
>>> show_sql(q)
'SELECT "blog_section"."id", "blog_section"."name", "blog_section"."slug", 
"blog_section"."position"  
FROM "blog_section" 
WHERE id NOT IN (SELECT DISTINCT section_id FROM blog_post) 
ORDER BY "blog_section"."position" ASC'

2. Use multiple queries. For instance we can find the distinct items that exist in the master table and use a NOT IN syntax to exclude those items. For example (note my Python is not very good, so if there’s a more concise way, please let me know):


>>> posts = Post.objects.all()
>>> s = [p.section_id for p in posts]
>>> s
[1]
>>> q = Section.objects.exclude(id__in=s)
>>> q
[, ]
>>> show_sql(q)
'SELECT "blog_section"."id", "blog_section"."name", "blog_section"."slug", 
"blog_section"."position"  
FROM "blog_section" 
WHERE ((NOT ("blog_section"."id" IN (1)))) 
ORDER BY "blog_section"."position" ASC'

So we get the right answer, but it’s surely not the most efficient method. We could optimize it a bit by finding the distinct section_ids in use, but then we’re back to writing custom sql.

Using Extra()

Although it may not be a common usage approach there is actually a third way we can find the “orphaned” sections using SQL. Although the results turn out to be quite odd if you’re using straight SQL, it will help us get to where we need to be:


SELECT * 
FROM "blog_section" , "blog_post" 
WHERE blog_section.id NOT IN (blog_post.section_id)

As it turns out we can achieve the above using the standard Django ORM through the use of the extra() method. James Bennett mentioned that it was possible, but I never caught the solution. So I poked around for a few hours and was finally able to come up with an answer:


>>> q = Section.objects.extra(tables=["blog_post",], where=["blog_section.id NOT IN (blog_post.section_id)",])
>>> q
[, ]
>>> show_sql(q)
'SELECT "blog_section"."id", "blog_section"."name", "blog_section"."slug", 
"blog_section"."position"  
FROM "blog_section" , "blog_post" 
WHERE blog_section.id NOT IN (blog_post.section_id) 
ORDER BY "blog_section"."position" ASC'

Perfect! That’s what we were looking for.

I don’t know about you, but I’m definitely exhausted. Now clearly there are probably ten more ways to attack this problem. What other approaches have you used to arrived at a solution to this problem?