Discussion:
Idiot's query: proper way to handle NULL in SELECT query?
Frank Miles
2010-05-08 23:46:50 UTC
Permalink
As I have slowly been converting my system to use the adaptation
methods, I've been using some direct conversions to handle NULL
data. It would be nice to be able to:

cur.execute("SELECT {variable-list} FROM table WHERE var1 = %s AND ... ;", (variable1,...))

without having to pre-check each query, and alter it if one of the
variables might be a None/NULL. Testing with mogrify shows Nones
being faithfully changed to NULL, but of course postgres doesn't
consider "variable = NULL" equivalent to "variable IS NULL".

Confident that I have missed something blindingly obvious (yet haven't
seen it in the documentation), I ask... what's the normal way of handlng
this?

Thanks!
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Frank Miles
Jan Urbański
2010-05-09 00:35:20 UTC
Permalink
Post by Frank Miles
As I have slowly been converting my system to use the adaptation
methods, I've been using some direct conversions to handle NULL
cur.execute("SELECT {variable-list} FROM table WHERE var1 = %s AND ... ;", (variable1,...))
without having to pre-check each query, and alter it if one of the
How would you like it to behave in this case? You are correct, "variable
= NULL" is definitely not the same as "variable IS NULL" and psycopg2
cannot and will not assume what you had in mind when you wrote your
query code. It limits itself to converting your Python objects into
PostgreSQL literals.
Post by Frank Miles
Confident that I have missed something blindingly obvious (yet haven't
seen it in the documentation), I ask... what's the normal way of handlng
this?
A wild guess: take a look at the COALESCE PostgreSQL function. But you
haven't really explained what is it that you would like psycopg2 to do,
so it's just a guess.

Cheers,
Jan
Frank Miles
2010-05-09 03:20:16 UTC
Permalink
Post by Jan Urbański
Post by Frank Miles
As I have slowly been converting my system to use the adaptation
methods, I've been using some direct conversions to handle NULL
cur.execute("SELECT {variable-list} FROM table WHERE var1 = %s AND ... ;", (variable1,...))
without having to pre-check each query, and alter it if one of the
How would you like it to behave in this case? You are correct, "variable
= NULL" is definitely not the same as "variable IS NULL" and psycopg2
cannot and will not assume what you had in mind when you wrote your
query code. It limits itself to converting your Python objects into
PostgreSQL literals.
Post by Frank Miles
Confident that I have missed something blindingly obvious (yet haven't
seen it in the documentation), I ask... what's the normal way of handlng
this?
A wild guess: take a look at the COALESCE PostgreSQL function. But you
haven't really explained what is it that you would like psycopg2 to do,
so it's just a guess.
Cheers,
Jan
Thanks for your suggestions, Jan.

What I'd like my app to be able to do is to make queries regardless of
whether the data have specific, non-NULL values {in python, the values
will be something other than None}; or be NULL {python:None}. ISTM that
Mogrify needs to convert the "column_name = %s" into "column_name IS
NULL" ; or "column_name <> %s" into "column_name IS NOT NULL" if the
value is specified as None. Otherwise the app - which in general must
put together a protracted series of queries in order to answer the user's
questions - must go through each of the values where the value of the
column must/not be NULL. I don't see why psycopg shouldn't do this
(but then, see subject line).

COALESCE doesn't work. While I suppose (not having tried it) that one
could construct a query like:
SELECT {variable-list} FROM table WHERE COALESCE(column_name,X) = X;
this presupposes that you have a value (X) which you can guarantee will
never exist in the table. Seems potentially dangerous and likely
less efficient.

And as perhaps wasn't completely clear in my initial question, my
concern is in structuring the WHERE clause, not altering the output of
the query to avoid Nones there.

I hope this make more sense now.
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Frank
Jan Urbański
2010-05-09 09:49:51 UTC
Permalink
Post by Frank Miles
[ psycopg2 won't do that for you ]
Thanks for your suggestions, Jan.
What I'd like my app to be able to do is to make queries regardless of
whether the data have specific, non-NULL values {in python, the values
will be something other than None}; or be NULL {python:None}. ISTM that
Mogrify needs to convert the "column_name = %s" into "column_name IS
NULL" ; or "column_name <> %s" into "column_name IS NOT NULL" if the
value is specified as None.
I'm afraid no driver does that for you, and psycopg2 won't do it too.
The purpose of mogrification is not second guessing what the user
wanted, it's interpolating Python objects into the query string using
some kinds of casting rules (such as None -> NULL).
Post by Frank Miles
Otherwise the app - which in general must
put together a protracted series of queries in order to answer the user's
questions - must go through each of the values where the value of the
column must/not be NULL. I don't see why psycopg shouldn't do this
(but then, see subject line).
Again, no driver will mangle your query like that, as it would seriously
violate the POLA. To answer your question in the subject: the proper way
to handle NULLs is to properly write your queries so that they handle
NULLs. psycopg2 is just a database connector, it never *writes* the
queries, it's always the user that does that.

Another layer of abstraction (read: an ORM) might be handling this for
you, and for instance SQUAlchemy transforms filter conditions like
"filter(Object.field == None)" into "object.field IS NULL" literal SQL
strings. But the job of a database connector is executing your queries
*exactly* as you wrote them, not rewriting the query in a way that
without implementing a full SQL parser and probably throwing in some
mind reading capabilities is impossible.
Post by Frank Miles
COALESCE doesn't work. While I suppose (not having tried it) that one
SELECT {variable-list} FROM table WHERE COALESCE(column_name,X) = X;
this presupposes that you have a value (X) which you can guarantee will
never exist in the table. Seems potentially dangerous and likely
less efficient.
Yes, if that is what you are trying to do, then COALESCE might not be
what you want. How about

SELECT c1, c2 FROM table WHERE c1 IS NULL OR c1 = %s

if you use None as the parameter you will get rows where c1 is NULL (and
only those, because NULL = <whatever> is never true). If you use
something else than None, you will get rows where c1 is equal to that
something. That's just an example, the bottom line is: psycopg2 executes
quereis exactly as you ask it to, so it's up to you to give it correct
queries.

Alternatively you can take a look at higher abstraction layers that you
can use on top of psycopg2 that might do what you want: the Django ORM,
SQLAlchemy and others.

Cheers,
Jan
Frank Miles
2010-05-10 01:50:39 UTC
Permalink
Post by Jan Urbański
Post by Frank Miles
[ psycopg2 won't do that for you ]
Thanks for your suggestions, Jan.
What I'd like my app to be able to do is to make queries regardless of
whether the data have specific, non-NULL values {in python, the values
will be something other than None}; or be NULL {python:None}. ISTM that
Mogrify needs to convert the "column_name = %s" into "column_name IS
NULL" ; or "column_name <> %s" into "column_name IS NOT NULL" if the
value is specified as None.
I'm afraid no driver does that for you, and psycopg2 won't do it too.
Good to know, though (for me) disappointing.
Post by Jan Urbański
The purpose of mogrification is not second guessing what the user
wanted, it's interpolating Python objects into the query string using
some kinds of casting rules (such as None -> NULL).
Post by Frank Miles
Otherwise the app - which in general must
put together a protracted series of queries in order to answer the user's
questions - must go through each of the values where the value of the
column must/not be NULL. I don't see why psycopg shouldn't do this
(but then, see subject line).
Again, no driver will mangle your query like that, as it would seriously
violate the POLA.
What is "POLA"?
Post by Jan Urbański
To answer your question in the subject: the proper way
to handle NULLs is to properly write your queries so that they handle
NULLs. psycopg2 is just a database connector, it never *writes* the
queries, it's always the user that does that.
Another layer of abstraction (read: an ORM) might be handling this for
you, and for instance SQUAlchemy transforms filter conditions like
"filter(Object.field == None)" into "object.field IS NULL" literal SQL
strings. But the job of a database connector is executing your queries
*exactly* as you wrote them, not rewriting the query in a way that
without implementing a full SQL parser and probably throwing in some
mind reading capabilities is impossible.
I don't see where any ambiguity (or "mind reading") is involved. If one
program can do it, that says it can be done. But if it is unrealistic to
expect psycopg to have this capability, fine - that answers my question.
Post by Jan Urbański
Post by Frank Miles
COALESCE doesn't work. While I suppose (not having tried it) that one
SELECT {variable-list} FROM table WHERE COALESCE(column_name,X) = X;
this presupposes that you have a value (X) which you can guarantee will
never exist in the table. Seems potentially dangerous and likely
less efficient.
Yes, if that is what you are trying to do, then COALESCE might not be
what you want. How about
SELECT c1, c2 FROM table WHERE c1 IS NULL OR c1 = %s
if you use None as the parameter you will get rows where c1 is NULL (and
only those, because NULL = <whatever> is never true). If you use
something else than None, you will get rows where c1 is equal to that
something. That's just an example, the bottom line is: psycopg2 executes
quereis exactly as you ask it to, so it's up to you to give it correct
queries.
Along this vein, I suppose one could construct the query (for '=') like:

SELECT c2 FROM table WHERE (c1 IS NULL AND %s IS NULL) OR (c1 = %s);

and repeat the parameter in that list. Hmmn. Not too bad, though
somewhat clunky if the query depends on a significant number of
parameters.
Post by Jan Urbański
Alternatively you can take a look at higher abstraction layers that you
can use on top of psycopg2 that might do what you want: the Django ORM,
SQLAlchemy and others.
My program already requires that the users install about 10
dependencies on its Windows version. I'm really hesitant to add
even one more layer - especially as some of these layers have
version dependencies.

I was hoping that I was missing some higher-level capability within
psycopg, but if it's not there, and there is no realistic hope that it
will be within its capabilities, I will continue to do what I have in
the past: essentially rewriting the query depending on whether the
value[s] were NULL/None or not.

Thanks for helping me understand the limitations of psycopg.

-Frank
Daniele Varrazzo
2010-05-10 14:47:48 UTC
Permalink
Post by Frank Miles
I was hoping that I was missing some higher-level capability within
psycopg, but if it's not there, and there is no realistic hope that it
will be within its capabilities, I will continue to do what I have in
the past: essentially rewriting the query depending on whether the
value[s] were NULL/None or not.
Thanks for helping me understand the limitations of psycopg.
I will tell you how to solve the problem. Then why the solution is
deeply flawed. Then how to solve it definitely.

There is an adapter trick you can apply: wrap the object into an
object that, when adapted, adds the operator for you.

from psycopg2.extensions import adapt, AsIs

class Eq(object):
def __init__(self, wrapped):
self.wrapped = wrapped

def adapt_eq(eq):
if eq.wrapped is None:
return AsIs('IS NULL')
else:
return AsIs("= %s" % adapt(eq.wrapped))

psycopg2.extensions.register_adapter(Eq, adapt_eq)
Post by Frank Miles
cur.mogrify("select foo from blah where bar %s", (Eq("O'Reilly"),))
"select foo from blah where bar = 'O''Reilly'"
Post by Frank Miles
cur.mogrify("select foo from blah where bar %s", (Eq(datetime.now()),))
"select foo from blah where bar = '2010-05-09T11:23:09.977271'"
Post by Frank Miles
cur.mogrify("select foo from blah where bar %s", (Eq(None),))
'select foo from blah where bar IS NULL'

This currently works fine, but it is actually a huge hack: adaptation
should be performed on parameter values only, here it generates a
bigger chunk of expression. If Psycopg started sending queries using
different libpq functions this trick would stop working (this switch
is not currently in program anyway). This is the same reason for which
an adapter to pass table/field names to a query ("select * from
%s"...) has never been included in Psycopg.

So what? There is actually a Postgres operator that does what you
need: "IS [NOT] DISTINCT FROM": it behaves like =/<> but treats NULLs
as regular values
(http://www.postgresql.org/docs/8.4/static/functions-comparison.html):

test=> select 10 is not distinct from 10;
?column? | t

test=> select null is not distinct from 10;
?column? | f

test=> select null is not distinct from null;
?column? | t

Well, dusty corners of the documentation...

So, Psycopg has actually some limitations: that's because it is a
driver, not a complete solution to every postgres-related need and
doesn't deal with policies, only with syntax, allowing more high level
solutions to be built upon it. It is flexible enough to allow for some
nice trick, but probably shouldn't be taken too far. In your case
probably the DISTINCT operator is the best solution.

Ah, I think POLA stands for "Principle of Least Astonishment"
(http://www.c2.com/cgi/wiki?PrincipleOfLeastAstonishment)

-- Daniele
Frank Miles
2010-05-10 15:47:22 UTC
Permalink
Post by Daniele Varrazzo
Post by Frank Miles
I was hoping that I was missing some higher-level capability within
psycopg, but if it's not there, and there is no realistic hope that it
will be within its capabilities, I will continue to do what I have in
the past: essentially rewriting the query depending on whether the
value[s] were NULL/None or not.
Thanks for helping me understand the limitations of psycopg.
I will tell you how to solve the problem. Then why the solution is
deeply flawed. Then how to solve it definitely.
[agreed that this is flawed - snip]
Post by Daniele Varrazzo
So what? There is actually a Postgres operator that does what you
need: "IS [NOT] DISTINCT FROM": it behaves like =/<> but treats NULLs
as regular values
test=> select 10 is not distinct from 10;
?column? | t
test=> select null is not distinct from 10;
?column? | f
test=> select null is not distinct from null;
?column? | t
Well, dusty corners of the documentation...
Excellent! I had indeed overlooked this operator. Thanks so much
for pointing this out.
Post by Daniele Varrazzo
So, Psycopg has actually some limitations: that's because it is a
driver, not a complete solution to every postgres-related need and
doesn't deal with policies, only with syntax, allowing more high level
solutions to be built upon it. It is flexible enough to allow for some
nice trick, but probably shouldn't be taken too far. In your case
probably the DISTINCT operator is the best solution.
Ah, I think POLA stands for "Principle of Least Astonishment"
(http://www.c2.com/cgi/wiki?PrincipleOfLeastAstonishment)
-- Daniele
Thanks, Daniele!

-Frank

Federico Di Gregorio
2010-05-09 10:27:26 UTC
Permalink
Post by Frank Miles
As I have slowly been converting my system to use the adaptation
methods, I've been using some direct conversions to handle NULL
cur.execute("SELECT {variable-list} FROM table WHERE var1 = %s AND ... ;", (variable1,...))
cur.execute("""SELECT {variable-list} FROM table
WHERE (var1 = %{variable1}s OR %{variable1}s IS NULL)
AND (var2 = %{variable2}s OR %{variable2}s IS NULL)
AND ... ",
{variable1: variable1, variable2: variable2, ...})

federico
--
Federico Di Gregorio fog-NGVKUo/i/***@public.gmane.org
God is real. Unless declared integer. -- Anonymous FORTRAN programmer
Loading...