PostgresDAC will take part in orphan executions

Executioner

The new PostgresDAC v2.5.3 release will be available shortly; to be more specific, at the start of next week.

Among the changes is one that solves the Large Object orphanhood problem.

After updating, include the dsoManageLOFields option in the TPSQLDataset.Options property (or that of a TPSQLDataset descendant, e.g. TPSQLQuery), and PostgresDAC will take care of orphaned LOs.
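For illustration, enabling it from Delphi code might look like this (the component name PSQLQuery1 is hypothetical; dsoManageLOFields and the Options property are the ones mentioned above):

```pascal
// Sketch only; PSQLQuery1 is an example component name.
// Adding dsoManageLOFields asks PostgresDAC to clean up orphaned LOs.
PSQLQuery1.Options := PSQLQuery1.Options + [dsoManageLOFields];
```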

Stay tuned!

Large orphans

Prelude

As you probably know, there are two ways to store large objects in PostgreSQL: in a bytea column, or in the large object facility, referenced from a table row by OID.

While describing the pros and cons of these approaches, I forgot to mention two issues of no small importance about OID storage:

  • It may save space if several rows (or tables) reference the same binary object;
  • It may waste space if no table needs some binary object any more, though one did until recently.

From PostgreSQL help:

Large objects are treated as objects in their own right; a table entry can reference a large object by OID, but there can be multiple table entries referencing the same large object OID, so the system doesn’t delete the large object just because you change or remove one such entry.

Now this is fine for PostgreSQL-specific applications, but standard code using JDBC or ODBC won’t delete the objects, resulting in orphan objects — objects that are not referenced by anything, and simply occupy disk space.
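A minimal sketch of how such an orphan appears (the docs table and the file path are hypothetical):

```sql
CREATE TABLE docs (id integer PRIMARY KEY, body oid);

-- lo_import() creates a large object and returns its OID
INSERT INTO docs VALUES (1, lo_import('/tmp/report.pdf'));

-- A plain DELETE removes only the reference, not the object itself
DELETE FROM docs WHERE id = 1;

-- The large object is now an orphan, still occupying disk space
SELECT DISTINCT loid FROM pg_catalog.pg_largeobject;
```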

Interlude

For this purpose an additional contrib module, “lo”, is supplied:

The lo module allows fixing this by attaching a trigger to tables that contain LO reference columns. The trigger essentially just does a lo_unlink whenever you delete or modify a value referencing a large object. When you use this trigger, you are assuming that there is only one database reference to any large object that is referenced in a trigger-controlled column!
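A usage sketch along the lines of the “lo” module documentation (the image table is illustrative):

```sql
-- The lo type is a domain over oid, provided by the "lo" module
CREATE TABLE image (title text, raster lo);

-- lo_manage() unlinks the old large object whenever the row's
-- value is replaced or the row is deleted
CREATE TRIGGER t_raster BEFORE UPDATE OR DELETE ON image
    FOR EACH ROW EXECUTE PROCEDURE lo_manage(raster);
```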

Interlude II

However, if you consider the multi-reference ability an excellent feature, you will find the vacuumlo utility very useful, just as I do.
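A periodic clean-up can then be as simple as (the database name is hypothetical):

```
# Delete orphaned large objects from the "mydb" database;
# -v makes vacuumlo report what it removes
vacuumlo -v mydb
```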

Solution

I found one more way to avoid giving birth to LO orphans.
Gee, I’m a genius! 🙂

Assume the following table:

CREATE TABLE files
(
  codfile integer NOT NULL,
  filename character varying(255),
  data oid,
  CONSTRAINT pkfiles PRIMARY KEY (codfile)
);
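Rows can then be populated, for instance, with the server-side lo_import() function (the path is hypothetical):

```sql
-- lo_import() creates the large object and stores its OID in data
INSERT INTO files (codfile, filename, data)
VALUES (1, 'photo.png', lo_import('/tmp/photo.png'));
```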

To avoid orphans:

CREATE OR REPLACE RULE upd_oid_rule AS ON UPDATE TO files
WHERE OLD.data IS DISTINCT FROM NEW.data AND OLD.data IS NOT NULL
DO ALSO
SELECT
  CASE WHEN (count(files.data) <= 1) AND
    EXISTS(SELECT 1
    FROM pg_catalog.pg_largeobject
    WHERE loid = OLD.data)
         THEN lo_unlink(OLD.data)
  END
FROM files
WHERE files.data = OLD.data
GROUP BY OLD.data;

For those who are not so familiar with moonspeak, here is my translation:

The great and almighty server of the Postgres, please, hear my prayer. When one updates the files table, and if the new data column value differs from the old one, do not forget to check whether the old data value is still used anywhere, and if not, delete the BLOB referenced by this old value (if this BLOB exists, of course). Thank you.

And one more rule:

CREATE OR REPLACE RULE del_oid_rule AS ON DELETE TO files
WHERE OLD.data IS NOT NULL
DO ALSO
SELECT
  CASE WHEN (count(files.data) <= 1) AND
    EXISTS(SELECT 1
    FROM pg_catalog.pg_largeobject
    WHERE loid = OLD.data)
       THEN lo_unlink(OLD.data)
  END
FROM files
WHERE files.data = OLD.data
GROUP BY OLD.data;
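With both rules in place, removing the last reference to a large object should unlink the object too; a quick sanity check might look like this (the file path is hypothetical):

```sql
INSERT INTO files VALUES (1, 'a.bin', lo_import('/tmp/a.bin'));
-- A second row referencing the same large object
INSERT INTO files (codfile, filename, data)
SELECT 2, 'a.bin copy', data FROM files WHERE codfile = 1;

-- Removes only a reference: another row still points at the object
DELETE FROM files WHERE codfile = 1;

-- Removes the last reference: del_oid_rule fires lo_unlink()
DELETE FROM files WHERE codfile = 2;
```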

Conclusion

This approach rules, of course. 🙂 However, it works only under the assumption that BLOBs may be duplicated only within a single column of a single table. Later I’ll create a stored routine with extended functionality. Stay tuned.

PS

Kids, don’t do drugs! 🙂