Large orphans

Prelude

As you probably know there are two ways to store large objects in the PostgreSQL:

Describing pros and cons of these approaches I forgot to mention two issues of no small importance about OID storage:

  • It may save your space if several rows (or tables) contain the same binary object;
  • It may occupy your space if none of tables need some binary object, but recently did.

From PostgreSQL help:

Large objects are treated as objects in their own right; a table entry can reference a large object by OID, but there can be multiple table entries referencing the same large object OID, so the system doesn’t delete the large object just because you change or remove one such entry.

Now this is fine for PostgreSQL-specific applications, but standard code using JDBC or ODBC won’t delete the objects, resulting in orphan objects — objects that are not referenced by anything, and simply occupy disk space.

Interlude

For this purpose there is an additional supplied module – “lo”:

The lo module allows fixing this by attaching a trigger to tables that contain LO reference columns. The trigger essentially just does a lo_unlink whenever you delete or modify a value referencing a large object. When you use this trigger, you are assuming that there is only one database reference to any large object that is referenced in a trigger-controlled column!

Interlude II

However if you think multireference ability is excellent feature you would find special vacuumlo utility very useful, just as I do.

Solution

I found one more way to avoid giving a birth to LO orphans.
G, I’m a genius!🙂

Assumed:

CREATE TABLE files
(
  codfile integer NOT NULL,
  filename character varying(255),
  data oid,
  CONSTRAINT pkfiles PRIMARY KEY (codfile)
);

To avoid orphans:

CREATE OR REPLACE RULE upd_oid_rule AS ON UPDATE TO files
WHERE OLD.data IS DISTINCT FROM NEW.data AND OLD.data IS NOT NULL
DO ALSO
SELECT
  CASE WHEN (count(files.data) <= 1) AND
    EXISTS(SELECT 1
    FROM pg_catalog.pg_largeobject
    WHERE loid = OLD.data)
         THEN lo_unlink(OLD.data)
  END
FROM files
WHERE files.data = OLD.data
GROUP BY OLD.data ;

For those who not so familiar with moonspeak my translation:

The great and almighty server of the Postgres, please, hear my prayer. When one updates the files table and if the new data column distinguishes from the old one do not forget to check if the old data value is not used any more and if not then delete BLOB referenced by this old value (if this BLOB exists of course). Thank you.

And one more rule:

CREATE OR REPLACE RULE del_oid_rule AS ON DELETE TO files
WHERE OLD.data IS NOT NULL
DO ALSO
SELECT
  CASE WHEN (count(files.data) <= 1) AND
    EXISTS(SELECT 1
    FROM pg_catalog.pg_largeobject
    WHERE loid = OLD.data)
       THEN lo_unlink(OLD.data)
  END
FROM files
WHERE files.data = OLD.data
GROUP BY OLD.data;

Conclusion

This approach rules of course.🙂 However, it is only if assumed that BLOBs may be duplicated only in one column of the one table. Later I’ll create stored routine with extended functionality. Stay tuned.

PS

Kids, don’t do drugs!🙂

4 thoughts on “Large orphans

  1. Since the “data” column is nullable, the condition should be “OLD.data is distinct from NEW.data” instead of “… …”, otherwise it won’t pick up updating to null value.

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s