BYTEA vs OID (Large Objects)

Which storage for BLOBs is better, and why?

In PostgreSQL there are two ways to store Binary Large OBjects (BLOBs): the first is to use the bytea type, and the second is to use the Large Objects facility.

  • BYTEA type

The BYTEA type, or binary string, is very similar to ordinary character strings such as varchar and text. However, binary strings are distinguished by two characteristics:

First, binary strings specifically allow storing octets of value zero and other “non-printable” octets (usually, octets outside the range 32 to 126). Character strings disallow zero octets, and also disallow any other octet values and sequences of octet values that are invalid according to the database’s selected character set encoding.

Second, operations on binary strings process the actual bytes, whereas the processing of character strings depends on locale settings. In short, binary strings are appropriate for storing data that the programmer thinks of as “raw bytes”, whereas character strings are appropriate for storing text.
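
As a concrete illustration, here is a minimal sketch of storing and reading a bytea value through psycopg2; the files table, its columns and the connection string are assumptions made for this example, not part of the original article:

    # Minimal sketch: a bytea round trip with psycopg2.
    # Table, column and connection parameters are illustrative assumptions.
    import psycopg2

    conn = psycopg2.connect("dbname=test user=postgres")
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS files (id serial PRIMARY KEY, body bytea)")

    raw = bytes(range(256)) * 10                      # arbitrary binary data, zero octets included
    cur.execute("INSERT INTO files (body) VALUES (%s) RETURNING id",
                (psycopg2.Binary(raw),))              # the driver takes care of bytea encoding
    file_id = cur.fetchone()[0]

    # The whole value comes back inside the result set -- no streaming involved.
    cur.execute("SELECT body FROM files WHERE id = %s", (file_id,))
    stored = bytes(cur.fetchone()[0])
    assert stored == raw

    conn.commit()
    cur.close()
    conn.close()

Note how the SELECT pulls the entire value to the client at once; this is exactly the "preload" behaviour discussed below.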

This similarity means that BYTEA values are returned as part of the result data, which may cost efficiency. In other words, PostgresDAC loads all the data from the server locally, and only then can the user work with it.

The developer must also remember that when sending BYTEA values to the server, octets of certain values must be escaped. In general, to escape an octet, it is converted into the three-digit octal equivalent of its decimal value and preceded by two backslashes, e.g. a zero octet becomes '\\000'. Thus in the worst case the data sent over the network may grow to several times its original size.
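
To make that overhead concrete, the toy function below mimics the old "escape" bytea representation described above. It deliberately ignores quote handling and is not the actual driver or libpq code, only an illustration of the size growth:

    # Toy illustration of the old "escape" bytea format: every "non-printable"
    # octet turns into two backslashes plus three octal digits (5 characters).
    def escape_bytea(data: bytes) -> str:
        out = []
        for b in data:
            if b == 0x5C:                    # a backslash itself is doubled twice
                out.append("\\\\\\\\")
            elif b < 0x20 or b > 0x7E:       # "non-printable" octets
                out.append("\\\\%03o" % b)
            else:
                out.append(chr(b))           # printable octets pass through
        return "".join(out)

    payload = bytes(range(256)) * 4          # 1024 raw bytes
    escaped = escape_bytea(payload)
    print(len(payload), "->", len(escaped))  # 1024 -> 3612, roughly 3.5x larger

In practice the driver performs this escaping (or uses binary protocol parameters), so the point is not to write such code yourself but to see why large BYTEA values can be expensive to transmit.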

  • OID type (Large Objects)

PostgreSQL has a large object facility, which provides stream-style access to user data that is stored in a special large-object structure. Streaming access is useful when working with data values that are too large to manipulate conveniently as a whole.

All large objects are placed in a single system table called pg_largeobject. PostgreSQL also supports a storage system called “TOAST” that automatically stores values larger than a single database page into a secondary storage area per table. This makes the large object facility partially obsolete. One remaining advantage of the large object facility is that it allows values up to 2 GB in size, whereas TOASTed fields can be at most 1 GB. Also, large objects can be randomly modified using a read/write API that is more efficient than performing such operations using TOAST.

The large object implementation breaks large objects up into “chunks” and stores the chunks in rows in the database. A B-tree index guarantees fast searches for the correct chunk number when doing random access reads and writes.
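
For reference, the sketch below uses this stream-style access through psycopg2's lobject wrapper; the connection parameters are assumptions, and large object operations have to run inside a transaction block:

    # Minimal sketch: stream-style access to a large object via psycopg2.
    # Connection parameters are assumptions; all calls run inside a transaction.
    import psycopg2

    conn = psycopg2.connect("dbname=test user=postgres")

    # Create a new large object (oid=0 lets the server assign one) and write to it.
    lob = conn.lobject(0, "wb")
    oid = lob.oid
    lob.write(b"A" * (10 * 1024 * 1024))     # 10 MB streamed to the server
    lob.close()
    conn.commit()

    # Later: read just a slice without pulling the whole value to the client.
    lob = conn.lobject(oid, "rb")
    lob.seek(5 * 1024 * 1024)                # random access -- the index locates the chunk
    part = lob.read(8192)
    lob.close()
    conn.commit()

    print(oid, len(part))
    conn.close()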

It is possible for the pg_largeobject table to hold data for objects that are no longer referenced anywhere in the database. As noted in the PostgreSQL documentation, there is no built-in server-side mechanism that removes these orphaned ("ghost") objects automatically; one might expect the VACUUM command to handle this, but that functionality is not implemented. They have to be unlinked explicitly, as sketched below.
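
A hedged sketch of such a cleanup follows. It assumes a hypothetical images table whose picture column is the only place large object OIDs are stored; the contrib vacuumlo utility performs the same kind of scan across all OID columns automatically.

    # Sketch: unlink large objects that no row in the (assumed) images table
    # references any more. pg_largeobject_metadata lists one row per object
    # (PostgreSQL 9.0+), and lo_unlink() removes the object's chunks.
    import psycopg2

    conn = psycopg2.connect("dbname=test user=postgres")
    cur = conn.cursor()
    cur.execute("""
        SELECT lo_unlink(m.oid)
        FROM pg_largeobject_metadata m
        WHERE NOT EXISTS (SELECT 1 FROM images i WHERE i.picture = m.oid)
    """)
    print("unlinked", cur.rowcount, "orphaned large objects")
    conn.commit()
    cur.close()
    conn.close()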

Comparative table

Characteristic       BYTEA                            OID (Large Objects)
-------------------  -------------------------------  ------------------------------------------------------
Max. allowed size    1 GB                             2 GB
Data access          As a whole                       Stream-style
Storage              In the defined table             In the pg_largeobject system table
Data manipulation    Using SQL and escape sequences   Only within a transaction block, using special functions
Loading              Preload                          On demand

9 thoughts on “BYTEA vs OID (Large Objects)”

  1. hi friends…
    can you help me with how to display a bytea or oid PostgreSQL (image) type in Crystal Reports? When I try it, CR just displays a byte string…

    and when I try the blob type from MySQL it works successfully…

    what is the difference between blob and bytea? (please tell me and send the answer to agotekom@gmail.com)

    thanks in advance…

  2. Hey, when creating a new large object you get an OID back. My question is: since this is a 4-byte unsigned int, there will be 2,147,483,647 possible entries. This seems fine until you consider deleting objects over time – does the internal counter keep track of reusable OIDs and use them when it hits its upper limit?

      1. Yes, unsigned = x2.

        But, because this will be acting as a content server and will have many new/deleted entries, I don’t want to be in a position 5 years from now where it has hit the max value but has free OIDs from deletes – is there any way to confirm that the internal counter will check for free OIDs? I guess I could run a loop… there is no documentation on this, I suppose…
