you're reading...

Founders BLOG

MySQL BLOB meets Amazon S3: advanced Weblob features

Advanced Weblob operations help to use Weblobs most effectively.

Weblob recap

In an earlier post I introduced Weblobs.  Weblob is a new data type that is supported by the Cloud Storage Engine for MySQL (ClouSE).  To a database developer, a WEBLOB behaves (almost) like a regular BLOB.  However, in addition to the regular BLOB functionality, Weblobs can be downloaded directly from Amazon S3 by HTTP URLs.

In MySQL, a Weblob is expressed via a pair of BLOB fields that have a special naming convention: field_name$wblob and field_name$wblob_info.  The latter field is what provides the Weblob functionality.  It can be used to retrieve the direct Amazon S3 URL for the BLOB content.

But why does the field_name$wblob_info field have the $wblob_info suffix and not $wblob_url suffix?  Can it do more than just retrieve direct URLs?  It actually can.

On SELECT, the field_name$wblob_info field can be used to retrieve the URLs.  On INSERT / UPDATE, the field_name$wblob_info field can accept Weblob control commands.  The Weblob control commands can be used to specify the Weblob name (and content type), and implement content streaming for very large Weblobs.

Weblob names (and content type)

The name control command can be used to specify a name for the Weblob URL.  Using good image names is important for Search Engine Optimization (SEO), and the name control command is there to help with that.

The name also determines the content type (a.k.a. MIME type) associated with the Weblob content.  ClouSE has an extension-to-type map, similar to ones that http servers have.  The map associates the extension (the part of the name after the last dot, e.g. pic.gif would have gif extension) with the content type.  For example, the gif extension is associated with the image/gif content type.  If the extension is not known or there is no extension, the content is going to have the application/octet-stream content type.

Let’s see how to give Weblob a name.  First, we need to create a table that contains a Weblob.  The following example shows how to create the media table that is keyed by an id and has a data Weblob:

mysql> CREATE TABLE media (id BIGINT KEY, data$wblob LONGBLOB, data$wblob_info BLOB)
       ENGINE=CLOUSE;

Now we can insert some text into a Weblob and give it the foo.txt name, for example:

mysql> INSERT INTO media VALUES (42, '<i>just</i> text', 'name:foo.txt');

This statement inserts a record with id=42, ‘<i>just</i> text’ content, and Weblob name foo.txt.  Now we can get the Weblob URL and see that it indeed has the foo.txt name:

mysql> SELECT data$wblob_info FROM media WHERE id=42;
+---------------------------------------------------------------------------+
| data$wblob_info                                                           |
+---------------------------------------------------------------------------+
| https://s3.amazonaws.com/oblaksoft-yapixx/db0/wblob/17B45868B514F8/3/2A/  |
| 29GVh96fxjStXebdbpMqs3jbD5aOlGEFQgQcspJnANo/foo.txt                       |
+---------------------------------------------------------------------------+

The URL now can be used to retrieve the content directly from Amazon S3.  Let’s copy and paste the URL into a Web browser:

Note that the Web browser interpreted foo.txt as a text file – the HTML tags are not processed.  This is because the Weblob name foo.txt implies text/plain content type.

Weblob upload streaming

Unstructured data can be quite large.  MySQL (as well as RDBMS in general) is not designed to handle big data objects.  By default, MySQL server doesn’t allow uploading objects that are larger than 1 MB; the limit can be increased up to 1 GB.  Even with the limit increased, uploading big data objects presents a scalability problem because MySQL server holds the entire objects in memory.

The streaming control command solves the upload scalability problem and removes the limits.  With streaming, large Weblobs can be uploaded in small chunks, without posing scalability problems.  Under the covers, ClouSE uses the multipart upload feature of Amazon S3 to control memory usage and improve reliability of uploads.

Let’s see how to stream a Weblob into Amazon S3.  The streaming control command takes one of three arguments: first, next, or last.  The argument defines where the updated chunk belongs to in the Weblob: the beginning, the middle, or the end.  The first command can be combined with the name command, which is what we do here:

mysql> START TRANSACTION;
mysql> INSERT INTO media VALUES (33, 'Not <i>just</i> text', 'streaming:first;name:bar.htm');
mysql> UPDATE media SET data$wblob='<br>2nd line', data$wblob_info='streaming:next' where id=33;
mysql> UPDATE media SET data$wblob='<br>3rd line', data$wblob_info='streaming:next' where id=33;
mysql> UPDATE media SET data$wblob='<br>last line', data$wblob_info='streaming:last' where id=33;
mysql> COMMIT;

Here we started a transaction (streaming must happen within a transaction) and uploaded a Weblob with id=33 in four pieces using the streaming control command.  Now let’s get the URL and paste it into a Web browser:

mysql> SELECT data$wblob_info FROM media WHERE id=33;
+---------------------------------------------------------------------------+
| data$wblob_info                                                           |
+---------------------------------------------------------------------------+
| https://s3.amazonaws.com/oblaksoft-yapixx/db0/wblob/17B45868B514F8/5/2D/  |
| e4N7AzJq8cJcDl5-TkE_FTVEjLc-vIxcaBrhVkU4ey8/bar.htm                       |
+---------------------------------------------------------------------------+

The Web browser indeed shows the full Weblob content, assembled from the pieces.  Note that the content is interpreted as HTML.  This is because the Weblob name is bar.htm which implies text/html content type.

In conclusion

New Weblob data type was introduced in MySQL + ClouSE to simplify storage and management of unstructured data. The Weblob stores BLOB data (example: documents, image files, music and videos) in the cloud storage and makes this data accessible via a direct cloud URL. Weblobs provide a simple and powerful model for a relational database developer to leverage the cloud to scale out content delivery. Weblob control commands remove scalability limits and simplify Weblob usage in Web applications. The power of cloud storage can now be controlled with simple SQL statements.

For real-life working example of using advanced Weblob features, please see the WP2Cloud WordPress plugin. This open source plugin ensures that media files that are uploaded to the WordPress are actually uploaded to the cloud storage, not to the local file system of the Web server hosting this website.

I hope you’ll take advantage of MySQL features unique to the cloud storage, and look forward to hearing your Weblob success stories!

See also

MySQL BLOB meets Amazon S3: Weblobs explained

Discussion

Trackbacks/Pingbacks

  1. [...] See the rest here: MySQL BLOB meets Amazon S3: advanced Weblob features … [...]

Post a Comment

Categories

Twitting ...