MySQL 5.6 through the eyes of a custom storage engine MySQL plugin
MySQL is famous for its pluggable storage engine architecture, which allows a DBA or an application developer to choose the right engine for the task. An application uses the MySQL API and is isolated from all of the low-level implementation details at the storage level. As an example, the Cloud Storage Engine (ClouSE) enables existing MySQL applications to use cloud storage such as Amazon S3 or Google Cloud Storage to store their data. The application doesn’t need to be changed or even redeployed: with ClouSE, remote cloud storage will look like a better (ultra-scalable, durable, always-on) alternative to the local storage.
As you may already know, ClouSE now supports the MySQL 5.6 release series. See this announcement for more detail. Let’s go through the set of changes that were required on the ClouSE side in order to keep up with core MySQL 5.6 changes.
We had to adapt our code to compile and work with MySQL 5.6 while keeping 100% compatibility with MySQL 5.5. As much as we could, we tried to fix the code in a way that would work with both release series, but there are cases where the code has to be conditionally compiled for each release series.
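As a rough illustration of per-series conditional compilation, here is a minimal sketch. It assumes the code is gated on MYSQL_VERSION_ID, the version macro that the server's mysql_version.h header defines; the fallback definition and the compiled_release_series() helper are hypothetical and exist only to keep the example standalone. This is not the actual ClouSE code.

```cpp
#include <string>

// MYSQL_VERSION_ID normally comes from the server's mysql_version.h
// (for example, 50610 for MySQL 5.6.10). The fallback below is an
// assumption made only so that this sketch compiles without MySQL headers.
#ifndef MYSQL_VERSION_ID
#define MYSQL_VERSION_ID 50610
#endif

// Hypothetical helper (not part of MySQL or ClouSE): reports which branch
// of the conditionally compiled code was selected at build time.
inline std::string compiled_release_series() {
#if MYSQL_VERSION_ID >= 50600
  // Code that depends on 5.6 (or later) server interfaces would go here.
  return "5.6";
#else
  // Code that keeps the plugin building against 5.5 would go here.
  return "5.5";
#endif
}
```

Keeping each such branch as small as possible leaves most of the code shared between the two release series.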
Here is the list of MySQL 5.6 breaking changes and our solutions, in no particular order.
Advanced Weblob operations help to use Weblobs most effectively.
In an earlier post I introduced Weblobs. A Weblob is a new data type that is supported by the Cloud Storage Engine for MySQL (ClouSE). To a database developer, a WEBLOB behaves (almost) like a regular BLOB. However, in addition to the regular BLOB functionality, Weblobs can be downloaded directly from Amazon S3 via HTTP URLs.
In MySQL, a Weblob is expressed via a pair of BLOB fields that have a special naming convention: field_name$wblob and field_name$wblob_info. The latter field is what provides the Weblob functionality. It can be used to retrieve the direct Amazon S3 URL for the BLOB content.
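To make the naming convention concrete, here is a small sketch of how client-side code might derive the companion column name from a content column name. Only the two suffixes come from the convention above; the helper names and the example column name are hypothetical and are not part of ClouSE or MySQL.

```cpp
#include <string>

// Suffix of the content column, taken from the naming convention above.
const std::string kWeblobSuffix = "$wblob";

// Hypothetical helper: true when `name` ends with `suffix` and has a
// non-empty field name in front of it.
inline bool has_suffix(const std::string& name, const std::string& suffix) {
  return name.size() > suffix.size() &&
         name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical helper: for a content column such as "picture$wblob",
// returns the companion column name "picture$wblob_info". Returns an
// empty string when the column does not follow the convention.
inline std::string weblob_info_column(const std::string& column) {
  if (!has_suffix(column, kWeblobSuffix)) return std::string();
  return column + "_info";
}
```

For example, weblob_info_column("picture$wblob") yields "picture$wblob_info", while a plain column name such as "picture" yields an empty string.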
But why does the field_name$wblob_info field have the $wblob_info suffix and not $wblob_url suffix? Can it do more than just retrieve direct URLs? It actually can.
I got a few questions like the ones below that I’d like to address to avoid further confusion.
How exactly secure is ClouSE for MySQL, the first secure database in the cloud? Am I protected against standard application level security attacks or even accidental admin mistakes?
With the help of ClouSE I get instantaneous backup for my database on the highly durable cloud storage. But how would I protect my data in case a malicious attack or an accident did occur?
I got a comment pointing out that data encryption on the storage level doesn’t protect from SQL injections. Of course, data encryption does not protect from SQL injections (as long as there is SQL involved, there will be a risk of a SQL injection). Neither does it protect from the countless attack vectors that can arise at any layer of the application stack: PHP, Apache, MySQL, Linux, application code, application users, etc.
Can OLTP database workloads use Amazon S3 as primary storage? Now they can, thanks to the Cloud Storage Engine (ClouSE), but the question is: how fast?
Cloud-powered BLOB type provides ACID guarantees and fast direct access to blobs via Web URLs.
Typically, unstructured data (such as pictures, media files, documents) is handled in one of two ways:
a) It is stored on the file system, separately from the related relational data, which is stored in the database. This is a well-known, “convenient” practice that allows fast access to files but offers no transactional story and no unified data management (for the database and the file system).
b) It is stored in BLOBs. This ensures transactional consistency and reduces management complexity, but is really bad for performance and scalability.
We took advantage of the cloud, and came up with an upgrade to the BLOB – a solution that combines the benefits of the two.
My response to Database Innovation, pleeease!
Sure :-) We’ve just recently released a Beta of ClouSE — the Cloud Storage Engine for MySQL that provides fully functional relational data management on top of Amazon S3.
Even though we still use the good ol’ B-trees (sorry), dealing with remote, eventually consistent, elastic storage provided plenty of innovation opportunities. We had to rework the ARIES algorithms, which don’t really account for pages being physically deleted (traditionally they just go to a free list, so the storage never shrinks); neither do they account for eventual consistency. To implement ACID, the whole storage engine stack, from the buffer manager to the log manager to the transaction manager to the access methods, had to go beyond grandpa’s algorithms and protocols.