I’ve been fiddling with SQLAlchemy and the declarative ORM, in Python. This post was based on using Python 2.7.8 with sqlalchemy 0.9.8 and MySQL 5.5.27, and it’s pretty dry.
Here’s a quick list of tips:
- You can’t define a String type without a length if you plan to use MySQL. Use text, define a big varchar value (See generic mysql tips below!)
- You should use the MySQL dialect’s integer definition, which allows you to define a field as being unsigned. If the storage engine doesn’t support such a differentiation, then it will ignore that specification and not fail (so far in testing).
- If you do use the MySQL dialect for your Integers, do it EVERYWHERE. Especially when you are going to link tables with foreign keys, as you’ll get random error 150s when creating your tables otherwise.
- You can use paramiko to tunnel your MySQL connection over SSH. If there’s interest, I might post on this, because I’ve done a bit of testing.
- Don’t directly access “secure” data over MySQL, unless you tunnel or your server supports SSL certs (and you’ve confirmed you are using them).
- You don’t have to create a backref, when you create a related table. There’s many cases where you might not want to, as if someone in the future ignorantly polls such a list they might cause performance issues. Depending on your environment, this might be solved by documentation. But if you’d rather implement your own search function with limits of request sizes or similar, then you may as well leave this off.
- Learn when to lazyload (the default) and when to be more eager. Rule of thumb is that lazyload reduces overall load, and might instantiate your object quicker, but your overall request time will be lower if you do it all in one hit. If you nearly always load the child items data when you load the parent, you may as well be eager.
- You can choose how eager your requests are on a relationship basis, OR on a request basis.
- If you are going to do a big, important, complicated query for reporting or something. Consider hand-crafting it, or budget to spend a lot of time optimising.
- Bear in mind your reporting requirements when you design the database from the outset!
- Memcached (or any other KV store) can be REALLY useful. For some objects, it doesn’t matter if an old one is used for a while, or alternatively it’s easy for any process which changes it to wipe it from the KV store on write. If this data is also the result of complicated joins and/or subqueries, then this is EXTRA important.
- If you can’t use memcached or a KV store, you can make your own filthy little store to local files. It might not be much faster than MySQL, but it might drastically drop load and free cache on your database server.
Generic MYSQL tips:
- There are two magic number in MySQL with varchar. Anything that can be given a size limitation of 255 should be given such a limit. Anything bigger may as well have a maximum size of 65535. Anything bigger than that can’t use varchar, so try something related to TEXT. Below 256 bytes of data, varchar uses one byte to reference the length of the entry. Above that it uses two. Only use CHAR instead of VARCHAR if you read all the documentation, and know the intricacies of the difference. Typically if you have fixed width data, in your tables.
- TEXT type and BLOB types are stored outside of the table structure, and aren’t massively awesome performance wise. Don’t assign everything as TEXT type so you don’t have to think about the size of the data you want to store.
Got any other tips, tricks or especially traps to avoid? Hit up the comments!