<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>select * &#187; Uncategorized</title>
	<atom:link href="http://aptoma.com/select.star/category/uncategorized/feed/" rel="self" type="application/rss+xml" />
	<link>http://aptoma.com/select.star</link>
	<description>web-development, and other issues we really, really care about</description>
	<lastBuildDate>Thu, 22 Apr 2010 19:53:48 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Caching Improvements</title>
		<link>http://aptoma.com/select.star/2009/11/11/caching-improvements/</link>
		<comments>http://aptoma.com/select.star/2009/11/11/caching-improvements/#comments</comments>
		<pubDate>Wed, 11 Nov 2009 11:36:14 +0000</pubDate>
		<dc:creator>Geir Berset</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://aptoma.com/select.star/?p=857</guid>
		<description><![CDATA[We are continuing our caching discussions from the past few weeks:

7 Approved Caching Technologies
Caching Strategies

Our goal in the previous few weeks has been to identify caching strategies and technologies. We have now used this knowledge to identify where to invest our focus on improvements. Our conclusion is to go fiercely ahead investigating the following [...]]]></description>
			<content:encoded><![CDATA[<p>We are continuing our caching discussions from the past few weeks:</p>
<ol>
<li><a href="http://aptoma.com/select.star/2009/11/05/7-approved-caching-technologies/">7 Approved Caching Technologies</a></li>
<li><a href="http://aptoma.com/select.star/2009/10/22/caching-strategies/">Caching Strategies</a></li>
</ol>
<p>Our goal in the previous few weeks has been to identify caching strategies and technologies. We have now used this knowledge to identify where to invest our focus on improvements. Our conclusion is to go fiercely ahead investigating the following couple of topics further.</p>
<h3><strong>1. Reverse proxy caching (Varnish and ESI)</strong></h3>
<p>Håkon and Michael will research what can be done with ESI and VCL. With an overview of how we can best use Varnish in our applications, we can either learn how to configure Varnish ourselves, or give hosting providers concrete scenarios which they can then configure for us. It&#8217;s worth noting that Varnish has been performing flawlessly on almost every installation we have had to date, so this is an important tool. Our hope is to develop some sensible and helpful best practices for using Varnish with and without ESI.</p>
<h3><strong>2. Data caching</strong></h3>
<p>Lars and Stefan will be looking into how we can improve our very own AFWCache mechanisms (AFW is our in-house, still unreleased framework). We might include MySQL / SQLite data-caching options as well. Another core topic for them to explore is whether we&#8217;ll extend our use of APC to also include data caching (as opposed to only opcode caching). Before deciding to use APC for data caching in high-performance production environments, we have to learn more about its behavior under duress, and we need to know if we can make it degrade gracefully (i.e. what happens if APC runs out of memory?).</p>
<h3>We will currently not invest more energy in</h3>
<p><strong>View and subview caching</strong> (application caching), which is basically the slower sibling of Varnish with and without ESI. Although it is more flexible (you can process cookies), we have chosen not to put any effort into improving this area, to keep our focus where it needs to be for the moment. Also, our current support for view and subview caching through data caching is performing just fine.</p>
<p><strong>Query cache</strong> &#8211; We require query cache to be set to &#8220;ON&#8221; on every installation. Our conclusion is that this is more than enough for our current needs. We&#8217;ll revisit the topic of setting query cache to DEMAND and using the SQL_CACHE keyword when we are making the new Twitter, or something with a similar requirement for database scaling.</p>
<p><strong>Client caching</strong> &#8211; We will be refining our guidelines and best practices for setting the correct headers, but we will not invest in the topic of Local Storage (&#8220;the new cookie&#8221;) as of yet. We&#8217;ll wait until the browser support is broader. Local storage is expected to save us a lot of AJAX-callbacks in the future. More on this sometime in the future.</p>
<p><strong>Opcode cache</strong> &#8211; Having discarded a couple of other candidates, we require APC for PHP on all our installations. Once up and running, it just works without any intervention, so we&#8217;ll not be investing any more energy into the topic of opcode caching.  We have strengthened our systems setup testing for APC, and we&#8217;ll leave it at that.</p>
]]></content:encoded>
			<wfw:commentRss>http://aptoma.com/select.star/2009/11/11/caching-improvements/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>VG Meta</title>
		<link>http://aptoma.com/select.star/2009/03/15/vg-meta/</link>
		<comments>http://aptoma.com/select.star/2009/03/15/vg-meta/#comments</comments>
		<pubDate>Sun, 15 Mar 2009 12:01:03 +0000</pubDate>
		<dc:creator>Geir Berset</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://aptoma.com/select.star/?p=600</guid>
		<description><![CDATA[Our good friends at VG Nett (VG Nett is a customer of ours, and the largest online media site in Norway) just started their own blog &#8211; &#8220;VG Meta&#8221;. The blog is their channel for sharing what they do and how they do it. We look forward to following their posts.

Have a look at VG Meta [...]]]></description>
			<content:encoded><![CDATA[<p>Our good friends at VG Nett (<a href="http://www.vg.no/">VG Nett</a> is a customer of ours, and the largest online media site in Norway) just started their own blog &#8211; &#8220;<a href="http://meta.vgb.no/">VG Meta</a>&#8221;. The blog is their channel for sharing what they do and how they do it. We look forward to following their posts.</p>
<p style="text-align: center;"><a href="http://meta.vgb.no/"><img class="size-full wp-image-601 aligncenter" style="border:0" title="picture-6" src="http://aptoma.com/select.star/wp-content/uploads/2009/03/picture-6.png" alt="picture-6" width="377" height="184" /></a></p>
<p style="text-align: center;"><a href="http://meta.vgb.no/">Have a look at VG Meta</a> (Norwegian only)</p>
]]></content:encoded>
			<wfw:commentRss>http://aptoma.com/select.star/2009/03/15/vg-meta/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Coffee Consumption Graph (CCG)</title>
		<link>http://aptoma.com/select.star/2009/03/06/coffee-consumption-graph-ccg/</link>
		<comments>http://aptoma.com/select.star/2009/03/06/coffee-consumption-graph-ccg/#comments</comments>
		<pubDate>Fri, 06 Mar 2009 10:00:15 +0000</pubDate>
		<dc:creator>Geir Berset</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[coffee]]></category>
		<category><![CDATA[feedback]]></category>

		<guid isPermaLink="false">http://aptoma.com/select.star/?p=579</guid>
		<description><![CDATA[We&#8217;re no better than the rest. We drink loads of coffee, and we like it. We even measure consumption, cup by cup, and brag about it on our blog.

(Bookmark this page, the graph updates during the day!)
During the weekends the graph is normally flat, so check back in during weekdays. First cup is usually around [...]]]></description>
			<content:encoded><![CDATA[<p>We&#8217;re no better than the rest. We drink loads of coffee, and we like it. We even measure consumption, cup by cup, and brag about it on our blog.</p>
<p><img src="http://test01.aptoma.no/aptoma01.aptoma.no-coffee-day.png" alt="Coffee consumption" /><br />
(Bookmark this page, the graph updates during the day!)</p>
<p>During the weekends the graph is normally flat, so check back in during weekdays. First cup is usually around 7 or 8 am.</p>
<h3>Edit: More graphs are now available</h3>
<h4>Week</h4>
<p><img src="http://test01.aptoma.no/aptoma01.aptoma.no-coffee-week.png" alt="" /></p>
<h4>Month</h4>
<p><img src="http://test01.aptoma.no/aptoma01.aptoma.no-coffee-month.png" alt="" /></p>
<h4>Year</h4>
<p><img src="http://test01.aptoma.no/aptoma01.aptoma.no-coffee-year.png" alt="" /></p>
]]></content:encoded>
			<wfw:commentRss>http://aptoma.com/select.star/2009/03/06/coffee-consumption-graph-ccg/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Happy Birthday Rick Astley!</title>
		<link>http://aptoma.com/select.star/2009/02/06/happy-birthday-rick-astley/</link>
		<comments>http://aptoma.com/select.star/2009/02/06/happy-birthday-rick-astley/#comments</comments>
		<pubDate>Fri, 06 Feb 2009 11:03:36 +0000</pubDate>
		<dc:creator>Håkon Drange</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[rick]]></category>

		<guid isPermaLink="false">http://aptoma.com/select.star/?p=539</guid>
		<description><![CDATA[February 6th is an important day. More precisely, it is the birthday of Rick Astley.
We wanted to honor Rick on his special day, so we rasterbated him up on our office wall. If you drop by to celebrate with us we&#8217;ll hook you up with a beautiful song and a beer.

]]></description>
			<content:encoded><![CDATA[<p>February 6th is an important day. More precisely, it is the birthday of <a href="http://en.wikipedia.org/wiki/Rick_astley">Rick Astley</a>.</p>
<p>We wanted to honor Rick on his special day, so we <a href="http://homokaasu.org/rasterbator/">rasterbated</a> him up on our office wall. If you drop by to celebrate with us we&#8217;ll hook you up with a <a href="http://www.youtube.com/watch?v=Yu_moia-oVI">beautiful song</a> and a beer.</p>
<p><a href="http://aptoma.com/select.star/wp-content/uploads/2009/02/rick_aptoma.jpg"><img class="alignnone size-full wp-image-540" title="rick_aptoma" src="http://aptoma.com/select.star/wp-content/uploads/2009/02/rick_aptoma.jpg" alt="" width="500" height="749" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://aptoma.com/select.star/2009/02/06/happy-birthday-rick-astley/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Live streaming the Christmas Party @ Aptoma</title>
		<link>http://aptoma.com/select.star/2008/12/12/live-streaming-the-christmas-party-aptoma/</link>
		<comments>http://aptoma.com/select.star/2008/12/12/live-streaming-the-christmas-party-aptoma/#comments</comments>
		<pubDate>Fri, 12 Dec 2008 13:55:20 +0000</pubDate>
		<dc:creator>Geir Berset</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://aptoma.com/select.star/?p=471</guid>
		<description><![CDATA[We&#8217;re having a Christmas party today, and we&#8217;re having it in our office. We figure that the only thing sadder than having the Christmas party at the office, would be to provide a live-stream from the party itself.
So that&#8217;s what we were doing.
We&#8217;re even changing our home-page to reflect this. Visit http://aptoma.com/index_jul.php to experience [...]]]></description>
			<content:encoded><![CDATA[<p>We&#8217;re having a Christmas party today, and we&#8217;re having it in our office. We figure that the only thing sadder than having the Christmas party at the office, would be to provide a live-stream from the party itself.</p>
<p>So <a href="http://aptoma.com/index_jul.php">that&#8217;s what we <span style="text-decoration: line-through;">are</span> were doing</a>.</p>
<p>We&#8217;re even changing our home-page to reflect this. Visit <a href="http://aptoma.com/index_jul.php">http://aptoma.com/index_jul.php</a> to experience true Christmas spirits in Aptoma. Don&#8217;t mind the wine and beer.</p>
<p>Merry Christmas.</p>
]]></content:encoded>
			<wfw:commentRss>http://aptoma.com/select.star/2008/12/12/live-streaming-the-christmas-party-aptoma/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>MySQL &#8211; Configuration, query cache and other thingamajigs (part 2 of 2)</title>
		<link>http://aptoma.com/select.star/2008/07/01/mysql-configuration-query-cache-and-other-thingamajigs-part-2-of-2/</link>
		<comments>http://aptoma.com/select.star/2008/07/01/mysql-configuration-query-cache-and-other-thingamajigs-part-2-of-2/#comments</comments>
		<pubDate>Tue, 01 Jul 2008 06:00:46 +0000</pubDate>
		<dc:creator>Lars Hetland</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://aptoma.no/select.star/?p=15</guid>
		<description><![CDATA[This is the third post in a short series about MySQL. Read the first here, and the second here. This one covers this and that and has no set topic. The post is quite long, so I&#8217;ve split it in two (read the first part here). This will hopefully also make you add us to your [...]]]></description>
			<content:encoded><![CDATA[<p><em>This is the third post in a short series about MySQL. Read <a href="http://aptoma.no/select.star/2008/06/13/mysql-what-engine-to-use-aka-in-retrospect-my-first-mistake/">the first here</a>, and <a href="http://aptoma.no/select.star/2008/06/26/mysql-indexes-to-mysql-what-green-kryptonite-is-to-smallville-universe-bizarro/">the second here</a>. This one covers this and that and has no set topic. The post is quite long, so I&#8217;ve split it in two (<a href="http://aptoma.no/select.star/2008/06/27/mysql-configuration-query-cache-and-other-thingamajigs-part-1-of-2/">read the first part here</a>). This will hopefully also make you <a href="http://feeds.feedburner.com/aptoma/selectstar">add us to your RSS reader</a>, and keep you coming back for more.</em></p>
<h3>Nested updates</h3>
<p>When INSERTing, REPLACEing, UPDATEing or DELETEing several rows, nesting them together in a single long query is faster, preferably with INSERT DELAYED where applicable. The nested query is faster in itself since it is only interpreted once and only needs to update the indexes once, whereas rows inserted one at a time require one index update each. When considering this, remember that on a MyISAM table with a more even read/write load, the higher overhead of multiple fast queries might be better than a rare, longer-lasting table lock. UPDATEs can&#8217;t be nested, but can be written as an INSERT into a temporary table followed by a single multi-table UPDATE.</p>
<pre>/* CREATE TABLE */
CREATE TABLE associates ( id INT, value INT, name VARCHAR(10) );
CREATE TABLE tmp_associates ( id INT, value INT, name VARCHAR(10) );</pre>
<pre>/* INSERT */
INSERT DELAYED INTO associates ( value, name ) VALUES
		( 4, 'Mr Pink' ), ( 2, 'Mr White' ),
		( 4, 'Mr Orange' ), ( 6, 'Mr Brown' );</pre>
<pre>/* UPDATE */
TRUNCATE TABLE tmp_associates;
INSERT INTO tmp_associates ( value, name ) VALUES ( 3000, 'Mr Pink' ),
		( 22, 'Mr White' ), ( 11, 'Mr Orange' ),
		( 91, 'Mr Brown' );
UPDATE associates, tmp_associates
		SET associates.value = tmp_associates.value
		WHERE tmp_associates.name = associates.name;</pre>
<h3>Subselects</h3>
<p>In most cases, don&#8217;t use them, use JOINs. The foreachesque way of thinking that leads to subselects is logical and tempting to use, but as most RDBs work in sets, they are in most cases dead slow compared to JOINs. It should be noted that in some cases subselects can be hundreds of times faster, but those cases are special and you should really know what you are doing before opting for them. Of all the JOINs (inner, outer, left, right, cross and more), STRAIGHT_JOIN is one you shouldn&#8217;t use. STRAIGHT_JOIN tells the database to always join the tables in the order you specify instead of letting its optimizer choose the order. If we have learned anything from Hollywood, it&#8217;s that computers (and dirty apes) are smarter than humans, so don&#8217;t flatter yourself by thinking otherwise. :)</p>
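<p>A minimal sketch of the difference, assuming two hypothetical tables, <em>users</em> ( id, name ) and <em>orders</em> ( id, user_id, total ), that are not part of the examples above. Both queries list the users with at least one order above 1000, first as a subselect and then as a JOIN. Which one is faster depends on the MySQL version and the data, so compare them with EXPLAIN on your own tables:</p>
<pre>/* Hypothetical tables: users ( id, name ), orders ( id, user_id, total ) */

/* Subselect version */
SELECT id, name FROM users WHERE id IN
		( SELECT user_id FROM orders WHERE total &gt; 1000 );

/* JOIN version; DISTINCT because one user can have several matching orders */
SELECT DISTINCT users.id, users.name FROM users
		INNER JOIN orders ON orders.user_id = users.id
		WHERE orders.total &gt; 1000;</pre>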
<h3>Boolean flags</h3>
<p>Apparently <a href="http://aptoma.no/vi/#dude-5">Torfinn here at Aptoma</a> did research into this some time ago and discovered that NULL values are much slower than the number zero, and that avoiding NULL allows nicer query syntax. In MySQL, BOOL and BOOLEAN are synonyms for TINYINT(1), the same single-byte data type, so it doesn&#8217;t matter which one you choose, but TINYINT with the values 1 and 0 is more logical and easier to understand for someone who doesn&#8217;t know this. The CREATE string for avoiding NULL values is like this:</p>
<pre>CREATE TABLE t ( id INT NOT NULL, deleted TINYINT NOT NULL DEFAULT 0 );</pre>
<p>The conditional part of a query can then be written like this:</p>
<pre>SELECT id FROM t WHERE !deleted;</pre>
<h3>Temp tables</h3>
<p>If you feel the write rate of your table is too high for optimal query cache usage and you don&#8217;t have any rush inserting new rows, consider inserting them into a temp table and at set intervals INSERT INTO the main table. This leads to faster initial INSERTs, potentially more query cache hits and less overhead updating indexes.</p>
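<p>A rough sketch of how this could look, assuming a hypothetical <em>hits</em> table and a <em>hits_staging</em> table invented for this example. The staging table has its own AUTO_INCREMENT id so the periodic job can copy and delete exactly the same set of rows, even if new rows arrive while it runs:</p>
<pre>/* Hypothetical tables, for illustration only */
CREATE TABLE hits ( url VARCHAR(255) NOT NULL, created DATETIME NOT NULL );
CREATE TABLE hits_staging (
		id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
		url VARCHAR(255) NOT NULL, created DATETIME NOT NULL );

/* Incoming rows go to the small staging table */
INSERT INTO hits_staging ( url, created )
		VALUES ( '/index.php', '2008-07-01 06:00:00' );

/* At set intervals (from a cron job, for example): remember the newest
   staged row, copy everything up to it, then remove what was copied */
SET @last_id := ( SELECT MAX( id ) FROM hits_staging );
INSERT INTO hits ( url, created )
		SELECT url, created FROM hits_staging WHERE id &lt;= @last_id;
DELETE FROM hits_staging WHERE id &lt;= @last_id;</pre>
<p>If the staging table is empty, @last_id is NULL and both statements simply match no rows.</p>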
<h3>Do more once, not some foreach</h3>
<p>In normal programming languages foreach is great, but for SQL it&#8217;s seldom optimal. If you end up doing a query foreach something, look into doing two foreachs in PHP and one larger SQL query instead: one foreach before the query to gather the conditional data (build a string suitable for an IN condition), and one after the query is executed, where you locate the row(s) you need for each item. Ordering the data the same way in both PHP and SQL will save you a lot of processing power, as you can read and match rows from the returned data set sequentially.</p>
<p>The previous paragraph could be a bit hard to grasp, so I&#8217;ll provide an example. In the first function, a date-restricted query is executed foreach user, looking up the user by name. In the second function, all user names are added to an IN() condition and a single query gathers the data for all listed users within the same date restriction, ordered by user name. Then some PHP sorts the users in the same order (hopefully; be careful about collation and special chars) and, foreach user, locates the matching row in the result set.</p>
<p>The bad, with potentially a bunch of queries:</p>
<pre>function weekReport() {
	$lusers = array( 'Joe', 'Frank', 'Bill', 'Ted' );
	$errors = array();
	$today = date( 'Y-m-d 00:00:00' );
	$sql = "SELECT SUM( errors ) AS errors FROM problems WHERE
		created &lt; '{$today}'
		AND created &gt;=
		DATE_SUB( '{$today}', INTERVAL 7 DAY ) AND name = ";
	foreach ( $lusers as $luser ) {
		$result = mysql_query( $sql . "'{$luser}';" );
		if ( $row = mysql_fetch_assoc( $result ) ) {
			$errors[$luser] = $row['errors'];
		}
	}
	return $errors;
}</pre>
<p>The potentially good way, with one query:</p>
<pre>function weekReport() {
	$lusers = array( 'Joe', 'Frank', 'Bill', 'Ted' );
	$errors = array();
	$today = date( 'Y-m-d 00:00:00' );
	$sql = "SELECT name, SUM( errors ) AS errors
		FROM problems WHERE
		created &lt; '{$today}'
		AND created &gt;=
		DATE_SUB( '{$today}', INTERVAL 7 DAY )
		AND name
		IN ( '" . implode( "','", $lusers ) . "' )
		GROUP BY name ORDER BY name ASC;";
	$result = mysql_query( $sql );
	// Sort in place, ascending, to match the ORDER BY of the query
	sort( $lusers );
	$found = true;
	$row = false;
	foreach ( $lusers as $luser ) {
		if ( $found ) {
			$found = false;
			$row = mysql_fetch_assoc( $result );
		}
		if ( $row &amp;&amp; $row['name'] == $luser ) {
			$found = true;
			$errors[$luser] = $row['errors'];
		}
	}
	return $errors;
}</pre>
<h3>Configuration</h3>
<p>The default cache sizes are not tuned for performance, so you probably want to increase them if you have the resources and access to do so.</p>
<p>SHOW VARIABLES;</p>
<p>will show current MySQL settings and these are some of the more important ones:</p>
<p><strong>When using InnoDB</strong><br />
innodb_buffer_pool_size &#8211; Total cache size for InnoDB databases. Default 8 MB and can be set to up to 50-80% of total system memory when only using InnoDB tables.<br />
innodb_log_buffer_size &#8211; How much log to buffer in memory before writing to disk. Increasing this could give a performance boost.</p>
<p><strong>When using MyISAM</strong><br />
key_buffer_size &#8211; Cache size for MyISAM index blocks; MyISAM row data is left to the operating system&#8217;s file cache. Default 8 MB. The MySQL manual suggests around 25% of total system memory on a server that only uses MyISAM tables.</p>
<p><strong>Query Cache</strong><br />
query_cache_size &#8211; Total cache size for query results. Must be set to more than 0 bytes for the query cache to work, and a few megabytes should be enough in most cases.<br />
query_cache_type &#8211; Query cache status. 0 or OFF is deactivated, 1 or ON caches everything it can, and 2 or DEMAND only caches queries that use the keyword SQL_CACHE. ON is the easiest as no modification of the SQL statements is required, but it will decrease performance on cache misses. DEMAND avoids that performance hit on queries that don&#8217;t ask for caching, and could be the best choice when all programmers know each table&#8217;s access patterns and know when the query cache will be needed and when not.</p>
<p><strong>When using MEMORY</strong><br />
max_heap_table_size &#8211; Total data size for MEMORY tables.</p>
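<p>As a sketch of how these settings can be inspected and changed on a running server (this needs the SUPER privilege, and the sizes below are placeholder values rather than recommendations). Changes made with SET GLOBAL are lost on restart, so permanent values belong in the server&#8217;s configuration file. In the MySQL 5.0 series, innodb_buffer_pool_size cannot be changed this way and has to be set in the configuration file:</p>
<pre>/* Inspect the current values */
SHOW VARIABLES LIKE 'query_cache%';
SHOW VARIABLES LIKE 'key_buffer_size';

/* Placeholder sizes, adjust to your own server */
SET GLOBAL query_cache_size = 16 * 1024 * 1024;
SET GLOBAL query_cache_type = DEMAND;
SET GLOBAL key_buffer_size = 256 * 1024 * 1024;
SET GLOBAL max_heap_table_size = 64 * 1024 * 1024;</pre>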
<h3>Helpful commands</h3>
<p>To finish, I will just list a few commands you should know. Search <a href="http://dev.mysql.com/doc/refman/5.0/en/">the MySQL manual</a> to see what they do.</p>
<p>SHOW VARIABLES;<br />
SHOW PROCESSLIST;<br />
SHOW FULL PROCESSLIST;<br />
SHOW STATUS;<br />
FLUSH STATUS;<br />
SHOW ENGINE INNODB STATUS;<br />
SHOW TABLE STATUS LIKE 'table';<br />
SHOW CREATE TABLE table;<br />
SHOW INDEXES FROM table;<br />
SHOW KEYS FROM table;<br />
OPTIMIZE TABLE table;<br />
ANALYZE TABLE table;</p>
<h3>Presentations</h3>
<p>MySQL Performance  Blog from the authors of <a href="http://www.amazon.com/gp/product/0596101716">High Performance MySQL</a> has several good presentations worth reading. In some of them you need to read between the lines ( as they for the most part consist of bullet points ), but that should be easy enough.<br />
<a href="http://www.mysqlperformanceblog.com/mysql-performance-presentations/">http://www.mysqlperformanceblog.com/mysql-performance-presentations/</a></p>
]]></content:encoded>
			<wfw:commentRss>http://aptoma.com/select.star/2008/07/01/mysql-configuration-query-cache-and-other-thingamajigs-part-2-of-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>MySQL &#8211; Configuration, query cache and other thingamajigs (part 1 of 2)</title>
		<link>http://aptoma.com/select.star/2008/06/27/mysql-configuration-query-cache-and-other-thingamajigs-part-1-of-2/</link>
		<comments>http://aptoma.com/select.star/2008/06/27/mysql-configuration-query-cache-and-other-thingamajigs-part-1-of-2/#comments</comments>
		<pubDate>Fri, 27 Jun 2008 14:50:35 +0000</pubDate>
		<dc:creator>Lars Hetland</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://aptoma.no/select.star/?p=8</guid>
		<description><![CDATA[This is the third post in a short series about MySQL. Read the first here, and the second here. This one covers this and that and has no set topic. The post is a bit long, so I&#8217;ve split it in two. The next part will be released next week. Stay tuned with rss.
Natural keys [...]]]></description>
			<content:encoded><![CDATA[<p><em>This is the third post in a short series about MySQL. Read <a href="http://aptoma.no/select.star/2008/06/13/mysql-what-engine-to-use-aka-in-retrospect-my-first-mistake/">the first here</a>, and <a href="http://aptoma.no/select.star/2008/06/26/mysql-indexes-to-mysql-what-green-kryptonite-is-to-smallville-universe-bizarro/">the second here</a>. This one covers this and that and has no set topic. The post is a bit long, so I&#8217;ve split it in two.</em><em> The next part will be released next week. <a href="http://feeds.feedburner.com/aptoma/selectstar">Stay tuned with rss.</a></em></p>
<h3>Natural keys versus surrogate keys</h3>
<p>Don&#8217;t always add an auto_increment integer as a key when a naturally unique field (or a group of fields) of reasonable size exists. The best example of this is a join table, which normally holds two primary keys from other tables and where all SELECT queries will be conditional on them. When creating multi-field primary keys, you should also keep in mind the order in which you list the fields. Put the field you will query on most often first, so the index can be used for those queries. If both fields are equally important, also add a separate index on the second field.</p>
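<p>A small sketch of such a join table, with hypothetical table and column names. The composite primary key serves lookups by user_id (its first field), and the extra index covers lookups that only know the group_id:</p>
<pre>/* Hypothetical join table between users and groups */
CREATE TABLE user_groups (
		user_id INT UNSIGNED NOT NULL,
		group_id SMALLINT UNSIGNED NOT NULL,
		PRIMARY KEY ( user_id, group_id ),
		KEY group_idx ( group_id )
);

/* Can use the primary key, since user_id is its first field */
SELECT group_id FROM user_groups WHERE user_id = 42;

/* Cannot use the primary key on its own, but can use group_idx */
SELECT user_id FROM user_groups WHERE group_id = 7;</pre>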
<h3>COUNT(*)</h3>
<p>When using MyISAM, a COUNT(*) without a WHERE clause needs almost no processing, as the table keeps track of exactly how many rows it holds, and the query is very fast. InnoDB does not have this feature (because of Multi Version Concurrency Control, MVCC), and the way it finds out how many rows there are is to read through them all, an operation potentially tens of thousands of times slower. There are a few alternatives, and the first thing you should do is decide how exact you need the number to be. The best solution when it has to be exact is to add AFTER INSERT and AFTER DELETE triggers to the table and increment/decrement an integer in some other table. That table could use the MEMORY engine for optimal speed. Other solutions could be to manually increment/decrement a value in Memcached on INSERT and DELETE queries, or to have a cron job execute a COUNT(*) once in a while and cache the value. If one row is INSERTed per day, or several rows are INSERTed once a day at a specific time, then there is no point in counting the rows more than once a day, preferably right after the INSERTs have finished.<br />
Instead of</p>
<pre>SELECT COUNT(*) FROM wins WHERE player_id = 123;</pre>
<p>to find out how many wins a player has, why not just have &#8216;<em>wins</em>&#8217; as a field in the users table and increment it when the player wins?</p>
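<p>The trigger-based counter could be sketched like this. The <em>row_counts</em> table and the trigger names are made up for this example, and it assumes the <em>wins</em> table from the query above. Keep in mind that a MEMORY table is emptied when the server restarts, so the counter has to be seeded again, and that it is not rolled back along with an InnoDB transaction:</p>
<pre>/* Hypothetical counter table */
CREATE TABLE row_counts (
		table_name VARCHAR(64) NOT NULL PRIMARY KEY,
		num_rows INT UNSIGNED NOT NULL
) ENGINE = MEMORY;

/* Seed the counter once with the expensive COUNT(*) */
INSERT INTO row_counts ( table_name, num_rows )
		SELECT 'wins', COUNT(*) FROM wins;

/* Keep it up to date from then on */
CREATE TRIGGER wins_count_insert AFTER INSERT ON wins FOR EACH ROW
		UPDATE row_counts SET num_rows = num_rows + 1
		WHERE table_name = 'wins';
CREATE TRIGGER wins_count_delete AFTER DELETE ON wins FOR EACH ROW
		UPDATE row_counts SET num_rows = num_rows - 1
		WHERE table_name = 'wins';

/* The cheap replacement for SELECT COUNT(*) FROM wins */
SELECT num_rows FROM row_counts WHERE table_name = 'wins';</pre>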
<h3>Don&#8217;t use functions on indexed fields</h3>
<p>When using indexes on a field it&#8217;s also important to not use SQL functions on the field, but to list it unchanged and alone on one side of the equality sign. In this example only the last query will use the index on the field date:</p>
<pre>SELECT date FROM table WHERE UNIX_TIMESTAMP( date ) = 1057941242;
SELECT date FROM table WHERE date = FROM_UNIXTIME( 1057941242 );</pre>
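<p>The same idea applies to range conditions. As a sketch, assume a hypothetical table <em>problems</em> with an id and an indexed DATETIME field <em>created</em>. Wrapping the field in DATE() hides it from the index, while an equivalent range on the bare field does not:</p>
<pre>/* Hypothetical table: problems ( id, created ), with an index on created */

/* The function is applied to the field, so the index cannot be used */
SELECT id FROM problems WHERE DATE( created ) = '2008-06-27';

/* Same rows, but the field stands alone, so the index can be used */
SELECT id FROM problems
		WHERE created &gt;= '2008-06-27 00:00:00'
		AND created &lt; '2008-06-28 00:00:00';</pre>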
<h3>UNIQUE and LIMIT 1</h3>
<p>When a field will be unique but is not suitable as a primary key, UNIQUE could be a good choice for it. This is because a UNIQUE index is in fact treated the same way as the primary key, and the optimizer will know that only one row will match. Both lookup times and memory allocation will probably be affected by this, and some of the same performance boost can be had by applying LIMIT 1 to queries where only one row is updated, such as:</p>
<pre>UPDATE users SET kickass = 1 WHERE
name = 'nameless space marine' LIMIT 1;</pre>
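<p>For comparison, a sketch of the UNIQUE variant on the same <em>users</em> table. The index name is made up, and the id field is assumed to exist. With the unique index in place, EXPLAIN should report the lookup as type &#8220;const&#8221;, meaning the optimizer knows at most one row can match:</p>
<pre>/* Hypothetical index name; assumes users has the fields id and name */
ALTER TABLE users ADD UNIQUE KEY name_unique ( name );

EXPLAIN SELECT id FROM users WHERE name = 'nameless space marine';</pre>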
<h3>INSERT DELAYED</h3>
<p>MyISAM supports INSERT DELAYED queries, where the server queues the rows and returns to the client at once, and the actual insert is held off until other queries with higher priority have completed. The performance boost is supposed to be significant in high load situations, but rows still waiting in the queue are lost if the server crashes, and a slow REPAIR TABLE and an index rebuild could be needed after a system crash.<br />
<a href="http://dev.mysql.com/doc/refman/5.0/en/insert-delayed.html">http://dev.mysql.com/doc/refman/5.0/en/insert-delayed.html</a></p>
<h3>Slim is fast</h3>
<p>Keep all fields as trim as possible. Use unsigned numbers to double the range when only positive numbers are needed and use TINYINT, SMALLINT, MEDIUMINT and limited VARCHARs as much as possible. You might need INT as AUTO_INCREMENT primary key on the users table, but an unsigned TINYINT ( 0-255 ) could be plenty for the categories table. It&#8217;s a bit counter-intuitive, but splitting nullable, text, blob and less accessed fields into a one-to-one sister table could give you a performance boost. This way you can load more rows of the more accessed part of the table faster and with less memory.<br />
<a href="http://dev.mysql.com/doc/refman/5.0/en/data-types.html">http://dev.mysql.com/doc/refman/5.0/en/data-types.html</a></p>
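<p>A sketch of what this could look like. All table and field names here are invented for the example. The frequently read fields stay in a narrow <em>users</em> table, while the bulky and rarely read ones move to a one-to-one sister table that shares the same key:</p>
<pre>/* Hypothetical schema, for illustration only */
CREATE TABLE categories (
		id TINYINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
		name VARCHAR(40) NOT NULL );

CREATE TABLE users (
		id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
		name VARCHAR(40) NOT NULL,
		category_id TINYINT UNSIGNED NOT NULL );

/* One-to-one sister table for nullable, text and blob fields */
CREATE TABLE users_extra (
		user_id INT UNSIGNED NOT NULL PRIMARY KEY,
		biography TEXT,
		avatar BLOB );

/* Only join in the sister table when the extra fields are needed */
SELECT users.name, users_extra.biography FROM users
		LEFT JOIN users_extra ON users_extra.user_id = users.id
		WHERE users.id = 123;</pre>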
<h3>Update internal statistics</h3>
<p>There are commands which will update cardinality and indexes in a table. For MyISAM the command is OPTIMIZE TABLE and for InnoDB it&#8217;s ANALYZE TABLE. It&#8217;s worth mentioning that OPTIMIZE TABLE will issue a write lock on the table and block all other queries until it&#8217;s done.<br />
<a href="http://dev.mysql.com/doc/refman/5.0/en/optimize-table.html">http://dev.mysql.com/doc/refman/5.0/en/optimize-table.html</a><br />
<a href="http://dev.mysql.com/doc/refman/5.0/en/analyze-table.html">http://dev.mysql.com/doc/refman/5.0/en/analyze-table.html</a></p>
<h3>mysqlhotcopy</h3>
<p>When doing a database dump on a large MyISAM table, consider using mysqlhotcopy instead of mysqldump.<br />
<a href="http://dev.mysql.com/doc/refman/5.0/en/mysqlhotcopy.html">http://dev.mysql.com/doc/refman/5.0/en/mysqlhotcopy.html</a></p>
<h3>Query cache</h3>
<p>MySQL&#8217;s query cache is disabled by default. It supports three modes: OFF, ON and DEMAND. OFF prevents caching or retrieval of any results. ON caches/retrieves as much as possible, except statements that begin with SELECT SQL_NO_CACHE. DEMAND only caches queries starting with SELECT SQL_CACHE. Query cache is between 200 and (infinity+1)% faster on a cache hit, but any miss is normally 5-25% slower. However, both these numbers can vary a lot. On a table with a high number of reads and a high probability of identical queries, query cache will probably increase performance significantly, but since any INSERT, DELETE or UPDATE will invalidate all cached results for that table, performance on a table with frequent writes is likely to drop. What&#8217;s also important is that subsequent queries must be identical to the first for its cached result to be used. That includes case, white space and brackets. When actively using query cache on tables with dates it&#8217;s also important to write reusable and cacheable queries. Any use of NOW() or equivalent non-deterministic functions will keep the query out of the query cache, even within the same second, so date and time should be calculated outside of MySQL. When doing so, first analyze how exact the query needs to be. If you can get away with &lt; 1 minute accuracy, don&#8217;t use date( 'Y-m-d H:i:s' ) but date( 'Y-m-d H:i:00' ), so any identical queries within the same minute will get the cached result.<br />
<a href="http://dev.mysql.com/doc/refman/5.0/en/query-cache.html">http://dev.mysql.com/doc/refman/5.0/en/query-cache.html</a></p>
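<p>To illustrate the point about dates, here is a sketch using a hypothetical table <em>problems</em> with an id and a DATETIME field <em>created</em>. The first query can never be served from the query cache because of NOW(). The second one gets its timestamp as a literal from the application, rounded down to the minute, so identical queries within that minute can share one cached result:</p>
<pre>/* Hypothetical table: problems ( id, created ) */

/* Not cacheable: NOW() is non-deterministic */
SELECT id FROM problems
		WHERE created &gt;= DATE_SUB( NOW(), INTERVAL 1 DAY );

/* Cacheable: the literal comes from date( 'Y-m-d H:i:00' ) in PHP */
SELECT id FROM problems
		WHERE created &gt;= DATE_SUB( '2008-06-27 14:50:00', INTERVAL 1 DAY );

/* Hit and miss counters, to check whether the cache is being used */
SHOW STATUS LIKE 'Qcache%';</pre>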
<h3>SELECT * <a href="http://aptoma.no/select.star/">© </a></h3>
<p>You don&#8217;t need everything so don&#8217;t ask for it! It will only slow down your query, increase memory usage while sorting and make the query less understandable when running &#8216;SHOW PROCESSLIST&#8217;.</p>
<h3>GROUP BY or DISTINCT</h3>
<p>Both these can be used with the same result but performance-wise, there is a difference. For simpler queries, DISTINCT is often faster but in more advanced queries GROUP BY could be the best choice. Benchmark your queries to find out which one to choose.</p>
<h3>IP addresses</h3>
<p>As we normally see IP addresses in the form of 000.000.000.000, VARCHAR( 15 ) might be natural to use, but you should treat it as a number because it will both take up less space and could be faster and easier to query. The data type of choice for an IPv4 address is the four byte unsigned INT. You then use INET_ATON and INET_NTOA to convert that number to the dotted-quad representation we normally use. (Or keep it as an int, it will always be superior to the string representation! )</p>
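<pre>-- Store the address as a number
INSERT INTO blacklist ( ip, date ) VALUES
( INET_ATON( '209.207.224.40' ), NOW() );

-- Exact match on a single address
SELECT date FROM blacklist WHERE ip = INET_ATON( '209.207.224.40' );

-- Range query, written with full dotted quads because
-- INET_ATON reads the short form 'a.b' as 'a.0.0.b'
SELECT INET_NTOA( ip ) FROM blacklist WHERE ip &gt;= INET_ATON( '127.0.0.0' )
  AND ip &lt; INET_ATON( '128.0.0.1' );</pre>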
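<p>For completeness, a sketch of what the <em>blacklist</em> table used above could look like. The column types are an assumption, as the original table definition is not shown:</p>
<pre>CREATE TABLE blacklist (
  ip   INT UNSIGNED NOT NULL,  -- four bytes instead of up to 15 characters
  date DATETIME NOT NULL,
  PRIMARY KEY ( ip )
);

-- The conversion round-trips
SELECT INET_ATON( '209.207.224.40' );  -- 3520061480
SELECT INET_NTOA( 3520061480 );        -- 209.207.224.40</pre>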
<p><em>The next part will be released next week. <a href="http://feeds.feedburner.com/aptoma/selectstar">Stay tuned with rss.</a><br />
</em></p>
]]></content:encoded>
			<wfw:commentRss>http://aptoma.com/select.star/2008/06/27/mysql-configuration-query-cache-and-other-thingamajigs-part-1-of-2/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>MySQL Indexes &#8212; to MySQL what green kryptonite is to Smallville universe Bizarro</title>
		<link>http://aptoma.com/select.star/2008/06/26/mysql-indexes-to-mysql-what-green-kryptonite-is-to-smallville-universe-bizarro/</link>
		<comments>http://aptoma.com/select.star/2008/06/26/mysql-indexes-to-mysql-what-green-kryptonite-is-to-smallville-universe-bizarro/#comments</comments>
		<pubDate>Thu, 26 Jun 2008 10:19:43 +0000</pubDate>
		<dc:creator>Lars Hetland</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://aptoma.no/select.star/?p=7</guid>
		<description><![CDATA[This is the second post in a short series about MySQL and the first is located here. Today&#8217;s post stars indexes, those furry little creatures native to the forest moon of Endor.
Indexes are essential for fast and effective conditional queries and you should always use them. If you can afford the extra storage space, that [...]]]></description>
			<content:encoded><![CDATA[<p>This is the second post in a short series about MySQL and the first is <a href="http://aptoma.no/select.star/2008/06/13/mysql-what-engine-to-use-aka-in-retrospect-my-first-mistake/">located here</a>. Today&#8217;s post stars indexes, those furry little creatures native to the forest moon of Endor.</p>
<p>Indexes are essential for fast and effective conditional queries and you should always use them. If you can afford the extra storage space, that is. Selecting the right indexes is both a space vs. speed consideration and the result of an analysis of the queries you will be running. It therefore shouldn&#8217;t be a set-and-forget operation: revise your indexes as queries are written, and again after the application has gone live, when real-world usage and data can be analyzed. Great tools for this are MySQL&#8217;s slow query log, the EXPLAIN command and query rate counters, all of which are covered in this post.</p>
<p>The different engines handle indexes in different ways, so indexes should be revised when switching engines. One serious issue is always there though: <strong>MySQL will normally only use one index per table in a query!</strong> (The exceptions are UNIONs and the index merge optimization added in MySQL 5.0.) The result is that adding an index per field could be a huge waste of space and processing power if that index is never used. Another important note is that the order of the fields in a multiple-field index matters. If fields one and three of an index are used in the query, only field one will be index-searched, because using field three requires that field two is also used.</p>
<p><a href="http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html">http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html</a></p>
<h3>The grey smoke ( how they work )</h3>
<p><em>Warning</em> : The following section is a crude explanation of how indexes work. The <a href="http://aptoma.no/select.star/2008/06/13/mysql-what-engine-to-use-aka-in-retrospect-my-first-mistake/">different engines</a> have each their own way of implementing indexes. This section should only be read as a way they <em>could</em> work.</p>
<p>Indexes are ordered duplicates of selected fields from a table. Since these duplicates are ordered, MySQL can find the rows matching a WHERE clause much faster than in the random order of the original table. An index can also include several fields, in which case the data is stored as a tree (in most cases) in the order you list the fields. The first field becomes the root node and the following fields become the branches. Here is an example of a table with the fields id, first name and family name.</p>
<pre>1,	Alec,		Baldwin
2,	Adam,		Baldwin
3,	William,	Baldwin
4,	Stephen,	Baldwin
5,	Albert,		Einstein
6,	Adam,		West</pre>
<p>When executing the query :<br />
SELECT family FROM people WHERE first = &#8216;William&#8217;;<br />
MySQL would have to read all the first names to find the matches. But if you add the index ( <em>first</em> ), MySQL would create an index looking something like :</p>
<pre>Adam,		2
Adam, 		6
Albert,	 	5
Alec,		1
Stephen, 	4
William,	3</pre>
<p>The query would then by different kinds of magic move almost directly to the &#8216;William&#8217; section and find the references to the rows in the original table.</p>
<p>If we then add the index ( <em>family</em> ) this query would also go faster as the &#8220;<em>family</em>&#8221; index would be used :</p>
<p>SELECT first FROM people WHERE family = &#8216;Einstein&#8217;;</p>
<p>But when executing a query with WHERE-clauses on both the family and first fields only <em>one</em> will be used and probably the one with highest cardinality (read about it further down). The following query would probably use the &#8220;first&#8221; index and then exclude &#8220;West&#8221; from the list after retrieving all the Adams from the original table.</p>
<p>SELECT first, family FROM people WHERE family = &#8216;Baldwin&#8217; AND first = &#8216;Adam&#8217;;</p>
<p>So what we want is an index with both fields like ( first, family ). MySQL will then create a tree in this shape:</p>
<pre>Adam,		Baldwin,	2
		West,		6
Albert,		Einstein,	5
Alec,		Baldwin, 	1
Stephen,	Baldwin,	4
William,	Baldwin,	3</pre>
<p>The previous query would then be able to exclude rows based on both conditions and find the matching table references. (You would also have a covering index. More on this later.)</p>
<p>A very important thing to know is the difference between ( <em>first</em>, <em>family</em> ) and ( <em>family</em>, <em>first</em> ) as the latter would look like this:</p>
<pre>Baldwin,	Adam,		2
 		Alec,		1
 		Stephen,	4
 		William,	3
 Einstein,	Albert,		5
 West,		Adam,		6</pre>
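<p>To try this out yourself, here is a sketch that builds the example table with the ( <em>first</em>, <em>family</em> ) index. The table and column names follow the queries above, while the column types and the index name are my own assumptions. With only six rows the optimizer may well prefer a full table read, so load realistic data before trusting the EXPLAIN output:</p>
<pre>CREATE TABLE people (
  id     INT UNSIGNED NOT NULL AUTO_INCREMENT,
  first  VARCHAR( 50 ) NOT NULL,
  family VARCHAR( 50 ) NOT NULL,
  PRIMARY KEY ( id ),
  INDEX first_family ( first, family )
);

INSERT INTO people ( first, family ) VALUES
  ( 'Alec', 'Baldwin' ), ( 'Adam', 'Baldwin' ), ( 'William', 'Baldwin' ),
  ( 'Stephen', 'Baldwin' ), ( 'Albert', 'Einstein' ), ( 'Adam', 'West' );

-- Can search the index: the leftmost field is part of the condition
EXPLAIN SELECT family FROM people WHERE first = 'Adam';
EXPLAIN SELECT id FROM people WHERE first = 'Adam' AND family = 'West';

-- Cannot search the index tree: the leftmost field is missing,
-- so at best the whole index is scanned
EXPLAIN SELECT first FROM people WHERE family = 'Einstein';</pre>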
<p>Index prefix compression could also be mentioned here. Again the exact implementations could be very different from my version but you should be able to get the gist of it. When consecutive index entries start with the same bytes, the engine could do some magic and store that bit only once. This can save a lot of space, and it means that adding fields with very low cardinality to an index might increase the index size only minimally, but not all engines implement it. Let&#8217;s add a few people and the field &#8220;<em>banana</em>&#8221; to our table and see how it could look.</p>
<p>First without compression:</p>
<pre>Baldwin,	Adam,		Yellow,		2
 		Adam,		Yellow,		9
 		Adam,		Brown,		10
 		Adaminium,	Yellow,		7
 		Adamsomething,	Brown,		8
 		Alec,		Yellow,		1
 		Stephen,	Yellow,		4
 		William,	Yellow,		3
 Einstein,	Albert,		Yellow,		5
 West,		Adam,		Yellow,		6</pre>
<p>Then with compression:</p>
<pre>Baldwin,	(Adam),		(Yellow),	2
 		(),		(),		9
 		(),		Brown,		10
 		()inium,	Yellow,		7
 		()something,	Brown,		8
 		Alec,		(Yellow),	1
 		Stephen,	(),		4
 		William,	(),		3
 Einstein,	Albert,		(),		5
 West,		Adam,		(),		6</pre>
<p>I hope this explanation helps you understand why indexes are so useful, but also why a set-and-forget strategy seldom works. If 90% of your heavy queries can&#8217;t use your long and space-consuming indexes then they are just slowing you down.</p>
<h3>The Slow Query Log</h3>
<p>The variable <em>long_query_time</em> holds the number of seconds a query can run before it&#8217;s added to the slow query log. The default value is 10 seconds. It&#8217;s recommended to use mysqldumpslow to summarize the logged queries.</p>
<p><a href="http://dev.mysql.com/doc/refman/5.0/en/slow-query-log.html">http://dev.mysql.com/doc/refman/5.0/en/slow-query-log.html</a></p>
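<p>A small sketch of checking and lowering the threshold. The value 2 is just an example, and the <em>slow_query_log</em> variable assumes MySQL 5.1 or later (on 5.0 the log is switched on with the <em>log-slow-queries</em> server option instead):</p>
<pre>-- Current threshold in seconds
SHOW VARIABLES LIKE 'long_query_time';

-- Log everything slower than two seconds (applies to new connections)
SET GLOBAL long_query_time = 2;

-- MySQL 5.1 and later: switch the log on without a restart
SET GLOBAL slow_query_log = 'ON';

-- How many statements have crossed the threshold so far
SHOW GLOBAL STATUS LIKE 'Slow_queries';</pre>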
<h3>EXPLAIN</h3>
<p>When EXPLAIN is added in front of a query, MySQL shows a summary of the choices its optimizer made for that query. This is an essential and simple way to check that your queries work with indexes as planned. The important fields are <em>type</em>, <em>key</em>, <em>key_len</em>, <em>rows</em> and <em>Extra</em>. <em>type</em> tells you how the data is read, where ALL is your mortal enemy and <em>const</em> and <em>range</em> your more likely targets. There are several others though, <span style="text-decoration: line-through;">google it</span> search them up with your favorite search engine. The <em>key</em> field tells you what key the optimizer chose, if any. On any large data set you would probably want an index here, but even if there are indexes available, MySQL might choose not to use them. If you see that MySQL makes poor index choices on a query, USE INDEX or FORCE INDEX hints can be added to the query, but they are not recommended for anything other than testing, as the data set is likely to change and the dynamic query path is safer in the long run. <em>key_len</em> tells you how many bytes of the key are used. A small number such as 3 could indicate that just the first MEDIUMINT field in the index is used, while 200 could be the result of a good multi-field hit. (Unless your text indexes are unnecessarily long, read about partial indexes below.) <em>rows</em> is the number of rows the optimizer expects to examine, based on its table statistics. In the <em>Extra</em> field you should see &#8220;Using index&#8221; and &#8220;Using where&#8221; for a fast query, but &#8220;Using filesort&#8221; and a few others could mean problems. Again, <span style="text-decoration: line-through;">google it</span> what I wrote earlier.</p>
<p><a href="http://dev.mysql.com/doc/refman/5.0/en/explain.html">http://dev.mysql.com/doc/refman/5.0/en/explain.html</a></p>
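<p>As a sketch, this is how it could be used on the <em>people</em> table from the example earlier in this post. The index name <em>first_family</em> is hypothetical and the output depends entirely on your data, so none is shown:</p>
<pre>-- Ask the optimizer how it would run the query
EXPLAIN SELECT first, family FROM people
  WHERE family = 'Baldwin' AND first = 'Adam';

-- For testing only: insist on one particular index and compare
EXPLAIN SELECT first, family FROM people FORCE INDEX ( first_family )
  WHERE family = 'Baldwin' AND first = 'Adam';</pre>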
<h3>Query rate counter</h3>
<p>D-I-Y by just incrementing an integer in Memcached or a MEMORY table every time your code does a database query, or by adding a trigger (triggers only fire on INSERT, UPDATE and DELETE, so they can&#8217;t count SELECTs). After some usage you can analyze the data and see which queries you should optimize for. It&#8217;s really up to you how you choose to implement it.</p>
<p><a href="http://dev.mysql.com/doc/refman/5.0/en/triggers.html">http://dev.mysql.com/doc/refman/5.0/en/triggers.html</a></p>
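<p>One way the MEMORY table variant could look. The table, the column names and the query label are all invented for this sketch:</p>
<pre>CREATE TABLE query_counter (
  query_name VARCHAR( 64 ) NOT NULL,
  hits       INT UNSIGNED NOT NULL DEFAULT 0,
  PRIMARY KEY ( query_name )
) ENGINE = MEMORY;

-- Run this from your code next to the query you want to count
INSERT INTO query_counter ( query_name, hits ) VALUES ( 'people_by_first', 1 )
  ON DUPLICATE KEY UPDATE hits = hits + 1;

-- After some usage: which queries deserve an index the most?
SELECT query_name, hits FROM query_counter ORDER BY hits DESC;</pre>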
<h3>Cardinality</h3>
<p>When doing a <em>SHOW INDEXES FROM table;</em> cardinality is one of the more important fields to inspect. Cardinality is an estimate of the number of unique values in the index, and the higher cardinality a field has, the better it will work as an index. Where the primary key has a cardinality equal to COUNT(*), a TINYINT(1) field named <em>deleted</em> (boolean one or zero for true or false) would have a cardinality of at most two, regardless of how many rows there are. A primary key lookup often goes at optimal speed, but in the case of an index on the <em>deleted</em> field, MySQL in most cases won&#8217;t even use the index. This is because when the optimizer predicts a hit percentage of more than about 30% it will opt for the often faster sequential read of the entire data set. You should therefore order the fields in multiple-field indexes first by field hit frequency (how often that field is part of the WHERE clause of your queries), and then by cardinality. This excludes as many rows as possible early in the query and gives the optimizer an easier and more predictable job. Sometimes, however, even if the cardinality of a field is sub-optimal it&#8217;s better to include it in your index just so you can have a covering index.</p>
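<p>A quick way to look at these numbers, again using the <em>people</em> example table. The ratio query is my own rough stand-in for selectivity, where values close to 1 mean nearly unique and values close to 0 mean few distinct values:</p>
<pre>-- Refresh the statistics, then read the Cardinality column
ANALYZE TABLE people;
SHOW INDEXES FROM people;

-- Measure the uniqueness of each candidate field directly
SELECT COUNT( DISTINCT first ) / COUNT( * ) AS first_ratio,
       COUNT( DISTINCT family ) / COUNT( * ) AS family_ratio
  FROM people;</pre>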
<h3>Covering index</h3>
<p>When processing an indexed query, MySQL first reads the index file, excludes as many rows as possible and then uses the byte offset (MyISAM) or primary key (InnoDB) to read the rest of the data needed for those rows from the data file. In some cases that second read adds more latency than simply reading a larger data set from the index would. An index that includes all the fields used in a query, so that just one read is needed, is called a covering index. The decision to use covering indexes is once again a space/speed consideration.</p>
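<p>Sketched on the <em>people</em> example, assuming a ( <em>first</em>, <em>family</em> ) index exists. When an index covers a query, EXPLAIN reports &#8220;Using index&#8221; in the Extra column:</p>
<pre>-- Covered: both selected fields live in the ( first, family ) index
EXPLAIN SELECT first, family FROM people WHERE first = 'Adam';

-- Not covered on MyISAM: id has to be fetched from the data file
-- ( InnoDB stores the primary key in every index, so it is covered there )
EXPLAIN SELECT id, first, family FROM people WHERE first = 'Adam';</pre>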
<h3>Partial indexes</h3>
<p>Normal indexes on text fields (TEXT, VARCHAR, etc) aren&#8217;t always a good choice because each index entry for a VARCHAR(100) can occupy a lot of unnecessary space. With a partial index, only a set number of chars are indexed, starting at the beginning of the field. The VARCHAR field for storing a person&#8217;s first name should be wide enough for &#8216;Dominar Rygel XVI&#8217; to store his name, but an index of the first three to six chars is probably enough to get a good enough cardinality on most first names to <a href="http://www.gulesider.no/tk/search.c?q=domina&amp;x=0&amp;y=0">exclude &gt;95% of the rows</a>. How many chars are needed can easily be checked by adding a new index and comparing the listed cardinality after updating the table statistics. At 10 000 users from Scandinavia you might need five chars for a &gt;95% cardinality, but at 100 000 users from all over the world, four might result in the same cardinality with just a marginally better result at five.</p>
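<p>A sketch of that check on the <em>people</em> example. The prefix lengths and the index name are only illustrative:</p>
<pre>-- Compare candidate prefix lengths before creating anything
SELECT COUNT( DISTINCT LEFT( first, 3 ) ) AS prefix_3,
       COUNT( DISTINCT LEFT( first, 4 ) ) AS prefix_4,
       COUNT( DISTINCT LEFT( first, 5 ) ) AS prefix_5,
       COUNT( DISTINCT first )            AS full_field
  FROM people;

-- Index only the first four chars of the field
ALTER TABLE people ADD INDEX first_prefix ( first(4) );

-- Update the statistics and read the Cardinality column
ANALYZE TABLE people;
SHOW INDEXES FROM people;</pre>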
<h3>Email domain queries with index</h3>
<p>In some situations you need to query email addresses by domain name (no you won&#8217;t, but you might want to learn this trick anyway). Instead of the slow LIKE &#8216;%domain.com&#8217; with a query type of ALL, the solution could be to store the address reversed and add a partial index, since indexes work from left to right. The address could be reversed in PHP with strrev() or by MySQL with REVERSE(), but a better solution could be to use BEFORE INSERT and BEFORE UPDATE triggers. That&#8217;s up to you. The query would then end up like this:</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">SELECT</span> name <span style="color: #993333; font-weight: bold;">FROM</span> customers <span style="color: #993333; font-weight: bold;">WHERE</span> email <span style="color: #993333; font-weight: bold;">LIKE</span> <span style="color: #ff0000;">'on.amotpa%'</span>;</pre></div></div>

<p>or</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">SELECT</span> name <span style="color: #993333; font-weight: bold;">FROM</span> customers <span style="color: #993333; font-weight: bold;">WHERE</span> email <span style="color: #993333; font-weight: bold;">LIKE</span> REVERSE<span style="color: #66cc66;">&#40;</span> <span style="color: #ff0000;">'%aptoma.no'</span> <span style="color: #66cc66;">&#41;</span>;</pre></div></div>

<p>and an insert could look like this</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">INSERT</span> <span style="color: #993333; font-weight: bold;">INTO</span> customers <span style="color: #66cc66;">&#40;</span> name<span style="color: #66cc66;">,</span> email <span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">VALUES</span> <span style="color: #66cc66;">&#40;</span> <span style="color: #ff0000;">'Kryten'</span><span style="color: #66cc66;">,</span> REVERSE<span style="color: #66cc66;">&#40;</span> <span style="color: #ff0000;">'mechanoid4000@aptoma.no'</span> <span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&#41;</span>;</pre></div></div>
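<p>If you go for the trigger route, it could look something like this sketch. It assumes a separate, hypothetical <em>email_reversed</em> column so that the original address stays readable, and the trigger and index names are made up as well:</p>
<pre>ALTER TABLE customers
  ADD COLUMN email_reversed VARCHAR( 255 ),
  ADD INDEX email_reversed_prefix ( email_reversed(20) );

CREATE TRIGGER customers_reverse_email_insert
  BEFORE INSERT ON customers
  FOR EACH ROW SET NEW.email_reversed = REVERSE( NEW.email );

CREATE TRIGGER customers_reverse_email_update
  BEFORE UPDATE ON customers
  FOR EACH ROW SET NEW.email_reversed = REVERSE( NEW.email );

-- The application keeps writing plain addresses ...
INSERT INTO customers ( name, email ) VALUES ( 'Kryten', 'mechanoid4000@aptoma.no' );

-- ... and domain lookups use the reversed column
SELECT name FROM customers WHERE email_reversed LIKE 'on.amotpa%';</pre>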

<h3>Realistic data set when testing</h3>
<p>One of the reasons why indexes should be revised over time is that MySQL uses dynamic execution paths based on statistics that are updated as the data changes. If you only have one row in a table, all the indexes in the world won&#8217;t make MySQL use them (except when querying with FORCE INDEX, but let&#8217;s forget about that ancient religion), because it knows that a sequential read of the entire data file will be just as fast or faster. With multi-million rows, however, a sequential read (a query type ALL) is probably the last thing it wants to do. Cardinality is also important to keep in mind when testing, as indexes again won&#8217;t be used if your multi-million rows of test data are all identical. The solution is to either make a more advanced random data creator, manually add realistic data or export realistic data from some other live database.</p>
<h3>Moral of the story</h3>
<p>If you want to step up from slow to reasonable performance and beyond using MySQL, indexes are mandatory, but be wary of their size. You might want to add indexes in all colors and shapes available, but remember that the data is duplicated for each index and that when you INSERT, UPDATE or DELETE, all indexes have to be updated as well as the main table. Make the index as short as you can while keeping the cardinality as high as possible, have multiple-field indexes that match as many of your queries as possible ( rewrite the non-index-using queries with more constraints than needed if you have to/can ) and make sure the data types of your fields are as trim as possible. The last part will also come up in my next post. Stay tuned for more of my lies next week and please comment on any errors in the post or if you by some miracle learned anything from it. :)</p>
]]></content:encoded>
			<wfw:commentRss>http://aptoma.com/select.star/2008/06/26/mysql-indexes-to-mysql-what-green-kryptonite-is-to-smallville-universe-bizarro/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>MySQL &#8211; What Engine to Use AKA &#8216;in retrospect my first mistake&#8217;</title>
		<link>http://aptoma.com/select.star/2008/06/13/mysql-what-engine-to-use-aka-in-retrospect-my-first-mistake/</link>
		<comments>http://aptoma.com/select.star/2008/06/13/mysql-what-engine-to-use-aka-in-retrospect-my-first-mistake/#comments</comments>
		<pubDate>Fri, 13 Jun 2008 09:17:08 +0000</pubDate>
		<dc:creator>Lars Hetland</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://aptoma.no/select.star/?p=6</guid>
		<description><![CDATA[This is the first post in a short series on MySQL. In the end of this post I will try to list as many sources as I can remember to who and what told me these lies. Most of this will be copy-paste from our sekrit wiki and requires some knowledge of SQL, databases and [...]]]></description>
			<content:encoded><![CDATA[<p>This is the first post in a short series on MySQL. In the end of this post I will try to list as many sources as I can remember to who and what told me these lies. Most of this will be copy-paste from our sekrit wiki and requires some knowledge of SQL, databases and MySQL.</p>
<p>One of MySQL&#8217;s more powerful features is its support for multiple database engines, and the first thing you should do when creating a table is to select the storage engine best suited to your needs. In MySQL 5.1 there are three engines worth mentioning, MyISAM, InnoDB and MEMORY, but interesting new engines such as Falcon, Maria, PBXT and solidDB are in development.<br />
<a href="http://solutions.mysql.com/engines.html">http://solutions.mysql.com/engines.html</a></p>
<h3>MyISAM</h3>
<p>MyISAM is the default engine of MySQL and a good choice when your access levels are &gt;90% read or &gt;90% write on a high-load system, or in any case on a low-load system. Both SELECTs and INSERTs on MyISAM tables can be incredibly fast, but where it falls through is how it handles locks. The problem is that table-level read/write locks are the only ones supported. This means that when an UPDATE or DELETE operation is executed all other queries are put on hold until it has completed, and when an INSERT operation is executed all other INSERTs are on hold. ( Note: this means that several read operations can be executed simultaneously. I&#8217;ll discuss more on INSERT DELAYED in a future post.) This could result in multi-second or even multi-minute delays on SELECTs when the SELECT/INSERT ratio is outside the aforementioned thresholds. MyISAM has fast imports and exports of backups and has mysqlhotcopy for faster dumps.</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">CREATE</span> <span style="color: #993333; font-weight: bold;">TABLE</span> t <span style="color: #66cc66;">&#40;</span> i INT <span style="color: #66cc66;">&#41;</span> ENGINE <span style="color: #66cc66;">=</span> MYISAM;
<span style="color: #993333; font-weight: bold;">CREATE</span> <span style="color: #993333; font-weight: bold;">TABLE</span> t <span style="color: #66cc66;">&#40;</span> i INT<span style="color: #66cc66;">,</span> j INT<span style="color: #66cc66;">,</span> <span style="color: #993333; font-weight: bold;">PRIMARY</span> <span style="color: #993333; font-weight: bold;">KEY</span> <span style="color: #66cc66;">&#40;</span> i <span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">,</span> <span style="color: #993333; font-weight: bold;">INDEX</span> <span style="color: #66cc66;">&#40;</span> j <span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&#41;</span> ENGINE <span style="color: #66cc66;">=</span> MYISAM;</pre></div></div>

<p><strong>A thousand times faster full-text index.</strong> MyISAM handles indexes by storing the record byte offset with each index row. The result of this is that all index lookups happen at the same speed and that UPDATEs and INSERTs will not always result in an index update as the byte offset doesn&#8217;t always change. The overhead when adding several indexes is also kept at a minimum as each index just needs to keep track of the byte offset. INSERT DELAYED will also delay updating indexes until other queries have finished. MyISAM also supports full-text indexes on VARCHAR, TEXT, etc where each word is stored in a B-tree. A full-text index does consume a lot of space, but a query such as this:</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">SELECT</span> <span style="color: #66cc66;">*</span> <span style="color: #993333; font-weight: bold;">FROM</span> articles <span style="color: #993333; font-weight: bold;">WHERE</span> body <span style="color: #993333; font-weight: bold;">LIKE</span> <span style="color: #ff0000;">&quot;% database %&quot;</span>;</pre></div></div>

<p>might run thousands of times faster when a full-text index is added and the query is re-written as:</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">SELECT</span> <span style="color: #66cc66;">*</span> <span style="color: #993333; font-weight: bold;">FROM</span> articles <span style="color: #993333; font-weight: bold;">WHERE</span> MATCH <span style="color: #66cc66;">&#40;</span>body<span style="color: #66cc66;">&#41;</span> AGAINST <span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">'database'</span><span style="color: #66cc66;">&#41;</span>;</pre></div></div>

<p>Please note that MATCH AGAINST won&#8217;t display partial word-hits, e.g. &#8220;base&#8221; won&#8217;t locate &#8220;database&#8221;.</p>
<p><a href="http://dev.mysql.com/doc/refman/5.1/en/myisam-storage-engine.html">http://dev.mysql.com/doc/refman/5.1/en/myisam-storage-engine.html</a></p>
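<p>For a MATCH ... AGAINST query on <em>body</em> to work, the table needs a FULLTEXT index on that field. A minimal sketch, where the columns other than <em>body</em> are assumptions:</p>
<pre>CREATE TABLE articles (
  id    INT UNSIGNED NOT NULL AUTO_INCREMENT,
  title VARCHAR( 200 ) NOT NULL,
  body  TEXT NOT NULL,
  PRIMARY KEY ( id ),
  FULLTEXT ( body )
) ENGINE = MYISAM;

-- Or, if the table already exists without one, add it afterwards
ALTER TABLE articles ADD FULLTEXT ( body );</pre>
<p>On a tiny test table the natural-language search ignores words that appear in half the rows or more, so test with realistic data.</p>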
<h3>InnoDB</h3>
<p>The InnoDB engine is an overall good engine that supports some more advanced features that you don&#8217;t get with MyISAM, with row-level locks and transactions being the most important ones. Row-level locking means that a table can have multiple reads and writes at the same time, and because InnoDB supports multi-version concurrency control (MVCC) a row can be read while being written to. Where there is little system to the data structure in MyISAM, InnoDB orders it by primary key. The result is that for each INSERT, DELETE or UPDATE, rows may need to be moved, row positions recalculated and indexes potentially updated. This makes INSERTs into an InnoDB table slower than into a MyISAM table, but since data is ordered by primary key, selecting based on it is extremely fast. Where MyISAM first reads the primary key index file, finds the byte offset for the data row and then reads the data file, InnoDB does this in one step.</p>
<p>InnoDB handles indexes by storing the primary key clustered with all index records, something that could result in a large overhead. It&#8217;s therefore recommended to have a small integer or some other space-efficient data type as primary key and not a wide VARCHAR that might be 10-20 times the size.</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">CREATE</span> <span style="color: #993333; font-weight: bold;">TABLE</span> t <span style="color: #66cc66;">&#40;</span> i INT <span style="color: #66cc66;">&#41;</span> ENGINE <span style="color: #66cc66;">=</span> INNODB;
<span style="color: #993333; font-weight: bold;">CREATE</span> <span style="color: #993333; font-weight: bold;">TABLE</span> t <span style="color: #66cc66;">&#40;</span> i INT<span style="color: #66cc66;">,</span> j INT<span style="color: #66cc66;">,</span> <span style="color: #993333; font-weight: bold;">PRIMARY</span> <span style="color: #993333; font-weight: bold;">KEY</span> <span style="color: #66cc66;">&#40;</span> i <span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">,</span> <span style="color: #993333; font-weight: bold;">INDEX</span> <span style="color: #66cc66;">&#40;</span> j <span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&#41;</span> ENGINE <span style="color: #66cc66;">=</span> INNODB;</pre></div></div>

<p><a href="http://www.innodb.com/innodb">http://www.innodb.com/innodb</a></p>
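<p>Since transactions are one of the main reasons to pick InnoDB, here is a minimal sketch of one. The <em>accounts</em> table and the amounts are made up for the example:</p>
<pre>CREATE TABLE accounts (
  id      INT UNSIGNED NOT NULL,
  balance INT NOT NULL,
  PRIMARY KEY ( id )
) ENGINE = INNODB;

INSERT INTO accounts ( id, balance ) VALUES ( 1, 100 ), ( 2, 0 );

-- Either both updates happen or neither does
START TRANSACTION;
UPDATE accounts SET balance = balance - 25 WHERE id = 1;
UPDATE accounts SET balance = balance + 25 WHERE id = 2;
COMMIT;  -- or ROLLBACK to undo both updates</pre>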
<h3>Memory</h3>
<p>A MEMORY table is in itself persistent, but its data is not, as it&#8217;s stored in memory. This makes it very fast, but prone to data loss. An obvious use for a MEMORY table is to cache subsets or processed subsets of other tables ( INSERT INTO ... SELECT ) while retaining the power to run SQL queries for even more limited subsets with increased speed. Compared to Memcached this is in some cases far superior in simplicity and probably also in speed. A MEMORY table can of course also be used to store a serialized object, but unless the power of a conditional query is needed, Memcached is probably better for the job. The maximum size of a MEMORY table is governed by max_heap_table_size, with a default value of 16 MB. Worth mentioning is that memory is not recovered when deleting a row in a MEMORY table, but will be reused when inserting a new row. To recover the unused memory do a</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">ALTER</span> <span style="color: #993333; font-weight: bold;">TABLE</span> t ENGINE<span style="color: #66cc66;">=</span>MEMORY;</pre></div></div>

<p>to rebuild the table. The default index type is HASH, but BTREE is also supported. A hash index is very fast for equality lookups, but it cannot be used for range conditions or ORDER BY, and many duplicate values in the indexed field slow it down. These issues are nonexistent with BTREE.</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">CREATE</span> <span style="color: #993333; font-weight: bold;">TABLE</span> t <span style="color: #66cc66;">&#40;</span> i INT <span style="color: #66cc66;">&#41;</span> ENGINE <span style="color: #66cc66;">=</span> MEMORY;
<span style="color: #993333; font-weight: bold;">CREATE</span> <span style="color: #993333; font-weight: bold;">TABLE</span> t <span style="color: #66cc66;">&#40;</span> i INT<span style="color: #66cc66;">,</span> <span style="color: #993333; font-weight: bold;">INDEX</span> <span style="color: #66cc66;">&#40;</span> i <span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&#41;</span> ENGINE <span style="color: #66cc66;">=</span> MEMORY;
<span style="color: #993333; font-weight: bold;">CREATE</span> <span style="color: #993333; font-weight: bold;">TABLE</span> t <span style="color: #66cc66;">&#40;</span> i INT<span style="color: #66cc66;">,</span> <span style="color: #993333; font-weight: bold;">INDEX</span> <span style="color: #993333; font-weight: bold;">USING</span> BTREE <span style="color: #66cc66;">&#40;</span> i <span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&#41;</span> ENGINE <span style="color: #66cc66;">=</span> MEMORY;</pre></div></div>

<p><a href="http://dev.mysql.com/doc/refman/5.1/en/memory-storage-engine.html">http://dev.mysql.com/doc/refman/5.1/en/memory-storage-engine.html</a></p>
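<p>A sketch of the caching use case described above, assuming a hypothetical <em>articles</em> table with <em>id</em>, <em>title</em> and <em>published</em> columns. Note that MEMORY tables cannot hold TEXT or BLOB columns, so only short fields are copied:</p>
<pre>-- Cache the recent articles in memory
CREATE TABLE recent_articles (
  id        INT UNSIGNED NOT NULL,
  title     VARCHAR( 200 ) NOT NULL,
  published DATETIME NOT NULL,
  PRIMARY KEY ( id ),
  INDEX USING BTREE ( published )
) ENGINE = MEMORY;

INSERT INTO recent_articles ( id, title, published )
  SELECT id, title, published FROM articles
  WHERE published &gt;= '2008-06-06 00:00:00';

-- Further subsets are now served from memory
SELECT title FROM recent_articles
  WHERE published &gt;= '2008-06-12 00:00:00' ORDER BY published DESC;</pre>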
<h3>Summary</h3>
<p>In conclusion, data in a MEMORY table isn&#8217;t persistent, but the engine works well for temporary data such as serialized objects or statistics counters and is great for storing calculated subsets of large tables and then doing heavy calculations on that subset again.</p>
<p>If you don&#8217;t need transactions, MyISAM is good for any short table because of the high SELECT/INSERT speed, and when the dataset is kept trim the table-lock won&#8217;t be a problem. Another case where MyISAM performs well is when a table is either very read- or write-oriented. Multiple SELECTs can query a table without blocking each other and the raw speed of MyISAM&#8217;s writes results in good performance even with a table-lock for each operation.</p>
<p>If you need transactions, InnoDB is one of the better alternatives for MySQL at the moment but is also a good choice for active tables where table-locks will kill your performance.</p>
<p>We&#8217;ve had problems with large statistical data in our product NettTV, in combination with MyISAM. Large selects would block the inserts for minutes, at worst. During this period, the tables were effectively constipated, and one would be led to believe that no statistics were gathered during this interval. Later on, the statistics would flood in. MyISAM, referring to the earlier discussion on it, just does not fit with this usage pattern. InnoDB with its row-locks and MVCC probably wouldn&#8217;t give the same performance on the statistics part and would still use a lot of CPU, but wouldn&#8217;t constipate the table the same way.</p>
<p>The newer MySQL engines are exciting, but most are at the moment too unstable and too variable in performance for anything but testing. In a five-year perspective a few of the mentioned names will probably see real use.</p>
<p><strong>Some links on MySQL performance, ordered by importance</strong></p>
<p><a href="http://video.google.com/videoplay?docid=2524524540025172110&amp;q=google+engedu">http://video.google.com/videoplay?docid=2524524540025172110&amp;q=google+engedu</a><br />
<a href="http://www.amazon.com/gp/product/0596101716">http://www.amazon.com/gp/product/0596101716</a><br />
<a href="http://www.mysqlperformanceblog.com/2007/05/02/uc2007-presentation-and-notes/">http://www.mysqlperformanceblog.com/2007/05/02/uc2007-presentation-and-notes/</a><br />
<a href="http://jayant7k.blogspot.com/2007/07/mysql-query-cache.html">http://jayant7k.blogspot.com/2007/07/mysql-query-cache.html</a><br />
<a href="http://www.petefreitag.com/item/430.cfm">http://www.petefreitag.com/item/430.cfm</a><br />
<a href="http://forge.mysql.com/wiki/MySQL_Tutorials">http://forge.mysql.com/wiki/MySQL_Tutorials</a></p>
<p><strong>Some links on the new MySQL-engines</strong></p>
<p><a href="http://www.mysqlperformanceblog.com/2007/08/01/landscape-of-transactional-storage-engines-for-mysql/">http://www.mysqlperformanceblog.com/2007/08/01/landscape-of-transactional-storage-engines-for-mysql/</a><br />
<a href="http://forge.mysql.com/wiki/Maria_RoadMap_Design">http://forge.mysql.com/wiki/Maria_RoadMap_Design</a><br />
<a href="http://dev.mysql.com/tech-resources/articles/falcon-transactional-engine-part1.html">http://dev.mysql.com/tech-resources/articles/falcon-transactional-engine-part1.html</a></p>
<h3>To be continued</h3>
<p>More information on the MySQL engines will follow in later posts. We will try to publish one each week. Stay tuned <a href="http://feeds.feedburner.com/aptoma/selectstar">using rss</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://aptoma.com/select.star/2008/06/13/mysql-what-engine-to-use-aka-in-retrospect-my-first-mistake/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>A-hoy! We&#8217;re a-bloggin&#8217;</title>
		<link>http://aptoma.com/select.star/2008/06/05/a-hoy-were-a-bloggin/</link>
		<comments>http://aptoma.com/select.star/2008/06/05/a-hoy-were-a-bloggin/#comments</comments>
		<pubDate>Thu, 05 Jun 2008 11:11:17 +0000</pubDate>
		<dc:creator>Geir Berset</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://aptoma.no/select.star/?p=3</guid>
		<description><![CDATA[We&#8217;re hereby entering the blog-sphere. We&#8217;re late, but now we&#8217;re here.
What you&#8217;re probably thinking right now, is &#8220;who in the world is Aptoma?&#8221; I&#8217;ll provide a short answer to that reasonably fair question; Aptoma is a small web-development company situated in Oslo, Norway. We&#8217;re something like the Arctic Programmers (which just might explain the A [...]]]></description>
			<content:encoded><![CDATA[<p>We&#8217;re hereby entering the blog-sphere. We&#8217;re late, but now we&#8217;re here.</p>
<p><strong>What you&#8217;re probably thinking right now</strong>, is &#8220;<em><a href="http://www.aptoma.no/social-aptoma/">who in the world</a> <a href="http://aptoma.no/vi">is Aptoma</a>?</em>&#8221; I&#8217;ll provide a short answer to that reasonably fair question; Aptoma is a small web-development company situated in Oslo, Norway. We&#8217;re something like the Arctic Programmers (which just might explain the A and P in APtoma). We&#8217;re passionate about web-development, and we&#8217;re passionate about solving problems for the new media. We&#8217;ve been so for some three years.</p>
<p><strong>What will we be blogging about?</strong> We&#8217;re sure to be blogging about web-development, and everything related to it. Other than that, it&#8217;s pretty much an open issue as to what we&#8217;ll be focusing on. Still, I&#8217;m pretty sure we&#8217;ll say hello to mister MVC, including JavaScript, MySQL, PHP, HTML, performance, frameworks, processes and the like. And, of course, we&#8217;ll follow new media topics and trends. I&#8217;m quite sure we&#8217;ll be blogging about the specifics of how we put these technologies to good use in creating what we believe to be state of the art web-applications. The only real answer to that question, however, is that we&#8217;ll have to see what comes out of it. We&#8217;re quite excited to see for ourselves where this leads us.</p>
<p><strong>Anyways. We&#8217;re here, you&#8217;re here, we&#8217;re a-bloggin&#8217;</strong> and we&#8217;ll not be going on a brag-blogging-frenzy, <a href="http://aptoma.no/blog-wp/2008/05/26/getting-real.html">like some companies do</a>. We&#8217;re on an exploration with this blog and we&#8217;re embracing the sharing-for-mutual-benefits model. Our motivation is that our efforts will make us better at what we do. Our main goal, for the company and for ourselves, is to get better at what we do, all the time, and by any means necessary. We figure that the more we put into our blogging, and thus our sharing, the more we&#8217;ll get out of it. Derived from that, you can rest assured that you&#8217;ll be presented with the best tips we can offer. We&#8217;re just hoping that it&#8217;ll be worth your while to follow us, that you&#8217;ll benefit from it, and that you&#8217;ll let us in on your insights from time to time.</p>
<p><strong><a href="http://feeds.feedburner.com/aptoma/selectstar">Stay tuned with rss</a></strong>, and we&#8217;ll try to treat your limited time respectfully, and hopefully we&#8217;ll be improving the <a href="http://en.wikipedia.org/wiki/Signal-to-noise_ratio">signal-to-noise ratio</a> of the web.</p>
<p><strong>We&#8217;re starting today with zero subscribers</strong>, no readers whatsoever and a lot of catching up to do. Let&#8217;s skip the rest of the formalities and get started. Hope to see you around, checking up on what we&#8217;re doing, how we&#8217;re doing it, and what we&#8217;re focusing on.</p>
]]></content:encoded>
			<wfw:commentRss>http://aptoma.com/select.star/2008/06/05/a-hoy-were-a-bloggin/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
