Enterprise JavaScript Coding

February 1, 2010 by Geir Berset in coding, Monday School, software

Not many years ago, people would giggle and think of enterprise and JavaScript as an oxymoron. Not so much anymore. If you want to be a serious actor even in (or maybe especially in) the enterprise software market, you have to take JavaScript seriously. Large parts of your product’s business logic might find their way into your JavaScript scope, not to mention your GUI elements, and you must be prepared when they do.

We have decided to address JavaScript with more care. We’d like to give it the time and focus it deserves. Not to be mistaken, we are probably in the top 5% of businesses going deep into JavaScript on our enterprise software, but we still feel we can go further — especially in the fields of quality assurance and scaling.

Lo and behold, we give you:

Monday School Notes from JavaScript session #1

Table of Contents

We have decided to divide the subject of JavaScript into the following sections:

We’ll be covering these topics in the weeks to come, and we will be returning with links in the TOC. If you will join us in this, please follow this blog on RSS or follow the company or the author on Twitter. If you digg it, then digg it.


Notes on Continuous Deployment

January 21, 2010 by Geir Berset in process, teamwork

Scenario: A customer has a problem with your software. His questions make you think, and you get an idea for a feature improvement. A good one! Act on it.

  • Plan it
  • Code it
  • Test it
  • Commit it
  • Deploy it

Nothing out of the ordinary, it seems. The seemingly new thing about continuous deployment is that we remove all the waiting in between each of these steps. That is, we don’t batch changes before testing them. You test immediately what you just coded. You commit it. Not to a branch due for merging in a week. No, directly to trunk. And you even deploy it at once to production. Done. Clear your mind of it, enjoy, relax, spend some time doing something proactive.

Develop, test, commit, deploy. One feature, bug-fix or improvement at a time. That is the mantra for continuous deployment.
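
As a rough illustration of that mantra, the sketch below runs the steps for a single change in order and stops at the first failure, so nothing queues up behind a broken step. The `run_change` function and the lambda stand-ins are hypothetical placeholders for real build, test, commit and deploy commands; they do not come from any particular deployment tool.

```python
from typing import Callable, List, Tuple

# A step is a name plus a function that returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_change(steps: List[Step]) -> List[str]:
    """Run the steps for ONE change in order; stop at the first failure.

    Returns the names of the steps that completed successfully.
    """
    completed: List[str] = []
    for name, action in steps:
        if not action():
            break  # a failed step blocks everything after it
        completed.append(name)
    return completed


# Hypothetical stand-ins for the real commands behind each step.
steps: List[Step] = [
    ("develop", lambda: True),
    ("test", lambda: True),
    ("commit", lambda: True),
    ("deploy", lambda: True),
]

print(run_change(steps))
```

The point of the design is the early `break`: a change that fails its tests never reaches trunk or production, and the next change starts from a clean slate.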

Some developers, and perhaps particularly managers, will find the idea crazy at first. Release immediately? Omigod. But frequently this sensation of safety in postponing something is not rooted in any real benefit, and there seems to be no rational answer to the question: why wait? Did your code ever get better during the night? Some will even argue that big feature releases are a marketing invention, designed to squeeze more money out of customers frustrated with the bugs in the current version. This practice of batching changes into releases was in turn embraced by the industry as the professional way of conducting business. I could easily add this to my list of major problems with the software industry.

Wouldn’t it be nice to deliver a bug-fix the same day it was reported? The same hour, even? It might just be possible if you are continuously deploying product improvements like this, kaizen-style.

Where does this idea come from?

The idea of continuous deployment comes from the lean school of thought. In the mindset of lean kanban pull systems — where you are constantly looking to remove waste and minimize your batch-sizes — grouping a lot of big software changes together in big batches and big releases makes no sense.

Even Flickr is doing continuous deployment these days.

The whole idea is to respond to customer “pull”. The customer needs something now. The clock starts ticking. Is the feature more useful if you wait a bit before you start coding it? No. Then start now. Is the feature better if you wait with testing after coding it? No. Then test now. Wait until tomorrow to integrate into the version control system? Is that better? No. Then integrate. Do things get better if we wait to deploy that new feature? No. Then deploy it. Stop your clock. The shorter the time, the better for the customer. And better for quality, it seems. And better for you, the coder. Unreleased code is stressful. Released, working code is stress-free. The code works. It’s proven.

After a few iterations, our fear level was actually lower than how we used to feel after a staged release. Because we were committing less code per release, we could correlate issues to a release with certainty. (From Case Study: Continuous deployment makes releases non-events)

Bugs that are not discoverable in testing are easier to find and fix if you released only one feature than if you batched a month’s worth of changes together in one big bang release. Or so the lean school of thought states. I agree. You would have written the code hours ago, and you will probably say ‘a-ha!’ the second you hear about it. 20 minutes later, a bug-fix is released. Why wait?


What does it take?

Not surprisingly, continuous deployment requires a high level of team discipline. No goofing off and no more fixing everything mañana in testing. Practices such as 5 Whys root cause analysis, extensive testing and automation will be required. But it will all lead you to better quality and higher iteration speed. Speed and quality: it is all related.

Eric Ries writes on O’Reilly Radar about “Continuous Deployment in 5 easy steps”. He has gotten a lot of traction with his Lean Startup movement lately, and you can follow his thoughts and reports on personal experience with continuous deployment on his blog: http://www.startuplessonslearned.com/

Look ahead

Will we pursue this issue of continuous deployment any further? In the name of delivering a high quality experience to our customers and users now, when they still need it: yes we will.



Great Software : A Definition

January 16, 2010 by Geir Berset in process, software

Defining what great software is, is not a complex endeavor. I prefer to boil it down into two distinct characteristics.

a) Ease of Use

The software solution walks you gently through the process of solving your problems as intended. No distractions, no unnecessary decisions to make, no confusion, always heading towards the goal. In short: a great user experience.

b) Quality of Craft

Quality of Craft means that the software solution is easy to maintain, it’s easy to modify so that it’s always accurate in solving the ever changing problems it’s going after (and by that I don’t just mean in configuration, but easy to change the code of), it’s easy to move about (i.e. from server to server), it performs great, it responds immediately when interacted with and it scales, or can be modified to scale, in order to handle the traffic it will attract.

Photo by toffiloff

Isn’t that an intuitive definition?
The definition is common sense in software product development.
Perhaps a “Software Product 101” should start out with such a definition.

Thus, we should assume that everyone gets this nailed down from the start, or through a little experience, no?

I’m sorry to report that surprisingly few players in the industry are able to see this.
And even fewer are able to execute it.

It’s always either or, never both. The tendency is quite prominent.

It seems that start-ups create great solution accuracy and a good user experience using customer-oriented development methods and, in general, a lean start-up methodology. Growth, however, turns them in the direction of stability and the need for predictability. Sadly, this usually leads to a loss of the qualities in a) in favor of a few of the more stabilizing qualities in b).

I agree that as a start-up turns into an enterprise, the need for operational stability and predictable performance increases. I do not, however, believe that this is best achieved by sacrificing the ability to adapt and to keep the software usable and accurate.

I believe that while sacrificing qualities in a) might seem to be the easiest path, it is not the only path, not to mention that it in no way is the right path.

I am part of a small, growing start-up company, and we’re focusing our resources on staying on the right path: maintaining the agility of a start-up together with the stability and predictability of an enterprise. It is possible; this path exists.

It’s hard to find. But we’ll find it, or die trying.



Quality and speed. A primer in team design.

January 8, 2010 by Geir Berset in teamwork

How you design your team matters a great deal for the speed and quality of the work the team will do.

Speed

The ultimate ideal for speed is a one-man show.

There’s this one guy doing everything in the project. He is competent in engineering practices such as software design, scaling and testing, and he excels in design, user experience and what not.

When he has an idea, he considers it. He weighs it back and forth. Then a decision is made, right there on the spot. No waiting, no straying, no nothing — just an idea, wham bam, a decision. All while taking a shower or having a cup of coffee. Soon he’s back at the keyboard implementing it, testing it, committing it, deploying it.

The next thing you know users are using it, providing their feedback and insight.

Did you notice the lack of all-day design committees (probably reaching no consensus, and scheduling follow-up meetings for the next week)? Notice the lack of a board of directors that takes weeks to assemble in one room. Notice the lack of calling, facilitating and planning endless project meetings. No extensive change processes. No writing frequent status reports. No large specification documents to keep in sync for review and approval, and no handing over of documentation between the design phase and the implementation phase.

With the one-man show, it’s just wham bam thank you ma’am. It’s like a ninja: the ultimate warrior mastering all the disciplines needed to win the war, and he’s calling the shots faster than you can say cheeseburger.

– Cheese (I chopped your head off) burger.


Quality

But say the ultimate goal is quality, not speed.

The ultimate ideal for quality is hordes of different specialists working the project. They work in teams, or they might even be separated into entire departments or companies dedicated to one profession each. One of these teams consists of the best designers the world has to offer. Another team has the best programmers, and there’s a team of the best usability experts in the field.

At the start of the project, all sit together until everyone agrees what problem we are solving and how we want to solve it. If new challenges arise along the way, they will gather to discuss and agree on how to adapt.

We start the project and pass it from team to team, giving each of them the time they want, and we do not interrupt them until they’re done. They will ask for your input when they need it.

When they are finished, you should have the best quality money can buy, shouldn’t you? After all they had all the time in the world, and they were not interrupted.

Speed or Quality?

Now, all you have to do is choose: will you need speed or will you need quality in your particular project? Choose your strategy from the two outlined above accordingly. Unfortunately, it’s not that easy. Or, rather: luckily there’s more to it.

The problem with speed

With the one-man show, you are very exposed to ending up with one or more crappy qualities in your software product. Your guy will have his strengths, and there are bound to be weaknesses. So you end up with the best user interface, but a crappy test suite making the software unmaintainable. Or perhaps you have a delightful backend, but a user interface that sucks. Or maybe your ninja is so anal about every quality that he is failing to build up speed in the first place.

It just does not scale, and you are very exposed to this individual’s priorities.

Ultimately the lack of tests, the poor scalability or the increasing complexity will stop the project dead in its tracks. So much for the ultimate speed ideal. It was fun while it lasted, but it did not last. Project stops. Try again.

The problem with quality

OK, so you chose the smart path and aimed for quality. You landed the big budget and told everyone to do their very best, and gave them all the time in the world.

I’m sorry to report that problems seem to pile up for these kinds of teams as well.

So your user experience department spends a week, a month, two months, or whatever time it takes to craft the best user experience possible for the problem they are solving. Then your design department spends a week, a month, two months, or whatever time it takes to create the best design of their lives. They hand it over to engineering, and, as they go off to win a design award, the engineers have a go at fulfilling all the promising features that the UX team wire-framed, now fitted with the award-winning design. They spend a week, a month, two months or whatever it takes to create the very best implementation. Their algorithms are featured in next year’s textbooks for university students.

Only now the solution solves yesterday’s, yestermonth’s or even yesteryear’s problems.

The core problem with this quality approach: while each of your teams was busy creating the best user experience, the best design, the best algorithms, someone else was busy making the most accurate solution for the problem.

And as if that wasn’t enough, they were adapting to the problem as it changed in character. The problem always does that. Change, that is. At least in the field of software product development.

No problem, you’re thinking! Let’s just pass the ball back to UX, on to design and then back to engineering. That will show them! In just 1 month, 2 months or n months’ time, that is.

Oops, and there’s that change of problem to solve again. Darn it. Your project is behind, and your award winning design and algorithm are not solving the right problem. Quality wasted.

Turns out that available time and uninterrupted focus in each discipline are far from the only factors that affect the total quality of the project.

Solution accuracy is also a quality. Too often forgotten, but probably among the most important ones.

The solution lies in team design

In order to realize the solution, you will first have to buy into the assertion that:

You need quality to achieve speed, and you need speed to achieve quality.

This means that you should not choose either speed or quality. You should aim for both, as they tend to reinforce one another. It might feel counterintuitive or even paradoxical, but it holds up.

The team design solution you are looking for draws inspiration from both worlds: the overly optimistic one man show (the ninja) and the bureaucratic silo process with a lot of specialists in their own departments passing the project from silo to silo in total isolation from one another.

You need speed and you need quality.

It’s really quite simple to explain (and very hard to execute)

You need one guy calling the shots, and you need specialists backing up and implementing in quality what he dictates. They all need to interact in a cross-functional environment, and they also need time to dedicate themselves to their specialties, creating prize-worthy work within their profession.

You need to strike the fine balance in your team design to get the speed of cross functional teams, combined with the qualities of the isolated teams. Your continued efforts to ensure both will ensure that these two disciplines reinforce one another to achieve the best quality software you can produce.

“Accept and embrace the risks, even plan how to mitigate them, but don’t avoid them.”

There are, of course, loads of challenges.

  1. Why should the teams of specialists trust the chief engineer’s decisions, and thus back his ideas with a 100% commitment and enthusiasm?
    • The centralized decision-maker will have to be a respected individual, an individual in which all team members place their trust. (Challenge: These individuals are hard to come by, and they have to invest a lot of energy into understanding everything about all parts of the project)
  2. How will the chief engineer be able to know all about everything in the project?
    • Do not underestimate the capacity of man, given the opportunity to focus. The chief engineer will focus on two things: knowing everything about the project, and developing a deep relationship with and understanding of the problem to solve (i.e. getting intimate with the customer group). He makes it happen through this focus.
    • Also, he must be wise and humble. He must be humble enough to pull information from specialists whenever it is needed to reach the best possible decision for the project. He knows where to get all the information he needs; he does not possess it all himself. He is trusted to do this, and the trust of his peers is strengthened all the more when he actually does it.
  3. What if the chief engineer is hit by a bus?
    • A very common question from concerned prospects. The truth is that unless the chief engineer has had a brilliant protégé at his side all this time who can step in immediately and take over, your development process will grind to a halt. At least for some time. Consider the alternative for a moment, before gasping in repulsion. Your alternative is a process similar to design by committee (a lot of people involved in every project decision), in which case your development process will grind to a halt from day one. Sure, you will be safeguarded from losing momentum, but it’s not worth it when the solution is to never really build any in the first place. And that’s just not a smart strategy. Accept and embrace the risks, even plan how to mitigate them, but don’t avoid them.

There are surely more challenges with this approach than my shortlist above, and you’ll have to develop the capacity to deal with them. Nobody said this would be easy. I just asserted that it will be the right thing to do. And it will. Master this, and it will make you strong, fast and of high quality.

Summary: The solution in a simple list

You know you love lists. I know you love lists. That’s why I’m willing to oversimplify the answer to this complex problem, and state it in a few simple bullet points below. Be sure, however, that you set out to absorb these ideas beyond this list, and follow up on the recommended reading, in order to make your team design better and better with time. Kaizen.

  1. Keep decision making centralized at a wise, trusted, experienced and respected source.
  2. Ensure a steady flow of cross-functional communication between specialist teams to keep everyone aligned towards the ultimate goal of creating the best overall solution for the problem — not just the best design or algorithm in isolation.
  3. Ensure that each professional discipline has enough isolated time so they are able to produce work they are proud of.

Am I making this up? (AKA: References)

The concept has been tried and tested not only by us, but on a larger scale by Lean inventor Toyota (their chief engineer role) and by Gore (the Gore-Tex manufacturer), who refuse to scale beyond 150 people per factory in order to keep each one small enough for one guy to pull all the threads and have direct contact with everyone involved.

Toyota relies on the chief engineer role in every new product development process. Their future existence relies (more and more) on their ability to come up with new designs that accurately address the needs of the marketplace. For precisely this reason they have the chief engineer role, which also makes them one of the best in this field. The others are following in their tracks.

Both of these companies have enjoyed great success as a result of their strategies.

There is no silver bullet. You’ll need to experiment with team design to master it. My best advice would be to give it a shot on a small scale, and to read the references provided along the way. See how it goes, revise, celebrate your ability to spot your failures, acknowledge your victories and try again.

Repeat indefinitely.

If you like this post and want more, enter the fourth dimension. You can subscribe by email to the right.

Disclaimer

Never mistake team design for the silver bullet. There is no silver bullet. Good team design will not help if you have team members who lack the skills or the motivation to excel within their respective fields. You need excellent individual skills in each team member to pull this one off, and you need to put full trust in them.

It’s not a development strategy for the faint of heart, or for the micro managing control freak.


3 Major Problems With the Software Industry

January 3, 2010 by Geir Berset in software

There are three prominent problems in the software industry that bother me in particular at the moment. Being a part of that industry, I feel somewhat responsible for helping to shed some light on these problems. I list each problem below, with a proposed solution outlined.


Problem 1. Foot-in-the-door Software

The recipe for creating foot-in-the-door software is really quite simple:

  1. Design software that can do anything with “a little customization”.
  2. Make it hard to customize. Make every protocol and specification proprietary and hard to understand.
  3. Don’t go anywhere near any standards.
  4. Provide a horde of overpriced consultants to fix all of the above problems, and have them apply the “Ninja Technique (Problem 3)” so that they can stay on-site indefinitely.

Voilà! Now just wait for money to pour in from miserable customers.

Solution: Empower your customers by creating standards-compliant APIs and plugin environments with open and commonly used technologies, for which problem solvers can be found everywhere and anywhere. Even better, stop creating problems for your customers in the first place. Stop sticking your foot in the door, and focus on creating something that makes your customers the stronger one. They will rely on you all the more for it, and it will be a relationship based on trust, not desperation or despair.

Problem 2. The Green Lights Problem (Unchangeable Bloatware)

Players in the industry still build software that is close to impossible to change or make additions to. It can take months or years between releases, and they buffer up so much change that no one ever dares (or can afford) to upgrade. This frightens the purchasing department. Big time. As a reflex to this problem, they have (understandably) rehearsed their spec writing to include everything but the kitchen sink, effectively forcing anyone who wants to be a player in the industry to create bloatware from day one, rendering the software hard or close to impossible to use as well. A 100-page requirement is not that uncommon(!). Typically, 20% of the required features end up being used (*). The longer the spec the better, it seems. It means less risk of having to ask for a software change from a broken industry, let alone taking the risk of actually receiving it! Purchasing is happy, though: they did their best to fight change by getting their little green lights on all checklist points in their gargantuan spec, and then they moved on, never seeing the havoc in their wake.

This quote from It’s Learning shows just how bad it can get: “We need not concern ourselves with the users, as long as we make money”. Meaning that they’ll throw anything into their software to please purchasing, even though it makes the users miserable. (Via Ida Aalen and Google Translate; original post in Norwegian here)

Solution: Dear Industry, let’s at least start by creating usable (as in usability), modifiable and maintainable software, effectively showing the customers that we can please both the user and the purchasing department at once: changes and additions to the spec are possible without a headache, which allows us to keep software functionality concise and to the point. Happy users and happy purchasing = happy everyone. Even you. Look to lean and agile for solutions that embrace change rather than fight it. It’s possible, you know. It just might lower costs and increase your overall software quality as well.

Problem 3. The Ninja Distraction Technique (using Tech Jargon)

The software industry has spent years (or maybe decades) educating its customers in tech jargon. It’s all a part of the ninja technique of distraction. It is. Really. The theory goes: keep throwing words such as “Java, JBoss, Caching layers, Multi Tier Software Development Housing Facilities Campus” at the customers, and you will not only sound very professional, but what’s even better, the customers will soon forget what they really were asking you for, so there’s less of a chance that you have to deliver.

Imagine being the customer in this scenario: here you were, looking for a) a safe car with b) comfy seats, c) low fuel consumption, d) good stereo sound and e) a large trunk for all your groceries, and suddenly you had a car salesman giving you a primer in everything ranging from the new four-layer varnish coating technology to the latest in air-pressured suspension theory and revolutions within the field of fuel injection and what not. You don’t want to hear about that; you want to know if it will hold your coffee cup steady while playing your Mozart in perfect pitch.

Well, the industry seems to have distracted you from all that.

Solution: It’s about time the industry starts talking about the metrics that the customers can relate to and understand. And more notably, the ones that they need. Let’s talk about ease of use. Let’s talk about performance (can you handle 100 user registrations a second?). Let’s talk about modifiability (can you deliver a medium-sized product feature change in 1 week or less?). Let’s talk about reliability. Is your average up-time more than 99.97%? Will your software automatically restore itself after any hardware problems? Can you upgrade our software frequently without involving our tech department? What is your fix time for bugs?
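
To make an up-time figure like 99.97% something a customer can relate to, it helps to translate it into allowed downtime. The sketch below does that arithmetic. The function name is made up for illustration, and a year is assumed to be 365 days.

```python
def allowed_downtime_minutes(uptime_percent: float, period_hours: float) -> float:
    """Minutes of downtime permitted in a period at the given up-time percentage."""
    downtime_fraction = 1.0 - uptime_percent / 100.0
    return downtime_fraction * period_hours * 60.0


HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap year

per_year = allowed_downtime_minutes(99.97, HOURS_PER_YEAR)
per_month = allowed_downtime_minutes(99.97, 30 * 24)

print(f"{per_year:.0f} minutes per year")      # about 158 minutes
print(f"{per_month:.0f} minutes per 30 days")  # about 13 minutes
```

In other words, 99.97% leaves a little over two and a half hours of downtime a year, which is a number a customer can actually plan around.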

There is no need to talk about technologies if you can reassure the customer on the real metrics. If you can deliver on these metrics (and deliver you can if you just stop with all the distractions), you would not need to apply your ninja techniques, and our software could be made by an aging brontosaurus for all they would care.

Let’s shift focus onto providing real software solving real problems for real users — injecting relief instead of frustration and hopelessness into this world. (**)

Please contribute with any experiences you have with the industry in relation to any of these problems. Or maybe you have a beef of your own with the industry. Bring it on! We (the players in the industry) desperately need to hear it in order to improve.

Footnotes

(*) This is my personal estimate, based on the pareto principle and experience. However, probably not an unfair estimate given the magnitude of this problem.

(**) Am I trying to convey that we’re perfect in this respect? No. We’re improving all the time.

Tune in to the fourth dimension using RSS or follow me on twitter.com/geirber


Caching Concluded (for now)

November 16, 2009 by Geir Berset in Monday School

It’s been very interesting looking at different technologies and strategies. We have now reached our conclusions and will be leaving the topic of caching for now.

Caching Todo-items

Reverse Proxy Caching and ESI
We will be creating an internal guideline-document for using Varnish, which we’ll also publish here on the blog. Varnish is a simple, fast and stable technology excellently suited for the job (Reverse Proxy Caching and Edge Side Includes).

Data Caching
We will be re-implementing the data-cache layer in our framework, namely AFWCache. The goal is to keep options open for future technologies, while solving today’s needs elegantly (as in: sustainable software development). It will become an abstract class which, through use of the adapter pattern, will support our two technologies of choice: Memcache and APC, the latter serving as both opcode cache and data cache layer.
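
A minimal sketch of the adapter idea follows, written in Python rather than PHP purely for illustration. AFWCache is unreleased, so none of the class names below (`CacheAdapter`, `LocalMemoryAdapter`, `SharedStoreAdapter`, `Cache`) come from it. They are hypothetical, and plain dictionaries stand in for APC and Memcache so that the example runs on its own.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class CacheAdapter(ABC):
    """Hypothetical common interface that every cache backend must satisfy."""

    @abstractmethod
    def get(self, key: str) -> Optional[Any]:
        ...

    @abstractmethod
    def set(self, key: str, value: Any) -> None:
        ...


class LocalMemoryAdapter(CacheAdapter):
    """Stand-in for a local, in-process store (the role APC plays)."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}

    def get(self, key: str) -> Optional[Any]:
        return self._store.get(key)

    def set(self, key: str, value: Any) -> None:
        self._store[key] = value


class SharedStoreAdapter(CacheAdapter):
    """Stand-in for a networked store (the role Memcache plays).

    A real adapter would talk to a server; here a dict is injected
    so that several adapter instances can share one store.
    """

    def __init__(self, shared: Dict[str, Any]) -> None:
        self._shared = shared

    def get(self, key: str) -> Optional[Any]:
        return self._shared.get(key)

    def set(self, key: str, value: Any) -> None:
        self._shared[key] = value


class Cache:
    """Application-facing cache; it only knows the CacheAdapter interface."""

    def __init__(self, adapter: CacheAdapter) -> None:
        self._adapter = adapter

    def get(self, key: str) -> Optional[Any]:
        return self._adapter.get(key)

    def set(self, key: str, value: Any) -> None:
        self._adapter.set(key, value)


# Swapping the backend does not change the calling code.
for adapter in (LocalMemoryAdapter(), SharedStoreAdapter({})):
    cache = Cache(adapter)
    cache.set("article:42", {"title": "Caching Concluded"})
    print(cache.get("article:42"))
```

Because the application only ever sees `Cache`, a future backend needs nothing more than one new adapter class, which is what keeping options open amounts to in practice.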

Future topics for Aptoma Monday School

We’ll be spending this week considering a new set of challenges to thrive on. This Friday we’ll decide which direction to move in.

  • JavaScript Guidelines
  • Upgraded database adapter class for our framework

Suggestions? Hang in there (RSS)


Caching Improvements

November 11, 2009 by Geir Berset in Uncategorized

We are continuing our caching discussions from the past few weeks:

  1. 7 Approved Caching Technologies
  2. Caching Strategies

Our goal in the previous few weeks has been to identify caching strategies and technologies. We have now used this knowledge to identify where to invest our focus on improvements. Our conclusion is to go fiercely ahead investigating the following couple of topics further.

1. Reverse proxy caching (Varnish and ESI)

Håkon and Michael will be putting in the effort to research what is possible to do with ESI and VCL. By getting an overview of how we can best utilize Varnish in our applications, we can either learn how to configure Varnish ourselves, or inform hosting providers of concrete scenarios which they can then configure for us. It’s worth noting that Varnish has been performing flawlessly on almost every installation we have had to date, so this is an important tool. Our hope is to develop some sensible and helpful best practices for using Varnish with and without ESI.

2. Data caching

Lars and Stefan will be looking into how we can improve our very own AFWCache mechanisms (AFW is our in-house, still unreleased framework). We might include MySQL / SQLite data-caching options as well. Another core topic for them to explore further is whether we’ll be extending the use of APC to also include data caching (as opposed to only opcode caching). In order to decide to use APC for data caching in high-performance production environments, we have to learn more about its behavior under duress, and we need to know if we can make it degrade gracefully (i.e. what happens if APC runs out of memory?).
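
One way to think about graceful degradation is that a cache failure should cost speed, not correctness. The sketch below is an assumption about what such a wrapper could look like. It is not a description of how APC actually behaves when it runs out of memory, which is exactly what remains to be found out. `BoundedStore`, `CacheFull` and `get_or_compute` are hypothetical names, and the toy store simply refuses new keys once it is full.

```python
from typing import Any, Callable, Dict


class CacheFull(Exception):
    """Raised by the toy store when it has no room left."""


class BoundedStore:
    """Toy cache with a fixed number of slots, to simulate exhausted memory."""

    def __init__(self, max_items: int) -> None:
        self._max_items = max_items
        self._items: Dict[str, Any] = {}

    def get(self, key: str) -> Any:
        return self._items[key]  # raises KeyError on a miss

    def set(self, key: str, value: Any) -> None:
        if key not in self._items and len(self._items) >= self._max_items:
            raise CacheFull(key)
        self._items[key] = value


def get_or_compute(store: BoundedStore, key: str, compute: Callable[[], Any]) -> Any:
    """Return the cached value, or compute it; never fail because of the cache."""
    try:
        return store.get(key)
    except KeyError:
        pass  # cache miss: fall through and compute
    value = compute()
    try:
        store.set(key, value)
    except CacheFull:
        pass  # degrade gracefully: serve the value uncached
    return value


store = BoundedStore(max_items=1)
print(get_or_compute(store, "a", lambda: "value-a"))  # computed, then cached
print(get_or_compute(store, "b", lambda: "value-b"))  # computed, store is full
```

With a wrapper like this, a full cache means the second value is recomputed on every request, which is slower but still correct.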

We will currently not invest more energy in

View and subview caching (application caching), which basically is the slower sibling of Varnish with and without ESI. Although it is more flexible (you can process cookies), we choose not to put any effort into improving this area, to keep our focus where it needs to be for the moment. Also, our current support for view and subview caching through data caching is performing just fine.

Query Cache – We are requiring query cache to be set to “ON” on every installation. Our conclusion is that this is more than enough for our current needs. We’ll be revisiting the topic of setting query cache to DEMAND and using the /*SQL_CACHE*/ trigger when we are making the new Twitter, or something with a similar requirement for database scaling.

Client caching – We will be refining our guidelines and best practices for setting the correct headers, but we will not invest in the topic of Local Storage (“the new cookie”) as of yet. We’ll wait until the browser support is broader. Local storage is expected to save us a lot of AJAX-callbacks in the future. More on this sometime in the future.
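
As a small illustration of what setting the correct headers can mean, the sketch below builds `Cache-Control` values for a few common cases. The function name and the chosen lifetimes are assumptions made for the example, not our actual guidelines.

```python
from typing import Dict


def cache_headers(max_age_seconds: int, shared: bool = True) -> Dict[str, str]:
    """Build a Cache-Control header for a response.

    max_age_seconds <= 0 means the response must not be stored at all.
    shared=False marks the response as cacheable by the browser only.
    """
    if max_age_seconds <= 0:
        return {"Cache-Control": "no-store"}
    scope = "public" if shared else "private"
    return {"Cache-Control": f"{scope}, max-age={max_age_seconds}"}


print(cache_headers(86400))             # static asset, cacheable for a day
print(cache_headers(60, shared=False))  # per-user page, browser cache only
print(cache_headers(0))                 # never store
```

The `public` / `private` split matters once a reverse proxy sits in front of the application: `private` responses may be kept by the browser but should not be stored by shared caches.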

Opcode cache – Having discarded a couple of other candidates, we require APC for PHP on all our installations. Once up and running, it just works without any intervention, so we’ll not be investing any more energy into the topic of opcode caching. We have strengthened our systems setup testing for APC, and we’ll leave it at that.


7 Approved Caching Technologies

November 5, 2009 by Geir Berset in Monday School

(These are the notes from Aptoma Monday School for week #44 and #45)

We have recently been blogging our notes on different modi operandi of caching. To sum it up, these were: reverse proxy caching, application caching (view caching, subview caching and object/data-caching), opcode caching, client caching and query caching.

We have spent the last two weeks discussing which technologies fit our needs best. Our product installations handle high traffic (i.e. millions of views a day) and we also have products that produce a lot of data that is heavy to compute (i.e. large data sets computed from complex database queries). Thus we have to scale both vertically and horizontally, so to speak. Our caching needs are thus quite broad, and we have to shop for a lot of technologies in order to fulfill them all. This brings us to our …

List of 7 Approved Caching Technologies

In this post we will provide a few notes on each technology.

1. Varnish.
Used for reverse proxy caching. We are not happy with hosting providers’ ability to configure and fine tune Varnish, so rehearsing your skills in this is certainly not a waste of time.

2. APC Opcode Cache.
Use it. Always. We cannot find examples of when not to. It will speed up execution with no known disadvantages. In our tests, APC performed better than both competing alternatives (XCache and Zend Accelerator) under all circumstances.

3. APC for data caching.
You can use APC for data caching, as well. It will outperform memcache by a factor of 10 to 50(!). As opposed to memcache, it does not have distributed access. We have no idea what happens when you exhaust the assigned RAM for data caching. We will have to find out, won’t we?  (Yes we will)

4. Memcache for data caching.
Memcache is still one of the most stable and high performing technologies for its use. It is nevertheless annoying that its performance is significantly slowed down by the fact that it runs on top of TCP/IP (which is also an advantage when it comes to flexibility), even in those cases where only local access is necessary. It seems to us that memcache is run on the same server as the application in more than 90% of all cases.

5. Varnish ESI.
As the name implies, this is a feature of Varnish. ESI is very interesting as it implements subview caching without having to implement it at application level (slower). Implementing subview caching on reverse proxy level can speed things up significantly given the right circumstances. A disadvantage of Varnish ESI is that it introduces more complexity to your source code, as you’ll have to write “Varnish markup” in order to have Varnish do edge side includes for you (<esi:include>). Not a big deal, but definitely a declared enemy of the simplicity ninja. The benefits might make it worthwhile, nevertheless.

6. MySQL for data caching.
This is basically “use MySQL the way you would use memcache”. MySQL for data caching performs quite well, but not astonishingly so (memcache is about twice as fast in our tests). MySQL is a convenient technology, as we already require it on all our products. It is also one of the more widely available technologies regardless of hosting partner (and we have quite a few of them). Another advantage of using database technologies for this purpose is that more sophisticated queries can be applied for purging and invalidating data than you can use in the simpler (but faster) key value databases (memcache et al.).

7. SQLite for data caching.
Same use case as with MySQL above, and with the same pros and cons. It does perform a little better than MySQL. We have decided to look more into this one. Another advantage over memcache is that SQLite and MySQL are persistent caches, whereas memcache is volatile (i.e. it will be blank after a restart).

List of Discarded Technologies

APC as data cache and Varnish ESI are the only technologies which we have not exhaustively used in production for years. Nevertheless we will seek to improve our way of using all these technologies in the time to come, and we will be looking for how to implement support for these in our framework (Aptoma Framework, AFW). To show that we did not just stick to our guns on this one, we present this short list of alternative technologies which we also explored during our tests.

  • Tokyo Tyrant (reason to discard : somewhere in between MySQL and Memcache in performance, and brings no new advantages to the blend)
  • Nginx (not really a caching technology, merely a faster web-server in special circumstances)
  • Lighttpd (Not a caching technology, but can provide faster delivery of static content than Apache can deliver. On dynamic content it does not perform better than a properly stripped and fine tuned Apache.)
  • DBA (Does not perform as well as APC in our tests).

We have more discussions to come!

Our wish-list (todo-list) for caching and performance discussions is as follows.

Better benchmarking

  • Siege
  • JMeter
  • Apache Benchmark (ab)
  • Httperf

Caching techniques

  1. Pre-loading cache (warming)
  2. Event triggered cache invalidation (cache on update)
  3. Stale cache (set flag but don’t purge, combined with a grace time in reverse proxy)
  4. Better caching on logged in users (Varnish ESI use case)
  5. Setting proper headers (for improved client caching and more)
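
To make technique 3 (stale cache) a little more concrete, here is a minimal PHP sketch. It is not AFW code: the class name StaleAwareCache, its method names and the in-memory array backend are assumptions made purely for illustration. In a real setup the array would be replaced by memcache or APC, and the grace period would be matched by a grace time configured in the reverse proxy.

```php
<?php
// Hypothetical sketch of "stale cache": entries are flagged as stale instead
// of being purged, and a stale entry may still be served during a grace period.
class StaleAwareCache
{
    private $entries = array();   // key => array('value' => mixed, 'staleSince' => int|null)
    private $graceSeconds;

    public function __construct($graceSeconds)
    {
        $this->graceSeconds = $graceSeconds;
    }

    public function set($key, $value)
    {
        $this->entries[$key] = array('value' => $value, 'staleSince' => null);
    }

    // Flag the entry instead of deleting it.
    public function markStale($key, $now)
    {
        if (isset($this->entries[$key]) && $this->entries[$key]['staleSince'] === null) {
            $this->entries[$key]['staleSince'] = $now;
        }
    }

    // Returns array(value, isStale), or null when nothing usable is cached.
    public function get($key, $now)
    {
        if (!isset($this->entries[$key])) {
            return null;
        }
        $entry = $this->entries[$key];
        if ($entry['staleSince'] === null) {
            return array($entry['value'], false);
        }
        if ($now - $entry['staleSince'] <= $this->graceSeconds) {
            return array($entry['value'], true);
        }
        // Grace period is over: now the entry is really purged.
        unset($this->entries[$key]);
        return null;
    }
}

$cache = new StaleAwareCache(30);
$cache->set('frontpage', '<html>...</html>');
$cache->markStale('frontpage', 1000);          // content changed at t=1000
$duringGrace = $cache->get('frontpage', 1010); // still served, flagged as stale
$afterGrace  = $cache->get('frontpage', 1031); // grace time exceeded
```

A caller that gets a value flagged as stale can serve it right away and regenerate the data afterwards, which is the point of not purging immediately.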

Notes From some of our Benchmark Tests

Tokyo Tyrant

Tokyo Tyrant is a memcache-like layer on top of Tokyo Cabinet, which is a fast key-value database (see: http://sameerparwani.com/posts/tokyo-cabinet-and-tokyo-tyrant). Installation was easy; everything required was available at http://1978th.net/. As a PHP-wrapper we used http://mamasam.indefero.net/p/tyrant/downloads/2/

We tested Tokyo Tyrant with default settings (defaults matter). The same goes for the comparisons, Memcache and MySQL (query cache is off). The time given below is the time it took to set and get 1000 variables of 1 kB.
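
For readers who want to repeat this kind of measurement, a set/get timing loop could look roughly like the sketch below. This is not the script we used for the numbers in this post. The function name benchmarkCache and the array-backed stand-in store are assumptions for illustration; the two callables would wrap whichever backend is under test. The sketch reports seconds, as returned by microtime(true).

```php
<?php
// Hypothetical harness: times $count set operations and $count get operations.
// $set and $get are callables wrapping whichever backend is under test.
function benchmarkCache($set, $get, $count = 1000, $size = 1024)
{
    $payload = str_repeat('x', $size);   // one value of $size bytes (1 kB by default)

    $start = microtime(true);
    for ($i = 0; $i < $count; $i++) {
        call_user_func($set, 'key' . $i, $payload);
    }
    $putSeconds = microtime(true) - $start;

    $start = microtime(true);
    for ($i = 0; $i < $count; $i++) {
        call_user_func($get, 'key' . $i);
    }
    $getSeconds = microtime(true) - $start;

    return array('put' => $putSeconds, 'get' => $getSeconds);
}

// Stand-in backend: a plain PHP array, used only so the sketch runs anywhere.
$store = array();
$result = benchmarkCache(
    function ($key, $value) use (&$store) { $store[$key] = $value; },
    function ($key) use (&$store) { return isset($store[$key]) ? $store[$key] : null; }
);
printf("put: %f\nget: %f\n", $result['put'], $result['get']);
```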

TokyoTyrant

  1. put: 0.102526473999
  2. get: 0.121464586258

TokyoTyrant disk hash

  1. put: 0.108086037636
  2. get: 0.123809480667

TokyoTyrant disk B-Tree

  1. put: 0.111338186264
  2. get: 0.129682970047

Memcache

  1. put: 0.0864425897598
  2. get: 0.0702331066132

MySQL without Query Cache

  1. put: 0.112287640572
  2. get: 0.164221072197

Tokyo Tyrant is slower than Memcache. Tokyo Cabinet can probably outperform memcached if accessed directly, but no PHP-bindings for this purpose were available for our tests.

What is exciting is that Tokyo Tyrant is faster than MySQL for persistent data caching. Tokyo Tyrant is also supposed to have some other features which we have not yet had time to test. Please share any experiences you might have with Tokyo Tyrant.

MySQL, SQLite, DBA, Memcached and APC tests

The test: Write 100 000 MD5-hashes, then read them back and add them to an array. For the relational databases, the insert is done with a multi-row-insert or a transaction.

MySQL

  • Create : 2038.615942 ms
  • Read : 13782.1378708 ms

SQLite

  • Create : 2084.01703835 ms
  • Read : 4064.75901604 ms

DBA (NOTE! Only with 10 000 elements this time)

  • Create : 10192.3089027 ms
  • Read : 10065.6449795 ms

Memcached

  • Create : 3493.89410019 ms
  • Read : 3219.08593178 ms

APC

  • Create : 25127.6450157 ms
  • Read : 172.363996506 ms

MySQL will insert a lot of rows at the same speed as SQLite. Without query cache, its reads are outperformed by a factor of three by SQLite.

SQLite supports :memory: instead of files, which can speed it up as a relational cache, but it is then no longer persistent between restarts. MySQL’s MEMORY-engine is more of the same.

DBA is a copy-on-write store, which means that every write will increase its file-size. This makes all operations slower the more writes you have to do. Performance loss was at times huge, but it can be fixed by issuing an optimize-command. DBA writes slowly (50x slower than the relational databases, and much slower than the other hash-buckets). Reading from a clean file will make it perform somewhere in between the hash-buckets (memcache, APC) and the relational databases.

Memcached is a little slower to write to than the relational databases, due to the databases doing all writes in a single command. Read is about the same as with SQLite and only twice as fast as MySQL. (Bear in mind that this is a best case scenario for the relational databases.)

APC is 5-10x slower than Memcached in writes, but 10-20x faster to read from in this test.

Test conclusions

DBA has few advantages over the relational databases. APC can replace Memcache in some of the areas in which we use memcache today (data/object-caching).



Caching Strategies (AMS week #43)

October 22, 2009 by Geir Berset in Monday School 1 Comment » ()

This is the second post of lecture notes from our Aptoma Monday School series (AMS). As these are lecture notes, you should expect the texts to be a bit rough around the edges. We’ve decided to discuss and revise our caching strategies. Our session this week was spent settling upon a set of cache layers.

To make things perfectly clear : a cache is a temporary storage area where frequently accessed data can be stored for rapid access.

1. Reverse proxy caching (page caching)

Caching the entire page is typically done using Varnish (see the technology references at the end of this article). Our experience with Varnish is that it is stable and very high performing. However, people who really know how to configure it properly do not come by the dozen. As much as we could hope that all our hosting providers see it as their job to get to know this kind of blazing technology intimately, we cannot rely on it, and thus we will have to build up our own competency further in this area. A weak point of page caching through Varnish is that once you set a client cookie, Varnish is forced to either send traffic straight through (i.e. no caching whatsoever) or dispose of the cookie altogether (i.e. break application functionality). Local storage (the “new cookie”, as discussed last week) will not create any such problems for Varnish.

2. View / subview caching (Application Caching)

In MVC-terms, the view is the rendered HTML from the application. Caching the full view would be like the “page caching” from above, only in this case done within the application itself. This would overlap with the Varnish strategy above. Even though this version would solve the cookie-problem described above, it is normally no longer used this way after Varnish came along.

Subview caching, however, can still be of good use. A subview is simply a part of a page. This is useful when you e.g. have Varnish set for 1 minute caching, but you know that parts of the page could easily be cached for an hour without becoming obsolete. Examples include a seldom changing navigational menu, a generated thumbnail image or page headers and footers. Subview caching involves putting your subview data into memcached or a file cache, or you can achieve something similar using Varnish Edge Side Includes (ESI).
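
As a rough illustration of subview caching at application level, the sketch below keeps each page fragment with its own lifetime. The class SubviewCache and its fetch() method are hypothetical names invented for this example, and a plain array stands in for memcached or a file cache.

```php
<?php
// Hypothetical fragment cache: each subview is stored with its own lifetime.
class SubviewCache
{
    private $fragments = array();  // name => array('html' => string, 'expires' => int)

    // Returns the cached HTML, or renders and stores it when missing or expired.
    public function fetch($name, $ttlSeconds, $render, $now)
    {
        if (isset($this->fragments[$name]) && $this->fragments[$name]['expires'] > $now) {
            return $this->fragments[$name]['html'];
        }
        $html = call_user_func($render);
        $this->fragments[$name] = array('html' => $html, 'expires' => $now + $ttlSeconds);
        return $html;
    }
}

$renderCount = 0;
$renderMenu = function () use (&$renderCount) {
    $renderCount++;                       // counts how often the slow path runs
    return '<ul><li>Home</li><li>News</li></ul>';
};

$subviews = new SubviewCache();
$first  = $subviews->fetch('menu', 3600, $renderMenu, 0);      // rendered
$second = $subviews->fetch('menu', 3600, $renderMenu, 1800);   // served from cache
$third  = $subviews->fetch('menu', 3600, $renderMenu, 3600);   // expired, rendered again
```

The menu is cached for an hour regardless of how short the cache time for the surrounding page is.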

3. Data caching (Application Caching)

Data caching is typically done in your “model-layer” (again with the MVC). A class-function in your business logic will check memcached for a cached version of the data about to be fetched from the database, and only proceed with the SQL-statement if no such data can be found in cache. Afterward it will update the cache with the fresh data. This allows very fine grained control of cache times for different data. Data which is heavy on the ol’ computer and can be cached for a very long time is even a candidate for being serialized and put into a database for easy recovery upon reboots etc. Involving a database for handling large data sets also provides you with more sophisticated access, update or delete schemes beyond what the neanderthal key/value database in memcached can offer.

We rely heavily on this strategy in our products, and the strategy is typically a good one for creating robust APIs. Also, we use it in our in-house framework (Aptoma FrameWork, AFW) for the Autoloader (automatic discovery and loading of classes) where file-paths are stored in memcached for rapid access upon subsequent requests. Used in combination with APC, this provides us with a blazing fast framework technology.

Let’s take an example

An Event object that handles data in a roster scheduler is stored in the “events” database table. To load an event, we have to fetch a lot of extra information relating to it: Employee, department, workplace, conflicts with interfering events, and so on. All this will be very time consuming. Thus, we should store all data to cache for re-use!

public function loadEvent($id)
{
 // Try to fetch the event from Memcache data storage
 $event = Cache::getObject($id, 'Event');
 if (!empty($event))
 {
  return $event;
 }

 // If the Event does not exist in Memcached, we do the conventional,
 // time consuming processing here and assign the result to $event.

 // Finally, we cache the Event for posterity
 Cache::setObject($event, 'Event');
 return $event;
}

Usually we use cron-jobs to pre-load these data as well.

A main challenge for this kind of caching-scheme is to prevent cached data from becoming outdated. If an interfering event changes, and we fail to invalidate the cache from the example above, we would be left with bad data in our cache. Keep this in mind when designing your data caching.
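
One possible way of dealing with this is to record which cached events were built from which other events, and to invalidate them together. The sketch below is only an illustration of that idea: the EventCache class, its methods and the array backend are hypothetical and are not the Cache class from the example above.

```php
<?php
// Hypothetical sketch: remember which cached events depend on which other
// events, so that a change to one event invalidates everything built from it.
class EventCache
{
    private $objects = array();      // event id => cached event data
    private $dependents = array();   // event id => ids of cached events that used it

    public function set($id, $event, array $dependsOn = array())
    {
        $this->objects[$id] = $event;
        foreach ($dependsOn as $otherId) {
            $this->dependents[$otherId][$id] = true;
        }
    }

    public function get($id)
    {
        return isset($this->objects[$id]) ? $this->objects[$id] : null;
    }

    // Call this whenever event $id is updated or deleted.
    public function invalidate($id)
    {
        unset($this->objects[$id]);
        if (isset($this->dependents[$id])) {
            $ids = array_keys($this->dependents[$id]);
            unset($this->dependents[$id]);   // removed first, so cycles cannot loop forever
            foreach ($ids as $dependentId) {
                $this->invalidate($dependentId);
            }
        }
    }
}

$cache = new EventCache();
$cache->set(7, array('title' => 'Night shift'));
$cache->set(8, array('title' => 'Morning shift', 'conflicts' => array(7)), array(7));
$cache->invalidate(7);           // event 7 changed, so event 8's conflict data is outdated
$stillCached = $cache->get(8);
```

After the call to invalidate(7), both events are gone from the cache and will be rebuilt from the database on the next load.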

4. Local Client caching

Local Client caching is something as simple as the client itself keeping a version of some data instead of asking the server for a new version. This can be cookies, a CSS-file or any other form of static content (JavaScript or images).

There are two strategies for client caching:
a) force very, very long cache times on the client, and change file names whenever you need to change the contents of, for instance, a CSS-file, or
b) keep cache times low to ensure that the client asks for an updated version from time to time.

Strategy a) adds complexity to the application level, as you will have to handle file names changing, and changing file names might break external dependencies on the file. However, it will ensure that changes to static content are displayed at the clients immediately. Yahoo recommends this strategy. Strategy b) will delay updates of static content until the cache at the client has expired, but will keep things simpler for the programmer and more stable for any external dependencies on the file names.
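
Strategy a) could be sketched like this in PHP. The helper names versionedFileName and longLivedCacheHeaders are made up for this example, and deriving the version from an MD5 hash of the contents is just one of several possible schemes (a build number or a modification time would work as well). Note that the helper ignores any directory part of the file name.

```php
<?php
// Hypothetical helper for strategy a): derive the public file name from the
// file contents, so that any change to the contents yields a new URL.
function versionedFileName($fileName, $contents)
{
    $info = pathinfo($fileName);
    $hash = substr(md5($contents), 0, 8);
    return $info['filename'] . '.' . $hash . '.' . $info['extension'];
}

// Headers allowing the client to keep a versioned file for a year.
function longLivedCacheHeaders($now)
{
    $oneYear = 365 * 24 * 60 * 60;
    return array(
        'Cache-Control: public, max-age=' . $oneYear,
        'Expires: ' . gmdate('D, d M Y H:i:s', $now + $oneYear) . ' GMT',
    );
}

$before  = versionedFileName('style.css', 'body { color: black; }');
$after   = versionedFileName('style.css', 'body { color: navy; }');
$headers = longLivedCacheHeaders(time());
```

The two file names differ, so a client holding the old file for a year is not a problem: the page simply links to the new name.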

5. Query Cache

SQL-query results are cached in MySQL internally. Configuration at server level can be OFF / ON / DEMAND. An attempted cache lookup with no hits gives a 20% loss of performance. DEMAND requires the use of SELECT SQL_CACHE to force caching. Putting the keyword in a MySQL-specific comment, as in SELECT /*! SQL_CACHE */, will ensure SQL-compatibility with other engines than MySQL. (A plain /* ... */ comment will not do, as MySQL ignores its contents as well.)

We recommend using the ON or DEMAND setting on your installation. If you never touched this setting, it is probably set to ON.

If you need to squeeze out every bit of performance available from your application, you should switch to the DEMAND-setting, review all your SQL-statements and judiciously use the SQL_CACHE-trigger. If you have the time and skill to do this, you will benefit from it. You would want to use it on tables with a high percentage of read over write requests. Using SQL_CACHE on tables with a lot of writes will only hurt your overall performance. (Lars has more to say on the topic of Query Cache.)

Referred technologies

  • Varnish – high-performance HTTP accelerator. (see also Varnish ESI).
  • Memcache – high-performance, distributed memory object caching system.
  • APC – caching and optimizing PHP intermediate code.
  • AFW – our in-house framework which is tightly coupled and integrated with all of the above technologies. (due for release under the New BSD License in 2010)

MySQL Query Cache

October 22, 2009 by Lars Hetland in coding, Monday School 1 Comment » ()

The following article is an in-depth look at Query Cache, mentioned briefly in our post about Caching Strategies.

With Query Cache, the result set of a SELECT query is cached with the query command itself as key. This means that if a SELECT query initially executes in ten seconds, then each consecutive time the exact same query is executed (the query must be an identical string, including whitespace, case and more), the server gets a hit in Query Cache and returns the cached data in milliseconds, normally giving performance boosts between a few and many orders of magnitude.

QC is a global server setting and has three modes: ON, OFF and DEMAND. Setting QC to OFF disables it, setting it to ON makes the server look for a cached result for all SELECT queries, while setting it to DEMAND will have the server only do a QC lookup if the query has SQL_CACHE after the SELECT command. It’s recommended to put this MySQL-only extension in a MySQL-specific comment to stay compatible with SQL standards: SELECT /*! SQL_CACHE */ * FROM brille. (MySQL executes the contents of a /*! ... */ comment, whereas it ignores the contents of a plain /* ... */ comment like any other server.) A QC miss will add up to about 20% overhead compared to the same query executed directly without QC lookup.

QC is guaranteed to deliver fresh data, as all QC entries for a table are invalidated when an INSERT, DELETE or UPDATE command is executed on it. This means that for a table with frequent writes, QC on reads will probably decrease performance. On tables with a high number of both reads and writes, setting QC to DEMAND and not using it on queries hitting that table might be a good idea.

When actively using query cache on tables with dates, it’s also important to write reusable and cacheable queries. Any use of NOW() or equivalent non-deterministic functions will render query cache unused even within the same second, so date and time should be calculated outside of MySQL. When doing so, first analyze the need for an exact query. If you can get away with < 1 minute accuracy, don’t add date('Y-m-d H:i:s') but date('Y-m-d H:i:00'), so any identical queries within the same minute will get the cached result. Subselects will not use QC.

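
The date rounding can be sketched as follows. The function name buildRecentArticlesQuery and the articles table with its published column are hypothetical and exist only for this illustration; the point is that the timestamp is computed in PHP and rounded down to the whole minute.

```php
<?php
date_default_timezone_set('UTC');   // fixed time zone, only to make the example predictable

// Hypothetical helper: build a query whose timestamp literal is rounded down
// to the whole minute, so identical requests within that minute share one
// query string and therefore one Query Cache entry.
function buildRecentArticlesQuery($now)
{
    $since = date('Y-m-d H:i:00', $now - 3600);   // computed in PHP, not with NOW()
    return "SELECT id, title FROM articles WHERE published > '" . $since . "'";
}

$a = buildRecentArticlesQuery(1256208005);   // five seconds into a minute
$b = buildRecentArticlesQuery(1256208055);   // 50 seconds later, same minute
$c = buildRecentArticlesQuery(1256208065);   // ten seconds after that, next minute
```

Here $a and $b are identical strings and can share a cached result, while $c differs and results in a new lookup.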
On high performance applications, QC should be kept in mind when designing the database and queries. Below are two examples where performance characteristics could be radically different:

First table structure:

CREATE TABLE `users` (
`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`username` VARCHAR(16) COLLATE utf8_danish_ci NOT NULL,
`password` text COLLATE utf8_danish_ci NOT NULL,
`salt` text COLLATE utf8_danish_ci NOT NULL,
`last_time_visited` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY  (`id`),
UNIQUE KEY `username` (`username`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_danish_ci;

Second table structure:

CREATE TABLE `users` (
`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`username` VARCHAR(16) COLLATE utf8_danish_ci NOT NULL,
`password` text COLLATE utf8_danish_ci NOT NULL,
`salt` text COLLATE utf8_danish_ci NOT NULL,
PRIMARY KEY  (`id`),
UNIQUE KEY `username` (`username`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_danish_ci;
CREATE TABLE `user_visited` (
`id` INT(10) UNSIGNED NOT NULL,
`last_time_visited` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY  (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_danish_ci;

If the first table were used on a high traffic forum, the QC would be invalidated every time a user loads a page, as last_time_visited has to be updated. By splitting the data into two tables, where one is often read and the other is often written to, QC on the first table is kept until a user changes password or username or a new user registers, which is probably less frequent than page loads.

But the way the SELECT query is written is also important. Let’s say you normally just need the basic information about a user, but every time you visit the user’s profile, last_time_visited is needed:

First set of queries:
Normal display of user:

SELECT /*! SQL_CACHE */ id, username FROM users WHERE id = 1;

Display of user’s profile:

SELECT /*! SQL_CACHE */ u.id, u.username, uv.last_time_visited FROM users AS u LEFT JOIN user_visited AS uv ON ( u.id = uv.id ) WHERE u.id = 1;

Second set of queries:
Normal display of user:

SELECT /*! SQL_CACHE */ id, username FROM users WHERE id = 1;

Display of user’s profile:

SELECT /*! SQL_CACHE */ id, username FROM users WHERE id = 1;
SELECT last_time_visited FROM user_visited WHERE id = 1;

In the first set, two different queries are used to fetch the same information, the username. This means the second query won’t be able to use the QC entry of the first and more frequent query. Also, the often-read table is joined with the often-written one, which means INSERT, DELETE and UPDATE queries on both tables will clear the cache for this query. In the second set of queries, the same query is used in both places, increasing the chance of a QC hit. And instead of joining the two tables in one query, a second and separate query is executed on the second table, here without the SQL_CACHE keyword. Two queries increase overhead, but the gain from a higher cache hit rate (the first query can even get a hit in a Memcache-lookup) can easily outweigh this.