External caching for ColdFusion with memcached

What is memcached?

memcached is a lightweight caching program that lets you store objects and items in an easily accessed format. The memcached server stores items in a binary format so that it's easy to get them back. It is a very lightweight, high-performance cache system. For you ColdFusion heads out there, you can think of memcached as sort of a large struct in RAM that holds whatever you throw at it (with some exceptions). Access is quick and non-locking (so there's no waiting around to get stuff if you have several clients hitting it at one time). The server is a stand-alone process that you get to through a client library, which zips info up and sends it to the server for you. Not only that, but a memcached server can be put up wherever you have spare RAM sitting around unused. You can run several memcached servers and, based on a specific algorithm, the memcached client will go out and fetch your stored items from its servers, however many there are.
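
To make the "specific algorithm" part a bit more concrete, here is a minimal sketch in Java (ColdFusion runs on the JVM, and the ColdFusion client sits on top of a Java client). This is not the client's actual code: the class name ServerPicker, the CRC32 hash, the plain modulo step, and the server addresses are all assumptions picked for illustration. The only point is that the client, not the server, decides where a key lives, and that the same key always lands on the same server.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical helper: picks one server out of a list for a given cache key.
class ServerPicker {
    static String pickServer(String key, List<String> servers) {
        // Hash the key to a non-negative number (CRC32 is an assumption here).
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        // Map the hash onto the list of servers.
        int index = (int) (crc.getValue() % servers.size());
        return servers.get(index);
    }

    public static void main(String[] args) {
        // Made-up addresses for illustration only.
        List<String> servers = List.of("10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211");
        for (String key : List.of("mykey", "blah", "user:42")) {
            System.out.println(key + " -> " + pickServer(key, servers));
        }
    }
}
```

One catch with plain modulo hashing like this is that changing the number of servers moves most keys to a different server, so most of the cache is effectively lost when a server is added or removed.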

Wait... caching? ColdFusion already has that... why bother?

True, ColdFusion already has caching. It'll cache queries and it'll cache whatever you want in the application and server scopes. In fact, you might not get much out of memcached unless you are running a high-traffic site or trying to cache a ton of data. For example, ColdFusion runs inside a Java instance (a JVM), and there are limits to the amount of memory you can safely use in ColdFusion without running into problems.

These are underlying Java limitations for the most part, because a 32-bit Java instance can only have up to 2 gigs (if you are running CF 7) and, I think, 4 gigs if you are running CF 8. And remember, ColdFusion takes up some of that RAM for itself (about 100 MB). I had the opportunity of working at a company where the ColdFusion instance was having a hard time staying up because it was stuffed with too much cached information. The more info you cache, the more ColdFusion has to keep track of and the more garbage collection has to run on that Java instance (to clear out new space), and a garbage collection over 500 MB is faster than one across 2-4 gigs.

What memcached allows you to do is move that cached data out of the Java heap into a separate process, and possibly onto other servers if needed, so ColdFusion's garbage collector no longer has to deal with it. It also allows you to offload memory allocation costs to other machines. For example, if you have a large ColdFusion machine with 10 gigs of RAM and you are only running 1 instance of ColdFusion, that's potentially 4-8 more gigs of RAM that is sitting around unused.

Another nice thing about memcached is that if you have to restart ColdFusion, you don't have to reload all those cached items. As long as the memcached server stays up, you can access those items.

Sure memcached sounds good, but aren't there some drawbacks?

Yup, there are some drawbacks to using memcached. The main one is that it takes a little more code than storing stuff in ColdFusion. In ColdFusion you can do this:

<cfquery name="mytestquery" datasource="mydatasource" cachedwithin="#createtimespan(1,0,0,0)#">
   .... query here
</cfquery>

and you're done. However, to use memcached, it takes a bit more code:

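<cfif variables.memcached.keyExists("mykey")>
   <!--- cache hit: reuse the stored query --->
   <cfset mytestquery = variables.memcached.get("mykey")>
<cfelse>
   <!--- cache miss: run the query, then keep it for 3600 seconds (1 hour) --->
   <cfquery name="mytestquery" datasource="mydatasource">
      .... query here
   </cfquery>
   <cfset variables.memcached.set("mykey", mytestquery, 3600)>
</cfif>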

All in all, it's a little more code, but not a ton. It does get more involved if you are storing the query based off of parameters. ColdFusion automagically saves and updates stored queries based on the query parameters, whereas you have to manage that yourself in memcached. It's not tragic, but it's a bit more work. However, one benefit you gain is that you can access the stored queries and purge the cached items whenever you want. You are able to do that in ColdFusion as well, but you have more control over it with memcached.
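
One way to manage parameter-based queries yourself is to fold the parameters into the cache key, so each combination of parameters gets its own entry. Here is a minimal sketch of that idea in Java (ColdFusion runs on the JVM). The class name CacheKeys, the queryKey method, and the "name|param=value" key layout are all made up for illustration and are not part of the memcached client.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: builds a cache key from a query name plus its parameters.
class CacheKeys {
    static String queryKey(String queryName, Map<String, Object> params) {
        StringBuilder key = new StringBuilder(queryName);
        // Sort by parameter name so the same parameters always give the same key,
        // no matter what order they were supplied in.
        for (Map.Entry<String, Object> e : new TreeMap<String, Object>(params).entrySet()) {
            key.append('|').append(e.getKey()).append('=').append(e.getValue());
        }
        return key.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> params = Map.of("status", "active", "id", 42);
        // prints: mytestquery|id=42|status=active
        System.out.println(queryKey("mytestquery", params));
    }
}
```

Keep in mind that memcached keys are limited to 250 characters and can't contain whitespace, so long or free-text parameter values would need to be hashed or cleaned up before being used this way.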

That all sounds nice, where can I find it?

You can find the ColdFusion memcached client here:

cfmemcached.riaforge.org

You can get the memcached server here:

www.danga.com/memcached

Comments
Nice work - I'm a huge fan of caching (too much sometimes!) and love memcached.

Thanks for sharing.

Dave
# Posted By David Stockton | 12/12/07 6:19 AM
I've never looked at caching techniques in any great depth so memcached seems quite interesting.

However, with the example you've put in your post, surely that technique would only work for static data? After putting the query into memcached once, if the database changes you'll keep returning the cached (outdated) data. I've no doubt there are untold, complicated strategies for caching so it's probably a silly thing to bring up. Would you delete the cached items for an updated record when you commit the new data to the database?
# Posted By George Bridgeman | 12/18/07 3:22 AM
Actually, the example works with dynamic data as well, and I'd even go so far as to say that it works with dynamic data better than the built-in ColdFusion caching. The built-in ColdFusion caching is pretty easy and painless: the cache updates every x number of minutes. But if your data changes in between refreshes, the query that you've cached won't get updated.

So for example:


<cfquery name="blah" datasource="ds" cachedwithin="#createTimeSpan(0,1,0,0)#">
your query here
</cfquery>

Means that your query will be stored for 1 hour. Unless you do some fancy checking, your query won't be run again for a whole hour. Memcached works the same, but at the same time, different/better. For example, you can do this with memcached:


<cfif variables.memcached.keyExists("blah")>
   <cfset blah = variables.memcached.get("blah")>
<cfelse>
   <cfquery name="blah" datasource="ds">
   your query here
   </cfquery>
   <cfset variables.memcached.set("blah",blah,3600)>
</cfif>

Which does exactly the same thing. The 3600 is the amount of time in seconds that the item is supposed to be kept for, so in this example the blah query would be kept in the memcached repository for 3600 seconds, or 1 hour. After that it expires and the check to see if the key exists will fail, prompting the query to be rerun.

However, here's the cool part: if you update the table which populates the query, you can poke memcached to tell it that it needs to drop that specific key. So something like this:


<cfquery name="blahupdate" datasource="ds">
update the table here
</cfquery>
<cfset variables.memcached.delete("blah")>


That right there will delete the specific key that stored your cached query from before, which essentially gives you caching power with immediate updates. NICE!
# Posted By Jon Hirschi | 12/18/07 9:31 AM
One thing to consider is that the memcached instance is globally available, making it similar to the Server scope in ColdFusion rather than a CF cached query, which is bound to the application that it is created in (I believe). It is possible to have a security issue depending on what data you cache, since that cached data is available to anybody who has access to the server.

Of course, that may not even be an issue because if you have access to install memcached on your server you probably aren't sharing it with anybody else.
# Posted By Brian Buck | 12/18/07 11:07 AM
Brian, that's a good point. In a shared situation, memcached doesn't work all that well, because there is no security on the cache. It doesn't check to make sure you have access to the keys, it just returns the results to you, whatever they are. In a shared situation, this is definitely not what you'd want. But then if you are on a shared CF instance, you probably don't have the traffic to justify putting up a memcached instance. In a stand-alone situation, where you can put up the memcached instance behind a firewall, it makes a much better caching solution.
# Posted By Jon Hirschi | 12/18/07 11:15 AM
This is pretty cool. If used as an alternative to storing stuff in session, it could help reduce the overhead of session replication in a cluster of CF instances.
What are the types of data it can/can't store? Does it serialize the data?

What would be better than
<cfset variables.memcached.delete("blah")>
would be
<cfset variables.memcached.refresh_asynchronously("blah")>
which would allow for the cached data to be refreshed right then without a user having to wait for it.
Of course, the above code would assume the memcached object was aware of where the data came from.

If one were on CF8 they could fire off another thread to refresh the data. That way the user who updated the data would not need to wait, and the next user to come along and retrieve the data would have it there and waiting for them.

Also, you mention that it doesn't involve locking. What if it takes 5 seconds for the store method to run, and three gets run in the meantime? Will the keyExists() method only return true after store completes? If so, there would then be 4 requests all attempting to refresh the data. Are you assuming that it is the job of the application using memcached to handle whether or not the data needs refreshing/is already being refreshed? Generally I use a named lock and a second check once the lock has been entered to determine whether or not the data should be refreshed.

Thanks for sharing your innovation!

~Brad
# Posted By Brad Wood | 12/18/07 2:35 PM
Brad,
It can store pretty much all kinds of info, and it does serialize the data. For example, check out the memcached test page in the memcached client. What you can store from ColdFusion is arrays, structs, and all kinds of simple values. In CF 8, since CFCs are serializable, you can now store CFCs too. If you are running CF 7 or CF 6, you won't be able to store CFCs.

Another possibility is that you can store and retrieve whole page output as well, for pages that don't change much. In the case of automatically doing a refresh, that's pretty much left up to the application to do. Memcached won't automatically refresh the data. I think it pretty much subscribes to the principle of doing one thing and doing it well.

On the subject of locking, yup, memcached doesn't lock. It doesn't lock, period. If you need to do locking you have to do that in your application. This is actually a question that comes up a lot, so they added it to their FAQ. There are some good ideas there.

http://www.socialtext.net/memcached/index.cgi?faq#...

But yeah, all of the decision making belongs to the app. The app has to decide how best to handle refreshing the data.
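
For what it's worth, the "named lock plus a second check" approach Brad describes could look roughly like the sketch below. It's written in Java (ColdFusion runs on the JVM) and uses a plain in-memory map as a stand-in for memcached. The class name LockedRefresh and the getOrLoad method are hypothetical, not part of the memcached client.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical sketch: only one thread per key refreshes a missing cache entry.
class LockedRefresh {
    // Stand-in for the memcached client (an assumption, just an in-memory map).
    private final ConcurrentHashMap<String, Object> cache = new ConcurrentHashMap<>();
    // One lock object per key, playing the role of a named lock.
    private final ConcurrentHashMap<String, Object> locks = new ConcurrentHashMap<>();

    Object getOrLoad(String key, Supplier<Object> loader) {
        Object value = cache.get(key);          // first check, no lock taken
        if (value != null) {
            return value;
        }
        Object lock = locks.computeIfAbsent(key, k -> new Object());
        synchronized (lock) {
            value = cache.get(key);             // second check, inside the lock
            if (value == null) {
                value = loader.get();           // e.g. run the slow query
                cache.put(key, value);
            }
            return value;
        }
    }

    public static void main(String[] args) {
        LockedRefresh refresher = new LockedRefresh();
        System.out.println(refresher.getOrLoad("blah", () -> "query result"));
        // The second call is served from the cache, so this loader never runs.
        System.out.println(refresher.getOrLoad("blah", () -> "never used"));
    }
}
```

Note that a lock like this only protects a single ColdFusion instance. If several machines share the same memcached servers, each of them could still run the refresh once.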

-jon
# Posted By Jon Hirschi | 12/18/07 11:45 PM
The 4 gig limit only applies to 32-bit Java, if you are using 64-bit the limit is much (much !) higher.
# Posted By Tom Chiverton | 12/19/07 4:04 AM
Tom,

You're right. It looks like the max limit on a 64-bit JVM on a 64-bit OS is something around 16 exabytes. Though I think you would probably run into garbage collecting issues if you tried to keep that much memory around... can you imagine how much time it would take a garbage collection process to go through all that?
# Posted By Jon Hirschi | 12/19/07 10:14 AM
Hi Jon,

I understand how this works for variables, but how would you go about using this to cache an entire page generated by ColdFusion?
# Posted By JT | 1/18/08 9:25 AM
Hi JT,

There are a couple of different ways you could do this with Jon's implementation. The easiest would be to modify the custom tag so that it could handle either variable caching (as it does now), or full/partial page caching - similar to how Ray Camden's ScopeCache tag works.

I have a rough version of that concept working on my laptop. Let me know if you are interested (you can email me through my website) and I'll send you a rough copy you can play with.
# Posted By Rob Brooks-Bilson | 1/18/08 11:21 AM
Hi,

Am I missing something? I've been playing around with the memcachedtest.cfm and I don't see connections to the memcached server being terminated. Every time I run the memcachedtest.cfm I get 1 more connection: curr_connections=46 etc.

This means in production i will run out of resources. How can i terminate the connections again or reuse the java object from different cfm files? Sorry, not been playing around with java a lot.
# Posted By Patrick | 2/20/08 10:04 AM
Are you using the new memcached client or the old one? If you are using the new one, you need to un comment the call to the memcached.status() method. You should check and make sure that the memcached client is actually connecting to memcached. I have some fixes related to the timeout situation, plus some more try/catch stuff to fail gracefully. I'll post them soon. There is a problem with the underlying Java client: if it is unable to get a connection to the memcached server, it will keep trying and won't time out. This could be what is causing it to keep connections. The get and set functions shouldn't be affected by this, but I haven't been able to get the status call to include a timeout yet.

So there are a couple of solutions you can go for.

1. Don't use the new memcached client; opt for the old memcached client instead.

2. Or remove the status call from the memcached test.

I will post an update to the client by the weekend.
# Posted By Jon Hirschi | 2/21/08 11:29 AM
When trying out the code available for download, if anyone is trying to use multiple IP:port numbers for this,
all the ColdFusion comments in the code say "provide a comma delimited list of server ip's with ports"
when after much frustration, it turns out that you should provide it as SPACE DELIMITED ...

eg "10.33.104.1:11211 10.33.104.1:11212 10.33.104.1:11213"
NOT "10.33.104.1:11211,10.33.104.1:11212,10.33.104.1:11213"

Once you get over that hurdle, if you kill one of the listed instances for some reason on the server and it's still listed in the list of servers
(you should leave it listed or you screw up the hashes) you will get thrown an error from the Java client.
The message it throws is
"Object Instantiation Exception. An exception occurred when instantiating a Java object. The class must not be an interface or an abstract class."
And if you dig deep into the error exception code (via a cfdump of the error structure) you find this wonderful message:
"Connection refused".

So basically if ONE memcached instance out of the list of servers+ports is offline, the thing dies ever so un-gracefully and you're left with egg on your face.

Why did I discover this issue? Well, I want to run multiple memcached servers on different machines in a production server farm, and sometimes we want to pull a server offline to rebuild or repair, which means killing all running software on it. Because we would have a LOT of data stored in memcached, when one memcached server is offline I don't want to invalidate ALL data from all servers. I just want this code to gracefully notice that one server's not working and start adding missing data as necessary to the next memcached instance. This is referred to as consistent hashing. I've coded this memcachednew to use the Ketama algorithm.

Code :          variables.hashAlgorithm = variables.memcached.getHashAlgorithm().hash('KETAMA_HASH');
URL about ketama hashing : http://www.last.fm/user/RJ/journal/2007/04/10/rz_l...
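
For anyone wondering what consistent hashing buys you here, below is a minimal sketch of the idea in Java. It is not the Ketama implementation and not the code from the client: the class name HashRing, the MD5-based hash, and the number of points per server are all assumptions for illustration. Each server is placed on a ring at many points, and a key belongs to the first server point at or after the key's own hash.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a consistent-hash ring (in the spirit of Ketama).
class HashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    HashRing(List<String> servers, int pointsPerServer) {
        for (String server : servers) {
            // Give every server many points so keys spread out evenly.
            for (int i = 0; i < pointsPerServer; i++) {
                ring.put(hash(server + "#" + i), server);
            }
        }
    }

    // Drop a dead server: only the keys that lived on it move elsewhere.
    void remove(String server) {
        ring.values().removeIf(server::equals);
    }

    String serverFor(String key) {
        // First point at or after the key's hash, wrapping around to the start.
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    // Turn the first four bytes of an MD5 digest into a non-negative number.
    static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            return ((long) (d[3] & 0xFF) << 24) | ((long) (d[2] & 0xFF) << 16)
                 | ((long) (d[1] & 0xFF) << 8) | (d[0] & 0xFF);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        HashRing ring = new HashRing(
            List.of("10.33.104.1:11211", "10.33.104.1:11212", "10.33.104.1:11213"), 100);
        System.out.println("blah -> " + ring.serverFor("blah"));
        ring.remove("10.33.104.1:11212");
        System.out.println("blah -> " + ring.serverFor("blah"));
    }
}
```

When a server is removed from the ring, keys that lived on the other servers stay exactly where they were, and only the dead server's keys get picked up by the remaining servers.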

Has anyone taken the "memcacheNew" version and evolved it further than the February release into something that could easily be used in a production environment on an existing large site? And by chance solved the issues described above?

The code is fantastic, don't get me wrong, it's just a fraction off base for me, and I suspect for others, to use properly. I know I'm not the only one who would try to utilise multiple memcached servers...
# Posted By Jugsofbeer | 12/4/08 2:32 PM
Jon, I just modified the Memcached Client to work with CFMX and Railo. Please send me a mail if you are interested in the code.
# Posted By Harry Klein | 12/15/08 9:13 AM
I am using Railo (railo-3.0.1.000-railo-express-6.1.0-with-jre-windows) and am getting no errors, but it is not returning a struct (isstruct returns false) when using the memcachedtest.cfm page. cfdump var shows a string with contents:
struct('somekey':'someval','someothersecondkey':'some other second val, 123456789','someotherkey':'someother val')
Any suggestions?
# Posted By Andrew | 2/23/09 9:42 PM
I suspect it is a deserialise problem, here is a photo of the result:
http://www.macrodate.com/memcached.gif

Andrew
# Posted By Andrew | 2/24/09 10:03 AM
Just to confirm that it was a deserialise problem, and after talking to Harry Klein (above) it now works perfectly with Railo!
# Posted By Andrew | 3/30/09 2:01 PM