Tracking down a memory leak in Ferret 0.11.4

written by benjamin on July 29th, 2007 @ 12:44 PM

We recently discovered unnatural memory growth in our mongrel servers. Starting at about 70 MB of memory, each mongrel process would hold more than 200 MB just a couple of hours later. We first blamed rmagick for the leak, as there are some reports circulating about possible leaks in rmagick.

Gathering data about the memory usage of a rails application is quite simple with bleak house, which relies on a specially patched version of ruby. It can create nifty images of the object and memory usage of any ruby application. After some requests we could not find any memory leak in rmagick, but requesting several search pages produced this scary result.

[Graph: the green line is the memory usage.]

omdb uses ferret in a lot of places. All edit movie dialogs depend on ferret, and even the ‘similar movies’ feature is just a complex ferret search statement. First of all we tried to isolate a search query that leaks. Some of the more basic queries (like searching for people by name) were not affected, so we focused on the movie search, trying to find out what causes the leak. Our first guess was that some IndexReader or Writer was not closed properly, so old indexes remained in memory. The memory growth was quite large, almost 10 MB every 10 requests. After some checking, and even rewriting our searcher class to rely purely on Ferret::Searcher instead of Ferret::Index::Index, we couldn’t find any abandoned index.

So we took another look at the bleak house results. The number of objects in the search controller is consistent; there is no growth in the number of open objects. The memory jumps every 10 requests, but we fired our curl requests at just one action, so there is no obvious reason why the memory should grow on every 10th request. Looking at the special memory report of bleak house, we found that the memory usage grows linearly.


We decided to remove all custom omdb ferret code and try to build the search using just the Ferret API. We added feature by feature and for a long time saw no leak. Only after we added our custom analyzers did the leak appear again. omdb uses a lot of different analyzers, not only per language, but per field. To use the right analyzer for each field/language, we’re using the PerFieldAnalyzer, which allows us to specify how we want each field to be analyzed. So the leak was not inside the IndexReader or Writer, but part of the analysis process. We managed to reduce the problem to this simple script, which consumes lots and lots of memory if you run it.

At first we thought the MappingFilter was the problem, but it’s actually the PerFieldAnalyzer that is leaking memory. If you just use StandardAnalyzers, the leak is marginal, but adding big character-mapping tables, maybe even to a lot of different fields, results in the big memory consumption we’re experiencing.

The fix is trivial: the problem with the PerFieldAnalyzer is located in the C code of Ferret, so we just need to implement our own PerFieldAnalyzer written in ruby. We’ve created a small Analyzer that does the same as the built-in PerFieldAnalyzer.
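
A minimal sketch of what such a pure-ruby analyzer could look like follows. It is an illustration, not the exact omdb code: the class name RubyPerFieldAnalyzer and the WordAnalyzer/DowncaseAnalyzer stand-ins are hypothetical, chosen so the example runs without Ferret installed. It assumes an analyzer is any object that answers token_stream(field, text), which mirrors the interface of Ferret's analyzers.

```ruby
# Pure-ruby per-field analyzer (hypothetical name, not part of Ferret).
# It keeps a hash of field => analyzer and delegates to the matching one,
# falling back to a default analyzer for fields without an entry.
class RubyPerFieldAnalyzer
  def initialize(default_analyzer)
    @default_analyzer = default_analyzer
    @analyzers = {}
  end

  # Register the analyzer to use for a single field.
  def add_field(field, analyzer)
    @analyzers[field.to_sym] = analyzer
  end
  alias_method :[]=, :add_field

  # Delegate to the field's analyzer, or to the default one.
  def token_stream(field, text)
    analyzer = @analyzers.fetch(field.to_sym, @default_analyzer)
    analyzer.token_stream(field, text)
  end
end

# Hypothetical stand-ins for real analyzers, so the sketch runs on its own.
class WordAnalyzer
  def token_stream(_field, text)
    text.split
  end
end

class DowncaseAnalyzer
  def token_stream(_field, text)
    text.downcase.split
  end
end

analyzer = RubyPerFieldAnalyzer.new(WordAnalyzer.new)
analyzer[:title] = DowncaseAnalyzer.new

p analyzer.token_stream(:title, "The Big Lebowski") # uses DowncaseAnalyzer
p analyzer.token_stream(:plot, "The Dude abides")   # falls back to WordAnalyzer
```

With Ferret itself, the stand-ins would be replaced by real analyzers such as a StandardAnalyzer or one wrapping a MappingFilter. The class would most likely also need to inherit from Ferret's Analyzer base class to be accepted by the index, which is an assumption here and not shown in the sketch.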
