Memory issue with C# and get_CellValue

Jul 21, 2011 at 10:58 AM

I have a C# .NET app that reads in large shapefiles (millions of features), and I am having a memory issue: memory seems to be consumed by the MapWinGIS.Shapefile.get_CellValue method call.

I have boiled it down to a simple loop like this:

    Shapefile shapeFile = Utils.LoadShapefile("c:\\Dotted Eyes\\AddressView Plus\\Regional\\bs7666.shx");

    for (int shapeIdx = 0; shapeIdx < shapeFile.NumShapes; shapeIdx++)
    {
        shapeFile.get_CellValue(1, shapeIdx);
    }

If I run this, the memory ramps up and up until I eventually get an OutOfMemory exception, even though I am doing nothing with the return value.

I suspect this has something to do with the marshalling of the return value from C++ to C#, and the fact that I can't free the memory that is sent to me from C++.

How do I use this method call without consuming memory? All I want to do is read the cell value and then dispose of the return value.

 

Many thanks,

Simon Clough


Developer
Jul 21, 2011 at 3:42 PM

Simon, 

How large is the dbf table of your shapefile?

I used the code below for a shapefile with ~3000 shapes and memory usage was steady (notice the outer loop with 10000 repetitions).

When a cell value is requested, MapWinGIS checks whether the specified row has already been loaded into memory; if it has not, it loads it (the whole row, even for a single cell).

So it seems that you are running out of memory simply because your dbf is too large to hold in memory.

        private void button7_Click(object sender, EventArgs e)
        {
            MapWinGIS.Shapefile sf = new Shapefile();
            OpenFileDialog dlg = new OpenFileDialog();
            dlg.Filter = sf.CdlgFilter;
            if (dlg.ShowDialog() == DialogResult.OK)
            {
                sf.Open(dlg.FileName, null);

                for (int j = 0; j < 10000; j++)
                {
                    string s = "";
                    for (int i = 0; i < sf.NumShapes; i++)
                    {
                        s += sf.get_CellValue(0, i).ToString();
                    }
                    System.Diagnostics.Debug.Print("Iteration: {0}; Val = {1}", j, s);
                }
                sf.Close();
            }
        }

Regards,

Sergei

 


 

Jul 21, 2011 at 4:14 PM

Hi Sergei,

Thanks for the reply. The shapefile has 1500000 (1.5 million) shapes in it.

I've tried modifying my code to do the same sort of thing as you:

 

            MapWinGIS.Shapefile shapeFile = Utils.LoadShapefile(fileName);
            for (int J = 1; J < 10000; J++)
            {
                for (int shapeIdx = 0; shapeIdx < 3000; shapeIdx++)
                {
                    textBox1.Text = shapeFile.get_CellValue(0, shapeIdx).ToString();
                }
            }
            shapeFile.Close();

What I find is that the number of iterations in the outer loop has no effect on the memory usage; only the number of iterations in the inner loop matters, i.e. the number of shapes read from the file.

Setting the inner loop maximum to 100,000 causes this loop to consume 250 MB of RAM.
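
As a rough back-of-envelope check, and assuming the cache grows linearly with the number of rows read (an assumption, not something measured here), the 250 MB figure can be scaled up to the full 1.5 million shapes. `CacheEstimate` and `EstimateMegabytes` are hypothetical names used only for this illustration:

```csharp
using System;

public static class CacheEstimate
{
    // Linear extrapolation: scale the observed memory use by the ratio
    // of total rows to observed rows. Assumes every row costs the same.
    public static double EstimateMegabytes(double observedMb, long observedRows, long totalRows)
    {
        if (observedRows <= 0) throw new ArgumentOutOfRangeException(nameof(observedRows));
        return observedMb * totalRows / observedRows;
    }
}
```

With the figures above, `CacheEstimate.EstimateMegabytes(250, 100000, 1500000)` comes to 3750 MB, which would be consistent with the out-of-memory exception if the growth really is linear.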

I don't believe I had this issue with a different version of the MapWinGIS ActiveX control, but I've tried all the versions that we have used (4.7 through to 4.8RC2) and cannot get it to work with any of them.

 

Jul 21, 2011 at 4:30 PM

What I have now discovered is that if I close and reopen the shapefile every so often, this frees up all of the memory and avoids the huge ramp in memory (i.e. I get a sawtooth memory usage profile).

This isn't really a practical solution, but it does seem to indicate that the issue is to do with the reading of the shapefile values, and them not being freed until the shapefile is closed.
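
A minimal sketch of that workaround, written against delegates so it does not depend on MapWinGIS itself. `BatchedReader`, `ReadAll` and the batch size are hypothetical; the commented usage relies only on the `Open`, `Close`, `NumShapes` and `get_CellValue` members already used in this thread, and on the assumption that closing the shapefile releases the cached rows, as observed above.

```csharp
using System;

public static class BatchedReader
{
    // Reads cells 0..count-1 through readCell and passes each one to handle.
    // After every batchSize reads, reopen() is called so the source can drop
    // whatever it has cached. Returns the number of times reopen() was called.
    public static int ReadAll(int count, int batchSize,
                              Func<int, object> readCell,
                              Action reopen,
                              Action<int, object> handle)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        int reopens = 0;
        for (int i = 0; i < count; i++)
        {
            // Reopen at each batch boundary, but not before the first read.
            if (i > 0 && i % batchSize == 0)
            {
                reopen();
                reopens++;
            }
            handle(i, readCell(i));
        }
        return reopens;
    }
}

// Possible wiring to MapWinGIS (untested sketch, batch size chosen arbitrarily):
//
//   var sf = new MapWinGIS.Shapefile();
//   sf.Open(fileName, null);
//   BatchedReader.ReadAll(sf.NumShapes, 100000,
//       i => sf.get_CellValue(0, i),
//       () => { sf.Close(); sf.Open(fileName, null); },
//       (i, value) => { /* use value here */ });
//   sf.Close();
```

A smaller batch size lowers the peak of the sawtooth at the cost of more reopen calls.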

 

Developer
Jul 21, 2011 at 6:33 PM

Simon,

> but it does seem to indicate that the issue is to do with the reading of the shapefile values, and them not being freed until the shapefile is closed

Agreed. Values are cached, as I said.

Just to rule out any doubt, what is the size of the dbf file you are using, in megabytes? According to my testing, the in-memory representation can require about 3 times more space than the on-disk one (mainly because of the inefficiency of the VARIANT data type, I believe).

It seems that the Table class will need to be optimized to solve your problem. I'm considering a Table.CachingBehavior property with these enumerated values:

    enum CachingBehavior
    {
        NoCaching = 0,      // values are read directly from the disk, no memory is used for caching
        RowCaching = 1,     // the behavior we have now; should be the default
        // CellCaching = 2, // optional: only the values that were actually requested are cached (not the whole row)
    }
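
To illustrate how the three proposed modes would differ, here is a toy sketch. It is not MapWinGIS code: `ToyTable`, `DiskReads`, `CachedValues` and the `loadRow` delegate (which stands in for a read from the dbf file) are all hypothetical, and the real Table class may end up working quite differently.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical names throughout; this only illustrates the proposal.
public enum CachingBehavior { NoCaching = 0, RowCaching = 1, CellCaching = 2 }

public class ToyTable
{
    private readonly Func<int, object[]> loadRow;   // stands in for a disk read
    private readonly CachingBehavior mode;
    private readonly Dictionary<int, object[]> rows = new Dictionary<int, object[]>();
    private readonly Dictionary<Tuple<int, int>, object> cells = new Dictionary<Tuple<int, int>, object>();

    public int DiskReads { get; private set; }

    // Total number of values currently held in memory.
    public int CachedValues
    {
        get
        {
            int n = cells.Count;
            foreach (object[] r in rows.Values) n += r.Length;
            return n;
        }
    }

    public ToyTable(Func<int, object[]> loadRow, CachingBehavior mode)
    {
        this.loadRow = loadRow;
        this.mode = mode;
    }

    public object GetCell(int field, int row)
    {
        switch (mode)
        {
            case CachingBehavior.RowCaching:
                object[] cachedRow;
                if (!rows.TryGetValue(row, out cachedRow))
                {
                    cachedRow = Read(row);
                    rows[row] = cachedRow;          // keeps every field of the row
                }
                return cachedRow[field];

            case CachingBehavior.CellCaching:
                Tuple<int, int> key = Tuple.Create(row, field);
                object cachedCell;
                if (!cells.TryGetValue(key, out cachedCell))
                {
                    cachedCell = Read(row)[field];
                    cells[key] = cachedCell;        // keeps only the requested value
                }
                return cachedCell;

            default:                                // NoCaching
                return Read(row)[field];            // nothing is retained
        }
    }

    private object[] Read(int row)
    {
        DiskReads++;
        return loadRow(row);
    }
}
```

For a forward-only pass over one field, the sketch's NoCaching mode holds no values at all, CellCaching holds one value per row, and RowCaching holds every field of every row visited.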

I can't say when I'll have time to implement it, though.

Regards,

Sergei

Jul 22, 2011 at 7:13 AM

Hi Sergei,

This particular shapefile is 108 MB.

I have some that are 1.9 GB. This is UK-wide address/mapping data, so the files are large.

It would be great if some options to control the caching could be added. All I am personally interested in is reading forward through the file, so the NoCaching option would suit me, but I can see the benefits of the other options too.

At this stage in development I can go with the workaround of closing the file, and implement the proper fix when you have had a chance to look at it.

Thanks for all your help, very much appreciated.

Cheers,

Simon