The dRuby Book

11.1 Dealing with GC

While a script is running, some objects eventually stop being referenced from anywhere. Once an object is no longer referenced, you can't use it anymore. Cleaning up such unused objects is the job of the garbage collector (GC). From time to time, the Ruby interpreter scans its own process's object space and frees unused objects to reclaim memory.

dRuby and GC

Any object that is still referenced somewhere is considered to be in use, and plain Ruby never collects it. With dRuby, it's a different story.

A DRbObject refers to an object in a different process, but the Ruby interpreter that owns the object doesn't know that another process holds a reference to it, and this causes problems. Even while a DRbObject holds a reference, the object becomes a target for GC if nothing within its own process references it. Once the object has been collected, method calls made through the DRbObject are no longer guaranteed to work. You can't predict when GC will reclaim an object, because the timing is up to the Ruby interpreter, and calling a method on an object that no longer exists raises an error.

This isn't really a flaw in Ruby itself, but you do need to be prepared for it. In this chapter, we'll look at how to deal with GC when using dRuby.

GC Workaround at the Application Level

The best way to handle GC is to create a workaround at the application level. Here are the general strategies:

Front objects and the return values of their methods are typical examples of the preceding cases, so be especially careful with them.

Objects held in long-lived places, such as global variables, class variables, and singletons, rarely have these problems. They stay referenced for as long as those variables hold them, so they aren't garbage collected and you don't need to worry about them much.
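One way to apply this at the application level is to park any object you hand out by reference in a long-lived container, such as a hash stored in a constant. The object then stays reachable inside its own process until the application explicitly lets it go. The sketch below uses a hypothetical `ObjectKeeper` class with `keep` and `release` methods. It is an illustration of the idea and is not part of dRuby.

```ruby
# Hypothetical ObjectKeeper (not a dRuby API): an application-level registry
# that keeps objects strongly referenced, and therefore safe from GC, while
# remote clients may still be using them.
class ObjectKeeper
  def initialize
    @objects = {}
    @mutex = Mutex.new # a dRuby server handles requests on several threads
  end

  # Hold a strong reference to obj and return obj unchanged.
  def keep(obj)
    @mutex.synchronize { @objects[obj.object_id] = obj }
    obj
  end

  # Drop the strong reference. The object becomes eligible for GC once
  # nothing else in this process refers to it.
  def release(obj)
    @mutex.synchronize { @objects.delete(obj.object_id) }
    nil
  end

  def kept?(obj)
    @mutex.synchronize { @objects.key?(obj.object_id) }
  end

  def size
    @mutex.synchronize { @objects.size }
  end
end

# A long-lived registry held in a constant, like the globals mentioned above.
KEEPER = ObjectKeeper.new

list = KEEPER.keep([])   # e.g. an object a front method returns by reference
p KEEPER.kept?(list)     # prints true
KEEPER.release(list)
p KEEPER.kept?(list)     # prints false
```

The hard part in a real application is deciding when to call `release`. Usually the client has to say that it is done, or the object is bound to a session that eventually ends.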

Temporary objects need a few tricks. In most cases you can simply pass them by value, but sometimes you have to pass them by reference.
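Roughly, dRuby sends an argument or return value as a copy when Marshal can dump it. When Marshal can't dump it (a Proc or a File, for example), or when the object includes DRbUndumped, dRuby sends a DRbObject reference instead. The check below looks at this from the Marshal side. `Point`, `Counter`, and `transfer_mode` are hypothetical names used only for illustration.

```ruby
require 'drb/drb'

# Hypothetical value type: Marshal can dump a named Struct.
Point = Struct.new(:x, :y)

# Hypothetical reference type: DRbUndumped's _dump raises TypeError,
# so Marshal refuses to serialize instances of this class.
class Counter
  include DRbUndumped

  def initialize
    @count = 0
  end

  def up
    @count += 1
  end
end

# Returns :value if Marshal can dump obj (dRuby would send a copy),
# or :reference if it can't (dRuby would send a DRbObject instead).
def transfer_mode(obj)
  Marshal.dump(obj)
  :value
rescue TypeError
  :reference
end

puts transfer_mode(Point.new(1, 2)) # prints value
puts transfer_mode(Counter.new)     # prints reference
puts transfer_mode(proc { |x| x })  # prints reference (procs can't be marshaled)
```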

A common strategy is to hand temporary objects to an iterator (block). Examples are the File.open class method and PStore#transaction.
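PStore follows the same pattern. Its data is reachable only inside a `transaction` block, and changes are committed when the block ends. The minimal sketch below assumes a hypothetical `bump_counter` helper and a throwaway file in a temporary directory.

```ruby
require 'pstore'
require 'tmpdir'

# Hypothetical helper: increments a counter stored at path and returns
# [new_count, error_outside], where error_outside is true if touching the
# store outside a transaction raised PStore::Error.
def bump_counter(path)
  store = PStore.new(path)

  # Read-write transaction: the change is committed when the block ends.
  store.transaction do
    store[:count] = (store[:count] || 0) + 1
  end

  # Read-only transaction; #transaction returns the block's value.
  count = store.transaction(true) { store[:count] }

  # The stored data is only reachable inside a transaction block.
  error_outside =
    begin
      store[:count]
      false
    rescue PStore::Error
      true
    end

  [count, error_outside]
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'counter.pstore')
  p bump_counter(path) # prints [1, true]
  p bump_counter(path) # prints [2, true]
end
```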

  File.open('foo.txt') do |fp|
    fp.gets
    # ... more work with fp ...
  end

When you call File.open with a block, the File object is passed to the block only while the file is open. When the block finishes, the file is closed automatically.
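The following sketch shows that behavior. `first_line_and_handle` is a hypothetical helper. It deliberately copies the yielded File object out of the block, purely so that the object's state can be checked after File.open returns.

```ruby
require 'tmpdir'

# Hypothetical helper: reads the first line of path and also returns the
# File object that File.open yielded, only so we can inspect it afterward.
def first_line_and_handle(path)
  handle = nil
  line = File.open(path) do |fp|
    handle = fp # leaked on purpose for inspection; avoid this in real code
    fp.gets     # fp is open inside the block
  end           # File.open closes fp here and returns the block's value
  [line, handle]
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'demo.txt')
  File.write(path, "hello\nworld\n")
  line, fp = first_line_and_handle(path)
  p line       # prints "hello\n"
  p fp.closed? # prints true
end
```

Any later call such as `fp.gets` on the leaked object raises IOError. A remote process that kept a reference to the File beyond the block would run into the same problem.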

File.open is convenient, but also notice in the sequence diagram (see Figure 46, Temporary File object stays referenced during an open block) that it keeps the File object referenced only temporarily, while the block runs.


Figure 46. Temporary File object stays referenced during an open block

The outer open method wraps the File object, so the object stays referenced from the stack while the block runs and won't be garbage collected. As long as you use the temporary object only inside the open block, you don't have to worry about it being garbage collected.
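To make the mechanism concrete, here is a minimal sketch of a method with the same shape as File.open with a block. It is not how File.open is actually implemented. `open_scoped` is a hypothetical name. The local variable that holds the File lives in the method's stack frame for as long as the caller's block runs, and an `ensure` clause closes the file however the block exits.

```ruby
require 'tmpdir'

# Hypothetical helper with the same shape as File.open with a block.
# While the caller's block runs, the local variable fp keeps the File
# referenced from this method's stack frame, so GC can't reclaim it.
def open_scoped(path, mode = 'r')
  fp = File.open(path, mode)
  begin
    yield fp # the block's value becomes the method's value
  ensure
    fp.close unless fp.closed? # runs even if the block raises
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'note.txt')
  File.write(path, "line 1\nline 2\n")
  lines = open_scoped(path) { |fp| fp.readlines }
  p lines # prints ["line 1\n", "line 2\n"]
end
```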

The following small class opens a file in read-only binary mode and passes it to the caller's block.

  class DCP
    def open(fname, &block)
      File.open(fname, 'rb', &block)
    end
  end
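Called locally, DCP#open behaves just like File.open with a block. When a remote client calls it over dRuby, the block is passed by reference and called back from the server. The File object can't be marshaled, so it reaches the client as a reference while staying on the server, where File.open keeps it referenced until the block ends. The local check below repeats the small class so that it runs on its own. The sample file name is arbitrary.

```ruby
require 'tmpdir'

# Copy of the small DCP class above: delegates to File.open in binary
# read-only mode, so the File exists only while the caller's block runs.
class DCP
  def open(fname, &block)
    File.open(fname, 'rb', &block)
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'sample.bin')
  File.binwrite(path, "abc\n")
  data = DCP.new.open(path) { |fp| fp.read }
  p data # prints "abc\n"
end
```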

This strategy does more than protect objects from GC. It also creates a context inside the block where you can perform a series of operations, such as processing a file in chunks. The next example is a class that transfers files; DCP implements both sending and receiving.

dcp.rb
  require 'drb/drb'

  class DCP
    # Instances are always passed by reference over dRuby.
    include DRbUndumped

    # Returns the size of fname in bytes.
    def size(fname)
      File.lstat(fname).size
    end

    # Yields the contents of fname in 4,096-byte chunks. The File object
    # never leaves this method and stays referenced while the block runs.
    def fetch(fname) # ①
      File.open(fname, 'rb') do |fp|
        while (buf = fp.read(4096))
          yield(buf)
        end
      end
      nil
    end

    # Copies fname from the remote DCP `there` into a local file of the
    # same name, yielding [bytes written, total size] after each chunk.
    def store_from(there, fname) # ②
      size = there.size(fname)
      wrote = 0

      File.open(fname, 'wb') do |fp|
        there.fetch(fname) do |buf|
          wrote += fp.write(buf)
          yield([wrote, size]) if block_given?
          nil # this value travels back to the server; send just nil
        end
      end
      wrote
    end

    # Connects to the DCP server at uri and copies fname, printing progress.
    def copy(uri, fname)
      there = DRbObject.new_with_uri(uri)
      store_from(there, fname) do |wrote, size|
        puts "#{wrote * 100 / size}%"
      end
    end
  end

  if __FILE__ == $0
    if ARGV[0] == '-server'
      ARGV.shift
      DRb.start_service(ARGV.shift, DCP.new)
      puts DRb.uri
      DRb.thread.join
    else
      uri = ARGV.shift
      fname = ARGV.shift
      raise('usage: dcp.rb URI filename') if uri.nil? || fname.nil?
      DRb.start_service # needed so the server can call our block back
      DCP.new.copy(uri, fname)
    end
  end

The fetch method (①) and the store_from method (②) do the actual work of the transfer. Reading a big file all at once would take up too much memory, so they split it into chunks and transfer them one at a time. fetch yields the file one chunk at a time. The File object performs several reads, but it never leaves fetch, and it stays referenced for the whole block.
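The chunking and progress logic can be tried without dRuby or real files. The sketch below is a local stand-in with the same loop structure as fetch and store_from. It swaps in StringIO for File and calls the method directly rather than through a DRbObject. `fetch_chunks`, `store_chunks`, and `CHUNK_SIZE` are hypothetical names.

```ruby
require 'stringio'

CHUNK_SIZE = 4096 # same chunk size as dcp.rb

# Local stand-in for DCP#fetch: yields the source in CHUNK_SIZE pieces.
def fetch_chunks(io)
  while (buf = io.read(CHUNK_SIZE))
    yield buf
  end
  nil
end

# Local stand-in for DCP#store_from: writes each chunk to dest and yields
# (bytes written so far, total size) after every chunk.
def store_chunks(src, dest, size)
  wrote = 0
  fetch_chunks(src) do |buf|
    wrote += dest.write(buf)
    yield(wrote, size) if block_given?
  end
  wrote
end

source = StringIO.new("x" * 10_240) # 10KB of data
dest = StringIO.new
progress = []
total = store_chunks(source, dest, source.size) do |wrote, size|
  progress << wrote * 100 / size
end
p progress # prints [40, 80, 100]
p total    # prints 10240
```

With 10,240 bytes and 4,096-byte chunks, the writes reach 4,096, 8,192, and 10,240 bytes, which is where the 40%, 80%, and 100% figures come from.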

To try the example, run the script in one terminal with the -server option.

  % mkdir s
  % cd s
  # Create a 10KB file (mkfile is available on macOS and Solaris).
  % mkfile 10k bigfile.txt
  % ruby ../dcp.rb -server druby://localhost:12345
  druby://localhost:12345

Then run the same script in another terminal, this time passing the server's URI and the filename as arguments.

  % mkdir c
  % cd c
  % ruby ../dcp.rb druby://localhost:12345 bigfile.txt
  40%
  80%
  100%

You should see the file copied from the server to the client in chunks. Next, we'll look at mechanisms that automatically protect objects from garbage collection.