ruby/README.EXT

.\" README.EXT -  -*- Text -*- created at: Mon Aug  7 16:45:54 JST 1995

This document explains how to make extension libraries for Ruby.

1. Basic knowledge

In C, variables have types and data do not.  In contrast, Ruby
variables have no static type and the data themselves carry types.
So data need to be converted between the two languages.

Data in Ruby are represented by the C type `VALUE'.  Each VALUE has
its own data-type.

To retrieve C data from a VALUE, you need to:

 (1) Identify VALUE's data type
 (2) Convert VALUE into C data

Converting to the wrong data type may cause serious problems.


1.1 Data-types

The Ruby interpreter has the following data-types:

	T_NIL		nil
	T_OBJECT	ordinary object
	T_CLASS		class
	T_MODULE	module
	T_FLOAT		floating point number
	T_STRING	string
	T_REGEXP	regular expression
	T_ARRAY		array
	T_FIXNUM	Fixnum(31bit integer)
	T_HASH		associative array
	T_STRUCT	(Ruby) structure
	T_BIGNUM	multi precision integer
	T_TRUE		true
	T_FALSE		false
	T_DATA		data

In addition, there are several other types used internally:

	T_ICLASS
	T_MATCH
	T_VARMAP
	T_SCOPE
	T_NODE

Most of the types are represented by C structures.

1.2 Check Data Type of the VALUE

The macro TYPE(), defined in ruby.h, gives the data-type of a VALUE.
TYPE() returns one of the T_XXXX constants described above.  Code
that handles several data-types looks like this:

  switch (TYPE(obj)) {
    case T_FIXNUM:
      /* process Fixnum */
      break;
    case T_STRING:
      /* process String */
      break;
    case T_ARRAY:
      /* process Array */
      break;
    default:
      /* raise exception */
      rb_raise(rb_eTypeError, "not valid value");
      break;
  }

There is also a data-type check function:

  void Check_Type(VALUE value, int type)

It raises an exception if the VALUE does not have the specified type.

There are also faster check macros for fixnums and nil:

  FIXNUM_P(obj)
  NIL_P(obj)

1.3 Convert VALUE into C data

The data for types T_NIL, T_FALSE, and T_TRUE are nil, false, and true
respectively.  Each is the only instance of its data type.

A T_FIXNUM value is a 31-bit fixed-size integer (63-bit on some
machines), which can be converted to a C integer with the FIX2INT()
macro.  There is also NUM2INT(), which converts any Ruby number into a
C integer.  NUM2INT() includes a type check, so an exception is raised
if the conversion fails.
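
As a rough sketch of the conversions above, a helper that turns a Ruby
number into a C int might look like this.  The function name
demo_to_int and its policy for nil are assumptions made for the
example; only NIL_P(), FIXNUM_P(), FIX2INT() and NUM2INT() come from
ruby.h.

```c
#include "ruby.h"

/* Hypothetical helper: converts a Ruby number to a C int, using
   `fallback' when the caller passed nil. */
static int
demo_to_int(VALUE obj, int fallback)
{
    if (NIL_P(obj))
        return fallback;        /* nil: nothing to convert */
    if (FIXNUM_P(obj))
        return FIX2INT(obj);    /* fast path, type already known */
    return NUM2INT(obj);        /* type-checked; raises on failure */
}
```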

Other data types have corresponding C structures, e.g. struct RArray
for T_ARRAY.  A VALUE whose type has a corresponding structure can be
cast to a pointer to that structure.  There is a casting macro RXXXX
for each data type, such as RARRAY(obj); see "ruby.h".

For example, `RSTRING(str)->len' gives the size of a Ruby String
object, and its allocated region can be accessed through
`RSTRING(str)->ptr'.  For arrays, use `RARRAY(ary)->len' and
`RARRAY(ary)->ptr' respectively.

Notice: Do not change the contents of a structure directly unless you
are prepared to take responsibility for the result.  Doing so tends to
cause interesting bugs.

1.4 Convert C data into VALUE

To convert C data to the values of Ruby:

  * FIXNUM

    left shift 1 bit, and turn on LSB.

  * Other pointer values

    cast to VALUE.

You can determine whether a VALUE is a pointer or not by checking its
LSB.

Note that Ruby does not allow an arbitrary pointer value to be a
VALUE.  It must be a pointer to one of the structures which Ruby
knows.  The known structures are defined in <ruby.h>.

To convert C numbers to Ruby values, use these macros:

  INT2FIX()	for integers within 31bits.
  INT2NUM()	for arbitrary sized integer.

INT2NUM() converts an integer into a Bignum if it is out of the Fixnum
range, but it is a bit slower.
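
The Fixnum encoding listed above (shift left by one bit, then turn on
the LSB) can be tried out in plain C without ruby.h.  What follows is
only a sketch: demo_value, demo_int2fix(), demo_fixnum_p() and
demo_fix2int() are names made up for this illustration, not the real
macros, whose definitions are in ruby.h.

```c
/* Hypothetical stand-in for VALUE. */
typedef unsigned long demo_value;

/* Encode: shift left one bit and turn on the LSB. */
static demo_value
demo_int2fix(long i)
{
    return ((demo_value)i << 1) | 1UL;
}

/* The LSB tells a tagged integer apart from an aligned pointer. */
static int
demo_fixnum_p(demo_value v)
{
    return (int)(v & 1UL);
}

/* Decode: remove the tag bit and halve.  Division is used instead of
   a right shift so that negative numbers do not depend on how the
   compiler shifts signed values. */
static long
demo_fix2int(demo_value v)
{
    return ((long)v - 1) / 2;
}
```

With these definitions 5 is stored as 11 (binary 1011), and a pointer
to an aligned structure, whose LSB is 0, is never mistaken for a
Fixnum.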

1.5 Manipulate Ruby data

As mentioned above, modifying an object's internal structure is not
recommended.  To manipulate objects, use the functions supplied by the
Ruby interpreter.  Some useful functions are listed below (the list is
not complete):

 String functions

  rb_str_new(char *ptr, int len)

    Creates a new Ruby string.

  rb_str_new2(char *ptr)

    Creates a new Ruby string from C string.  This is equivalent to
    rb_str_new(ptr, strlen(ptr)).

  rb_str_cat(VALUE str, char *ptr, int len)

    Appends len bytes data from ptr to the Ruby string.

 Array functions

  rb_ary_new()

    Creates an array with no elements.

  rb_ary_new2(int len)

    Creates an array with no elements, allocating an internal buffer
    for len elements.

  rb_ary_new3(int n, ...)

    Creates an n-elements array from arguments.

  rb_ary_new4(int n, VALUE *elts)

    Creates an n-elements array from C array.

  rb_ary_push(VALUE ary, VALUE val)
  rb_ary_pop(VALUE ary)
  rb_ary_shift(VALUE ary)
  rb_ary_unshift(VALUE ary, VALUE val)
  rb_ary_entry(VALUE ary, int idx)

    Array operations.  The first argument to each function must be an
    array.  They may dump core if given a value of any other type.
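
The following sketch puts a few of these functions together.  The
helper name demo_make_list is hypothetical; the calls themselves are
the ones listed above.

```c
#include "ruby.h"

/* Hypothetical helper: builds the Ruby array ["hello, world", 42]. */
static VALUE
demo_make_list(void)
{
    VALUE ary = rb_ary_new();
    VALUE str = rb_str_new2("hello");   /* copies the C string */

    rb_str_cat(str, ", world", 7);      /* append 7 bytes */
    rb_ary_push(ary, str);
    rb_ary_push(ary, INT2FIX(42));
    return ary;
}
```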

2. Extend Ruby with C

2.1 Add new features to Ruby

You can add new features (classes, methods, etc.) to the Ruby
interpreter.  Ruby provides an API for defining the following:

 * Classes, Modules
 * Methods, Singleton Methods
 * Constants

2.1.1 Class/module definition

To define a class or module, use the functions below:

  VALUE rb_define_class(char *name, VALUE super)
  VALUE rb_define_module(char *name)

These functions return the newly created class or module.  You may
want to save this reference in a variable for later use.

2.1.2 Method/singleton method definition

To define methods or singleton methods, use functions below:

  void rb_define_method(VALUE klass, char *name,
		        VALUE (*func)(), int argc)

  void rb_define_singleton_method(VALUE object, char *name,
			          VALUE (*func)(), int argc)

The `argc' represents the number of arguments to the C function,
which must be less than 17.  But I believe you don't need that many. :-)

If `argc' is negative, it specifies the calling sequence, not the
number of arguments.

If argc is -1, the function will be called like:

  VALUE func(int argc, VALUE *argv, VALUE obj)

where argc is the actual number of arguments, argv is the C array of
the arguments, and obj is the receiver.

If argc is -2, the arguments are passed in a Ruby array.  The function
will be called like:

  VALUE func(VALUE obj, VALUE args)

where obj is the receiver, and args is the Ruby array containing
actual arguments.
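
To make the three calling sequences concrete, here is a minimal sketch
of a library that defines one method of each kind.  The library name
"demo", the class Demo and the demo_* functions are all invented for
the example.  The initializing function is called Init_demo because
Ruby runs the function named Init_LIBRARY when it loads a library.

```c
#include "ruby.h"

/* argc = 1: the receiver first, then exactly one method argument. */
static VALUE
demo_twice(VALUE self, VALUE num)
{
    return INT2NUM(NUM2INT(num) * 2);
}

/* argc = -1: actual argument count, C array of arguments, receiver. */
static VALUE
demo_count(int argc, VALUE *argv, VALUE self)
{
    return INT2FIX(argc);
}

/* argc = -2: the receiver, then all arguments packed in a Ruby array. */
static VALUE
demo_args(VALUE self, VALUE args)
{
    return args;
}

/* Initializer of the hypothetical library "demo". */
void
Init_demo(void)
{
    VALUE cDemo = rb_define_class("Demo", rb_cObject);

    rb_define_method(cDemo, "twice", demo_twice, 1);
    rb_define_method(cDemo, "count", demo_count, -1);
    rb_define_method(cDemo, "args", demo_args, -2);
}
```

From Ruby, Demo.new.twice(21) would then be expected to return 42,
Demo.new.count(1, 2, 3) to return 3, and Demo.new.args(1, 2) to return
the array [1, 2].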

There are two more functions for defining methods.  One defines a
private method:

  void rb_define_private_method(VALUE klass, char *name,
			        VALUE (*func)(), int argc)

The other defines a module function, which is both a private method
and a singleton method of the module.  For example, sqrt is a module
function defined in the Math module.  It can be called like:

  Math.sqrt(4)

or

  include Math
  sqrt(4)

To define a module function, use:

  void rb_define_module_function(VALUE module, char *name,
				 VALUE (*func)(), int argc)

In addition, a function-like method, which is a private method defined
in the Kernel module, can be defined using:

  void rb_define_global_function(char *name, VALUE (*func)(), int argc)


2.1.3 Constant definition

We have two functions to define constants:

  void rb_define_const(VALUE klass, char *name, VALUE val)
  void rb_define_global_const(char *name, VALUE val)

The former defines a constant under the specified class/module.  The
latter defines a global constant.

2.2 Use Ruby features from C

There are several ways to invoke Ruby's features from C code.

2.2.1 Evaluate Ruby Program in String

The easiest way to use Ruby's features from a C program is to evaluate
a string as a Ruby program.  This function does the job:

  VALUE rb_eval_string(char *str)

Evaluation is done in the current context, so the local variables of
the innermost method (which is defined by Ruby) can be accessed.

2.2.2 ID or Symbol

You can invoke methods directly, without parsing a string.  First I
need to explain symbols (whose data type is ID).  An ID is an integer
that represents a Ruby identifier such as a variable name.  It can be
accessed from Ruby in the form:

 :Identifier

You can get the symbol value for a string in C code by using

  rb_intern(char *name)

In addition, the symbol for a one-character operator (e.g. +) is the
character code of that character.

2.2.3 Invoke Ruby method from C

To invoke methods directly, you can use the function below

  VALUE rb_funcall(VALUE recv, ID mid, int argc, ...)

This function invokes the method of recv whose name is specified by
the symbol mid.
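
For instance, calling a method with no arguments and one with a single
argument could be sketched as follows.  The helper names
demo_string_length and demo_push are hypothetical; "length" and "push"
are ordinary String and Array methods.

```c
#include "ruby.h"

/* Hypothetical helper: returns the result of str.length as a C int. */
static int
demo_string_length(VALUE str)
{
    VALUE len;

    Check_Type(str, T_STRING);      /* raises unless str is a String */
    len = rb_funcall(str, rb_intern("length"), 0);
    return NUM2INT(len);
}

/* Hypothetical helper: calls ary.push(val) through method dispatch. */
static VALUE
demo_push(VALUE ary, VALUE val)
{
    return rb_funcall(ary, rb_intern("push"), 1, val);
}
```

The third argument of rb_funcall() is the number of method arguments
that follow it, so it must match the VALUEs actually passed.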

2.2.4 Accessing the variables and constants

You can access class variables and instance variables using access
functions.  Global variables can also be shared between both worlds.
There is no way to access Ruby's local variables.

The functions to access/modify instance variables are below:

  VALUE rb_ivar_get(VALUE obj, ID id)
  VALUE rb_ivar_set(VALUE obj, ID id, VALUE val)

id must be the symbol, which can be retrieved by rb_intern().

To access the constants of the class/module:

  VALUE rb_const_get(VALUE obj, ID id)

See 2.1.3 for defining new constant.

3. Information sharing between Ruby and C

3.1 Ruby constants that can be accessed from C

The following Ruby constants can be referred to from C.

  Qtrue
  Qfalse

Boolean values.  Qfalse is false in the C also (i.e. 0).

  Qnil

Ruby nil in C scope.

3.2 Global variables shared between C and Ruby

Information can be shared between the two worlds using shared global
variables.  To define them, use the functions listed below:

  void rb_define_variable(char *name, VALUE *var)

This function defines a variable which is shared by both worlds.  The
value of the C global variable pointed to by `var' can be accessed
through the Ruby global variable named `name'.

You can define a variable that is read-only (from Ruby, of course)
with the function below.

  void rb_define_readonly_variable(char *name, VALUE *var)
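
A minimal sketch of these two functions, assuming a hypothetical
library named "demo_vars"; the variable names are invented for the
example.

```c
#include "ruby.h"

/* C-side storage for the shared values. */
static VALUE demo_counter;
static VALUE demo_version;

/* Initializer of the hypothetical library "demo_vars". */
void
Init_demo_vars(void)
{
    demo_counter = INT2FIX(0);
    demo_version = rb_str_new2("0.1");

    /* Ruby code can read and assign $demo_counter ... */
    rb_define_variable("$demo_counter", &demo_counter);
    /* ... but can only read $demo_version. */
    rb_define_readonly_variable("$demo_version", &demo_version);
}
```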

You can define hooked variables.  The accessor functions (getter and
setter) are called on access to the hooked variables.

  void rb_define_hooked_variable(char *name, VALUE *var,
				 VALUE (*getter)(), VALUE (*setter)())

If you need only one of the getter and the setter, just supply 0 for
the hook you don't need.  If both hooks are 0,
rb_define_hooked_variable() works just like rb_define_variable().

  void rb_define_virtual_variable(char *name,
				  VALUE (*getter)(), VALUE (*setter)())

This function defines a Ruby global variable without a corresponding C
variable.  The value of the variable is set and retrieved only through
the hooks.

The prototypes of the getter and setter functions are as follows:

  VALUE (*getter)(ID id, void *data, struct global_entry* entry);
  void (*setter)(VALUE val, ID id, void *data, struct global_entry* entry);

3.3 Encapsulate C data into Ruby object

To wrap a C pointer as a Ruby object (a so-called DATA object), use
Data_Wrap_Struct().

  Data_Wrap_Struct(klass,mark,free,ptr)

Data_Wrap_Struct() returns the created DATA object.  The klass
argument is the class for the DATA object.  The mark argument is the
function that marks the Ruby objects pointed to by this data.  The
free argument is the function that frees the pointer's allocation.
Both functions, mark and free, are called from the garbage collector.

You can allocate and wrap the structure in one step.

  Data_Make_Struct(klass, type, mark, free, sval)

This macro returns an allocated Data object wrapping a pointer to the
structure, which is also allocated.  This macro works like:

  (sval = ALLOC(type), Data_Wrap_Struct(klass, mark, free, sval))

The arguments klass, mark, and free work like their counterparts in
Data_Wrap_Struct().  A pointer to the allocated structure is assigned
to sval, which should be a pointer to the specified type.

To retrieve the C pointer from the Data object, use the macro
Data_Get_Struct().

  Data_Get_Struct(obj, type, sval)

The pointer to the structure will be assigned to the variable sval.

See the dbm example in section 4 for details.
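
As a smaller illustration, the sketch below wraps a hypothetical
struct demo_point.  All demo_* names are made up for the example.  The
structure is allocated with malloc() and released with free() so that
the pair matches, and no mark function is supplied because the
structure holds no Ruby objects.  rb_eNoMemError is assumed to be
available as the exception class for allocation failure.

```c
#include <stdlib.h>
#include "ruby.h"

/* Hypothetical C structure hidden inside a Ruby object. */
struct demo_point {
    int x, y;
};

/* free function: called by the garbage collector. */
static void
demo_point_free(void *ptr)
{
    free(ptr);
}

/* Allocates a point and wraps it in a DATA object of class klass. */
static VALUE
demo_point_wrap(VALUE klass, int x, int y)
{
    struct demo_point *pt = malloc(sizeof(struct demo_point));

    if (pt == NULL)
        rb_raise(rb_eNoMemError, "failed to allocate demo_point");
    pt->x = x;
    pt->y = y;
    /* mark is 0: the structure refers to no Ruby objects */
    return Data_Wrap_Struct(klass, 0, demo_point_free, pt);
}

/* Method body: gets the C pointer back from the receiver. */
static VALUE
demo_point_x(VALUE self)
{
    struct demo_point *pt;

    Data_Get_Struct(self, struct demo_point, pt);
    return INT2NUM(pt->x);
}
```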

4. Example - Creating dbm extension

OK, here's an example of making an extension library.  This is the
extension to access dbm.  The full source is included in the ext/
directory of Ruby's source tree.

(1) make the directory

  % mkdir ext/dbm

Make a directory for the extension library under the ext directory.

(2) create MANIFEST file

  % cd ext/dbm
  % touch MANIFEST

There should be a MANIFEST file in the directory for the extension
library.  Make an empty file for now.

(3) design the library

You need to design the library's features before making it.

(4) write C code.

You need to write C code for your extension library.  If your library
has only one source file, ``LIBRARY.c'' is the preferred file name.
If, on the other hand, your library has several source files, avoid
``LIBRARY.c'' as a file name, because it may conflict with the
intermediate file ``LIBRARY.o'' on some platforms.

Ruby will execute the initializing function named ``Init_LIBRARY'' in
the library.  For example, ``Init_dbm()'' will be executed when the
dbm library is loaded.

Here's an example of an initializing function.

--
Init_dbm()
{
    /* define DBM class */
    cDBM = rb_define_class("DBM", rb_cObject);
    /* DBM includes Enumerable module */
    rb_include_module(cDBM, rb_mEnumerable);

    /* DBM has class method open(): arguments are received as C array */
    rb_define_singleton_method(cDBM, "open", fdbm_s_open, -1);

    /* DBM instance method close(): no args */
    rb_define_method(cDBM, "close", fdbm_close, 0);
    /* DBM instance method []: 1 argument */
    rb_define_method(cDBM, "[]", fdbm_fetch, 1);
		:

}
--

The dbm extension wraps the dbm struct in the C world using
Data_Make_Struct.

--
struct dbmdata {
    int  di_size;
    DBM *di_dbm;
};


obj = Data_Make_Struct(klass,struct dbmdata,0,free_dbm,dbmp);
--

This code wraps the dbmdata structure in a Ruby object.  We avoid
wrapping DBM* directly, because we want to cache size information.

To retrieve the dbmdata structure from a Ruby object, we define the
macro below:

--
#define GetDBM(obj, dbmp) {\
    Data_Get_Struct(obj, struct dbmdata, dbmp);\
    if (dbmp->di_dbm == 0) closed_dbm();\
}
--

This somewhat complicated macro retrieves the structure and checks
whether the DBM has been closed.

There are three ways to receive method arguments.  First, methods
with a fixed number of arguments receive them like this:

--
static VALUE
fdbm_delete(obj, keystr)
    VALUE obj, keystr;
{
	:
}
--

The first argument of the C function is self; the rest are the
arguments to the method.

Second, methods with an arbitrary number of arguments receive them
like this:

--
static VALUE
fdbm_s_open(argc, argv, klass)
    int argc;
    VALUE *argv;
    VALUE klass;
{
	:
    if (rb_scan_args(argc, argv, "11", &file, &vmode) == 1) {
	mode = 0666;		/* default value */
    }
	:
}
--

The first argument is the number of method arguments, the second
argument is the C array of the method arguments, and the third
argument is the receiver of the method.

You can use the function rb_scan_args() to check and retrieve the
arguments.  For example, "11" means that the method requires at least
one argument and accepts at most two.
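
How such a format string limits the argument count can be modelled in
a few lines of plain C.  This is not Ruby's implementation, only a
sketch of the counting rule (first digit: required arguments, second
digit: optional arguments, trailing `*': any number of further
arguments); demo_arg_range() and demo_argc_ok() are hypothetical
names.  The real rb_scan_args() also stores the arguments into the
variables passed after fmt.

```c
#include <ctype.h>

/* Hypothetical helper, not part of Ruby: works out how many arguments
   a format such as "11" or "12*" accepts.  *max is set to -1 when a
   trailing '*' allows any number of extra arguments. */
static void
demo_arg_range(const char *fmt, int *min, int *max)
{
    int required = 0, optional = 0;

    if (isdigit((unsigned char)*fmt)) {
        required = *fmt++ - '0';
        if (isdigit((unsigned char)*fmt))
            optional = *fmt++ - '0';
    }
    *min = required;
    *max = (*fmt == '*') ? -1 : required + optional;
}

/* Returns 1 if argc arguments would be accepted, 0 otherwise. */
static int
demo_argc_ok(const char *fmt, int argc)
{
    int min, max;

    demo_arg_range(fmt, &min, &max);
    return argc >= min && (max < 0 || argc <= max);
}
```

Under this model "11" accepts one or two arguments and rejects zero or
three, which matches the description above.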

Third, methods with an arbitrary number of arguments can receive them
in a Ruby array, like this:

--
static VALUE
fdbm_indexes(obj, args)
    VALUE obj, args;
{
	:
}
--

The first argument is the receiver, the second one is the Ruby array
which contains the arguments to the method.

** Notice

The GC needs to know about global variables which refer to Ruby
objects but are not exported to the Ruby world.  You need to protect
them with

  void rb_global_variable(VALUE *var)

(5) prepare extconf.rb

If a file named extconf.rb exists, it will be executed to generate
the Makefile.  If not, the compilation scheme tries to generate a
Makefile anyway.

extconf.rb is the file that checks the compilation conditions etc.
You need to put

  require 'mkmf'

at the top of the file.  You can use the functions below to check the
condition.

  have_library(lib, func): check whether library containing function exists.
  have_func(func): check whether function exists
  have_header(header): check whether header file exists
  create_makefile(target): generate Makefile

The values of the variables below affect the Makefile.

  $CFLAGS: included in CFLAGS make variable (such as -I)
  $LDFLAGS: included in LDFLAGS make variable (such as -L)

If the compilation conditions are not fulfilled, do not call
``create_makefile''.  The Makefile will not be generated and
compilation will not be done.

(6) prepare depend (optional)

If a file named depend exists, the Makefile will include it to check
dependencies.  You can make this file by invoking

 % gcc -MM *.c > depend

It does no harm, so prepare it.

(7) put file names into MANIFEST (optional)

  % find * -type f -print > MANIFEST
  % vi MANIFEST

Append file names to MANIFEST.  The compilation scheme only requires
MANIFEST to exist, but you had better take this step so that the
required files can be identified.

(8) generate Makefile

Try generating the Makefile with:

  ruby extconf.rb

You don't need this step if you put the extension library under the
ext directory of the ruby source tree.  In that case, compilation of
the interpreter will do this step for you.

(9) make

Type

  make

to compile your extension.  You don't need this step either if you
put the extension library under the ext directory of the ruby source
tree.

(10) debug

You may need to debug the extension.  Extensions can be linked
statically by adding the directory name to the ext/Setup file, so that
you can inspect the extension with the debugger.

(11) done, now you have the extension library

You can do anything you want with your library.  The author of Ruby
will not claim any restriction on code that depends on the Ruby API.
Feel free to use, modify, distribute or sell your program.

Appendix A. Ruby source files overview

ruby language core

  class.c
  error.c
  eval.c
  gc.c
  object.c
  parse.y
  variable.c

utility functions

  dln.c
  fnmatch.c
  glob.c
  regex.c
  st.c
  util.c

ruby interpreter implementation

  dmyext.c
  inits.c
  main.c
  ruby.c
  version.c

class library

  array.c
  bignum.c
  compar.c
  dir.c
  enum.c
  file.c
  hash.c
  io.c
  math.c
  numeric.c
  pack.c
  process.c
  random.c
  range.c
  re.c
  signal.c
  sprintf.c
  string.c
  struct.c
  time.c

Appendix B. Ruby extension API reference

** Types

 VALUE

The type for Ruby objects.  Actual structures are defined in ruby.h,
such as struct RString, etc.  To refer to the values in structures,
use casting macros like RSTRING(obj).

** Variables and constants

 Qnil

const: nil object

 Qtrue

const: true object(default true value)

 Qfalse

const: false object

** C pointer wrapping

 Data_Wrap_Struct(VALUE klass, void (*mark)(), void (*free)(), void *sval)

Wraps a C pointer in a Ruby object.  If the object has references to
other Ruby objects, they should be marked by the mark function during
the GC process.  Otherwise, mark should be 0.  When this object is no
longer referred to from anywhere, the pointer will be discarded by the
free function.

 Data_Make_Struct(klass, type, mark, free, sval)

This macro allocates memory using malloc(), assigns it to the variable
sval, and returns the DATA encapsulating the pointer to memory region.

 Data_Get_Struct(data, type, sval)

This macro retrieves the pointer value from DATA, and assigns it to
the variable sval.

** defining class/module

 VALUE rb_define_class(char *name, VALUE super)

Defines new Ruby class as subclass of super.

 VALUE rb_define_class_under(VALUE module, char *name, VALUE super)

Creates new Ruby class as subclass of super, under the module's
namespace.

 VALUE rb_define_module(char *name)

Defines new Ruby module.

 VALUE rb_define_module_under(VALUE module, char *name)

Defines new Ruby module, under the module's namespace.

 void rb_include_module(VALUE klass, VALUE module)

Includes the module into the class.  If the class already includes
it, the call is simply ignored.

 void rb_extend_object(VALUE object, VALUE module)

Extends the object with the module's attributes.

** Defining Global Variables

 void rb_define_variable(char *name, VALUE *var)

Defines a global variable which is shared between C and Ruby.  If name
contains a character which is not allowed to be part of a symbol, the
variable can't be seen from Ruby programs.

 void rb_define_readonly_variable(char *name, VALUE *var)

Defines a read-only global variable.  Works just like
rb_define_variable(), except defined variable is read-only.

 void rb_define_virtual_variable(char *name,
				VALUE (*getter)(), VALUE (*setter)())

Defines a virtual variable, whose behavior is defined by a pair of C
functions.  The getter function is called when the variable is
referred to.  The setter function is called when a value is assigned
to the variable.  The prototypes of the getter/setter functions are:

	VALUE getter(ID id)
	void setter(VALUE val, ID id)

The getter function must return the value for the access.

 void rb_define_hooked_variable(char *name, VALUE *var,
				VALUE (*getter)(), VALUE (*setter)())

Defines a hooked variable.  It is a virtual variable backed by a C
variable.  The getter is called as

	VALUE getter(ID id, VALUE *var)

returning new value.  The setter is called as

	void setter(VALUE val, ID id, VALUE *var)

The GC needs to mark C global variables which hold Ruby values.

 void rb_global_variable(VALUE *var)

Tells GC to protect these variables.

** Constant Definition

 void rb_define_const(VALUE klass, char *name, VALUE val)

Defines a new constant under the class/module.

 void rb_define_global_const(char *name, VALUE val)

Defines a global constant.  This works just like

     rb_define_const(rb_cObject, name, val)

** Method Definition

 rb_define_method(VALUE klass, char *name, VALUE (*func)(), int argc)

Defines a method for the class.  func is the function pointer.  argc
is the number of arguments.  If argc is -1, the function receives 3
arguments: argc, argv, and self.  If argc is -2, the function receives
2 arguments, self and args, where args is a Ruby array of the method
arguments.

 rb_define_private_method(VALUE klass, char *name, VALUE (*func)(), int argc)

Defines a private method for the class.  Arguments are same as
rb_define_method().

 rb_define_singleton_method(VALUE klass, char *name, VALUE (*func)(), int argc)

Defines a singleton method.  Arguments are same as rb_define_method().

 rb_scan_args(int argc, VALUE *argv, char *fmt, ...)

Retrieves arguments from argc and argv.  fmt is the format string for
the arguments, such as "12" for 1 non-optional argument and 2 optional
arguments.  If `*' appears at the end of fmt, the rest of the
arguments are packed into an array and assigned to the corresponding
variable.

** Invoking Ruby method

 VALUE rb_funcall(VALUE recv, ID mid, int narg, ...)

Invokes the method.  To retrieve mid from method name, use rb_intern().

 VALUE rb_funcall2(VALUE recv, ID mid, int argc, VALUE *argv)

Invokes method, passing arguments by array of values.

 VALUE rb_eval_string(char *str)

Compiles and executes the string as Ruby program.

 ID rb_intern(char *name)

Returns the ID corresponding to the name.

 char *rb_id2name(ID id)

Returns the name corresponding to the ID.

 char *rb_class2name(VALUE klass)

Returns the name of the class.

** Instance Variables

 VALUE rb_iv_get(VALUE obj, char *name)

Retrieves the value of the instance variable.  If the name is not
prefixed by `@', the variable is inaccessible from Ruby.

 VALUE rb_iv_set(VALUE obj, char *name, VALUE val)

Sets the value of the instance variable.

** Control Structure

 VALUE rb_iterate(VALUE (*func1)(), void *arg1, VALUE (*func2)(), void *arg2)

Calls the function func1, supplying func2 as the block.  func1 will be
called with the argument arg1.  func2 receives the value from yield as
the first argument, arg2 as the second argument.


 VALUE rb_yield(VALUE val)

Evaluates the block with value val.

 VALUE rb_rescue(VALUE (*func1)(), void *arg1, VALUE (*func2)(), void *arg2)

Calls the function func1, with arg1 as the argument.  If exception
occurs during func1, it calls func2 with arg2 as the argument.  The
return value of rb_rescue() is the return value from func1 if no
exception occurs, from func2 otherwise.

 VALUE rb_ensure(VALUE (*func1)(), void *arg1, void (*func2)(), void *arg2)

Calls the function func1 with arg1 as the argument, then calls func2
with arg2, however the execution of func1 terminated.  The return
value of rb_ensure() is that of func1.

** Exceptions and Errors

 void rb_warn(char *fmt, ...)

Prints warning message according to the printf-like format.

 void rb_warning(char *fmt, ...)

Prints warning message according to the printf-like format, if
$VERBOSE is true.

 void rb_raise(VALUE exception, char *fmt, ...)

Raises an exception of class exception.  The fmt is the format string
just like printf().

 void rb_fatal(char *fmt, ...)

Raises fatal error, terminates the interpreter.  No exception handling
will be done for fatal error, but ensure blocks will be executed.

 void rb_bug(char *fmt, ...)

Terminates the interpreter immediately.  This function should be
called under the situation caused by the bug in the interpreter.  No
exception handling nor ensure execution will be done.

** Initializing and Starting the Interpreter

The embedding API functions are listed below (not needed for
extension libraries):

 void ruby_init(int argc, char **argv, char **envp)

Initializes the interpreter.

 void ruby_run()

Starts execution of the interpreter.

 void ruby_script(char *name)

Specifies the name of the script ($0).

Appendix C. Functions Available in extconf.rb

These functions are available in extconf.rb:

 have_library(lib, func)

Checks whether library which contains specified function exists.
Returns true if the library exists.

 have_func(func)

Checks whether func exists.  Returns true if the function exists.  To
check functions in the additional library, you need to check that
library first using have_library().

 have_header(header)

Checks for the header files.  Returns true if the header file exists.

 create_makefile(target)

Generates the Makefile for the extension library.  If you don't invoke
this method, the compilation will not be done.

/*
 * Local variables:
 * fill-column: 70
 * end:
 */