What’s in the Cage?
Christoph Rupp’s hamsterdb is a lightweight, embedded database engine designed for ease of use, high performance, stability, and portability. In the database world, you typically have two extremes. On the high end, you have the full-featured and sometimes unwieldy relational DBMS with SQL and a daemon/server process (such as Oracle). On the low end, there are B+tree-based systems, which are essentially just a database engine linked into the application, usually without SQL support. As a lightweight database engine, hamsterdb fits into the latter category. It is very fast, but supports only the minimum needed operations. Because it is embeddable, it does not have the external dependencies or installation hassles of an SQL server. It is simply a database engine, not a database management system (DBMS), and it has none of the relational functions or other features provided by SQL. For many apps that need to manage a lot of data, but don’t need an externally accessible database for report writers or third-party tools, hamsterdb may be your “pet” solution.
Of course, embedded systems such as cellphones and other portable devices, where memory is at a premium, also will benefit from the lightweight hamsterdb. It also supports in-memory databases, which may be helpful for these platforms as well.
Hamsterdb prides itself on fast algorithms and efficient data structures that aim for high performance in all scenarios. Specifically, the design minimizes redundant disk access and memory allocations. For example, it chooses memory-mapped file operations over the slower read/write I/O mechanism when possible. Hamsterdb has been around the block a few times: it has hundreds of unit tests with an impressive coverage rate of over 90%, and they are executed on each target OS before release.
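To make the memory-mapping point concrete, here is a minimal POSIX sketch of the two I/O styles. It is not hamsterdb's own code, and the function names sum_with_read() and sum_with_mmap() are made up for this illustration. Both walk every byte of a file: the first copies the data through a buffer with read(), the second maps the file and touches the pages directly.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: sum every byte of a file using read() calls.
   Returns -1 on error. */
long sum_with_read(const char *path)
{
    unsigned char buf[4096];
    long sum = 0;
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    while ((n = read(fd, buf, sizeof(buf))) > 0) {
        ssize_t i;
        for (i = 0; i < n; i++)
            sum += buf[i];
    }
    close(fd);
    return n < 0 ? -1 : sum;
}

/* Hypothetical helper: sum every byte of a file through a memory mapping.
   Returns -1 on error. */
long sum_with_mmap(const char *path)
{
    struct stat sb;
    unsigned char *p;
    long sum = 0;
    off_t i;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &sb) < 0) {
        close(fd);
        return -1;
    }
    if (sb.st_size == 0) {      /* mmap() rejects zero-length mappings */
        close(fd);
        return 0;
    }
    p = mmap(NULL, (size_t)sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return -1;
    for (i = 0; i < sb.st_size; i++)
        sum += p[i];
    munmap(p, (size_t)sb.st_size);
    return sum;
}
```

Both functions return the same total for the same file. The mapped version lets the kernel page the data in on demand instead of copying it into a user buffer on every call, which is the kind of saving the paragraph above is describing.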
Because it’s written in generic ANSI C, hamsterdb runs on many architectures including Intel/AMD, PowerPC, UltraSPARC, ARM, RISC, and others. Tested operating systems include Win32, Win64, Windows CE, Linux, and Solaris 9. The file formats are OS independent, so you can read a file written in Solaris 9 on Windows and vice versa. For object-oriented purists, there is indeed a ham::db class available for C++. If you’re working in Windows but outside the C/C++ environment, you still can access hamsterdb via .NET, Java, or Python wrappers.
Hamsterdb offers your choice of two hassle-free licenses: the GNU General Public License version 2 for non-commercial use or a closed-source license for commercial use, where you buy as many developer seats as you need.
Getting Started with Hamsterdb
Of course, the first order of business is downloading the package. After you untar it, you’ll see Visual Studio 2005 project files for building static and dynamic (DLL with import library) versions of hamsterdb. Similar project files are also provided for a half-dozen or so sample programs. An additional set of project files provides buildable Windows CE library targets. Unlike a lot of open source projects, hamsterdb is completely self-contained, so there is no scavenger hunt to locate critical dependent libraries and tools.
The only glitch, if such could be said, was that the Doxygen-generated HTML help files were not included. Fortunately, I am pretty familiar with Doxygen and used the following commands to regenerate the docs:
cd hamsterdb-1.0.4
doxygen documentation\doxyfile
In this article, you’ll only have time to look at one example in depth. As usual, it will be a simple example but explained thoroughly. The test program demonstrates several basic features:
- Creating the database
- Inserting data
- Looking up data
- Erasing data
Because it’s kind of longish, I’ll interweave code samples with explanation. If you want to look at the code unfettered by my commentary, that’s okay too!
 1  #include <stdio.h>
 2  #include <string.h>
 3  #include <stdlib.h> /* for exit() */
 4  #include <ham/hamsterdb.h>
 5
 6  #define LOOP 10
 7
 8  int main(int argc, char **argv)
 9  {
10      int i;
11      ham_status_t st;      /* status variable */
12      ham_db_t *db;         /* hamsterdb database object */
13      ham_key_t key;        /* the structure for a key */
14      ham_record_t record;  /* the structure for a record */
15
16      memset(&key, 0, sizeof(key));
17      memset(&record, 0, sizeof(record));
18
19      st=ham_new(&db);
20      if (st!=HAM_SUCCESS)
21          error("ham_new", st);
22
23      st=ham_create(db, "test.db", 0, 0664);
24      if (st!=HAM_SUCCESS)
25          error("ham_create", st);
First, let me note that I’ve removed the #ifdefs for Windows CE in this example because they don’t affect how it works. The error() function called throughout is a small helper, not shown here, that reports the failing call and exits. The first real API call, ham_new() in line #19, gets you a db object you’ll continue to use until you call ham_delete(). In line #23, you create the actual database; the second parameter, “test.db”, is its file name. If you wanted to use an in-memory database, you would pass NULL here together with the HAM_IN_MEMORY_DB flag. The third parameter is a set of flags that you can use to tune performance, and the fourth is the access mode for the new file. I’ll only mention a few of the more interesting flags in passing:
- HAM_WRITE_THROUGH: Immediately write modified pages to the disk. This slows down all database operations, but may provide integrity in case of a crash.
- HAM_IN_MEMORY_DB: No file will be created, and the database contents are lost after the database is closed.
- HAM_RECORD_NUMBER: Creates an “auto-increment” database.
- HAM_ENABLE_DUPLICATES: Enable duplicate keys for this Database. By default, duplicate keys are disabled.
- HAM_LOCK_EXCLUSIVE: Place an exclusive lock on the file. Only one process may hold an exclusive lock for a given file at a given time.
- HAM_ENABLE_TRANSACTIONS: Enables Transactions for this database.
Even more goodies are available through ham_create_ex(). Its extra parameters let you tune the cache size, page size, and B+tree index key size.
26
27      for (i=0; i<LOOP; i++) {    /* demonstrate insert functions */
28          key.size=sizeof(i);
29          key.data=&i;
30          record.size=sizeof(i);
31          record.data=&i;
32          st=ham_insert(db, 0, &key, &record, 0);
33          if (st!=HAM_SUCCESS)
34              error("ham_insert", st);
35      }
36
In the next section (lines 26-36), you simply insert 10 records with the data values 0 through 9. The ham_insert() used on line #32 is the standard method of getting data into the database. The second parameter is a transaction handle (or null if you don’t care to use transactions). The next parameter is the primary key you will associate with the record data. If you created the database with the HAM_RECORD_NUMBER flag, the system will generate this for you. The fourth parameter is nothing but a pointer to the data you’re going to insert (note the size was set on line #30). Last comes a set of insertion flags: HAM_OVERWRITE replaces a record with a matching key, and HAM_DUPLICATE forces an additional record if the key already exists.
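If the insertion flags seem abstract, the following toy model may help. It is emphatically not hamsterdb code: toy_store, toy_insert(), TOY_OVERWRITE, TOY_DUPLICATE, and the return codes are all hypothetical stand-ins I made up so the sketch compiles on its own. It only mimics the behavior described above, assuming that a plain insert of an existing key is rejected because duplicates are disabled by default.

```c
#include <string.h>

#define TOY_OVERWRITE 1   /* hypothetical stand-in for HAM_OVERWRITE */
#define TOY_DUPLICATE 2   /* hypothetical stand-in for HAM_DUPLICATE */
#define TOY_MAX 32        /* capacity of the toy store */

enum { TOY_OK = 0, TOY_KEY_EXISTS = -1, TOY_FULL = -2 };

typedef struct { int key; int value; } toy_entry;
typedef struct { toy_entry e[TOY_MAX]; int count; } toy_store;

/* Reset the store to empty. */
void toy_init(toy_store *s)
{
    memset(s, 0, sizeof(*s));
}

/* Return the index of the first entry with this key, or -1 if none. */
static int toy_lookup(const toy_store *s, int key)
{
    int i;
    for (i = 0; i < s->count; i++)
        if (s->e[i].key == key)
            return i;
    return -1;
}

/* Count how many entries share this key. */
int toy_count(const toy_store *s, int key)
{
    int i, n = 0;
    for (i = 0; i < s->count; i++)
        if (s->e[i].key == key)
            n++;
    return n;
}

/* Insert key/value, honoring the overwrite and duplicate flags. */
int toy_insert(toy_store *s, int key, int value, unsigned flags)
{
    int pos = toy_lookup(s, key);

    if (pos >= 0 && (flags & TOY_OVERWRITE)) {
        s->e[pos].value = value;      /* replace the existing record */
        return TOY_OK;
    }
    if (pos >= 0 && !(flags & TOY_DUPLICATE))
        return TOY_KEY_EXISTS;        /* plain insert of an existing key */
    if (s->count == TOY_MAX)
        return TOY_FULL;
    s->e[s->count].key = key;         /* new key, or an extra duplicate */
    s->e[s->count].value = value;
    s->count++;
    return TOY_OK;
}
```

In this model, inserting key 7 twice with no flags fails the second time, the overwrite flag swaps the stored value while leaving a single record, and the duplicate flag leaves two records under the same key.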
37      for (i=0; i<LOOP; i++) {    /* retrieve the data */
38          key.size=sizeof(i);
39          key.data=&i;
40
41          st=ham_find(db, 0, &key, &record, 0);
42          if (st!=HAM_SUCCESS)
43              error("ham_find", st);
44
45          if (*(int *)record.data!=i) {
46              printf("ham_find() ok, but returned bad value\n");
47              return (-1);
48          }
49      }