A networked key-value store
Go to file
Joseph Rothrock 459134f146 Updated Protocol
Improved the response protocol with a status field and
byte length of response payload.
2012-12-15 16:24:47 -08:00
LICENSE initializing project 2011-12-26 13:19:51 -08:00
Makefile Update the Makefile to correctly link against libpthread on Linux systems 2012-11-01 09:13:02 -07:00
README Updated Protocol 2012-12-15 16:24:47 -08:00
dbr_ctl Savepoint. keydb read code implemented but not tested 2012-01-08 13:57:24 -08:00
fnv.h Check first for key existence in write_record. 2012-01-01 11:36:29 -08:00
hash_32.c Check first for key existence in write_record. 2012-01-01 11:36:29 -08:00
longlong.h Check first for key existence in write_record. 2012-01-01 11:36:29 -08:00
roxanne.pdf More cleanup 2012-10-31 12:24:16 -07:00
roxanne_db.c Updated Protocol 2012-12-15 16:24:47 -08:00
roxanne_db.h Updated Protocol 2012-12-15 16:24:47 -08:00
status_codes.h Updated Protocol 2012-12-15 16:24:47 -08:00
test_data modularizing, implemented new protocol 2011-12-30 15:01:33 -08:00
tuple_bits.c Updated Protocol 2012-12-15 16:24:47 -08:00

README

To build the software:

$ make

To install it:

$ sudo make install 

This will put the executables in /usr/local/bin

To initialize or re-create the database files:

$ sudo /usr/local/bin/dbr_ctl initdb [force]

To start the db:

$ sudo /usr/local/bin/dbr_ctl start

Look at the top of the dbr_ctl script for some initialization variables you
 might want to change.

--------------
Precis
--------------
Roxanne is a very simple database server that allows a client to store and 
retrieve values by key.  Keys are stored in a hash map of 64K buckets. Hash
collisions are resolved by separate chaining onto linked lists at the end
of the index file. The default location for the index file is in
/var/roxanne/idx

In addition to value lookups by key, the database provides a way to group
keys in a hierarchical directory structure. See Composite Keys below.

The values for the given keys are stored in contiguous 4KB blocks in the 
database file (/var/roxanne/db). A file called block_bitmap tracks the 
free/busy blocks in the db file. The dbr processes memory-map this file for
very fast access. The db file starts out with zero blocks.  Blocks are only 
added to the db file as needed to accomodate new records. As typically
built, the database can accomodate about a billion blocks.

--------------
Composite Keys
--------------
The database supports the notion of a composite key. That is, a key
that is subdivided into a hierarchy that all keys participate in. The key-
space is similar to a typical Unix filesystem organized into directories.

To work with hierarchical keys, divide them with spaces ' '. The last
element becomes the value. The 'value' does not participate in the 
key-space.

The commands

create finance accounting payroll employees 50
create finance accounting receivables employees 70

Creates a 4-level hierarchy:
    
    1   2   3   4
    finance
        accounting
            payroll
                employees
            receivables
                employees
            
        
The key-space then becomes a kind of database on its own. A client
can query the database for all the subkeys of a path. This gives clients
building blocks for range queries and ordered (sorted) results.

Nodes in the key-space are reference-counted so that repeating groups
in the hierarchy don't waste space.

All lookups of values are still done via hashmap of the entire key. In 
other words, values can only be fetched by providing THE ENTIRE KEY.
In this case, the hash-lookup key is 'finance accounting payroll employees'.
It points to a starting block in the db that contains '50'.
 
This means that point-lookups of records will always be very fast, but that
the operator can use the key-space to organize and sort results.

Results from the keys command return in ascending sort order (strcmp).

--------------
Usage Example
--------------
madison:Roxanne rothrock$ sudo dbr_ctl start
Started listening.
carp:Roxanne rothrock$ telnet localhost 4080
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
create alpha record_1
STATUS: OK
SIZE: 9
Write OK.

create alpha beta record_2
STATUS: OK
SIZE: 9
Write OK.

read alpha
STATUS: OK
SIZE: 8
record_1

read alpha beta
STATUS: OK
SIZE: 8
record_2

...Now a demonstration of the 'keys' command:

create alpha gamma record_3
STATUS: OK
SIZE: 9
Write OK.

create alpha delta record_4
STATUS: OK
SIZE: 9
Write OK.

keys alpha
STATUS: OK
SIZE: 16
beta delta gamma


--------------
Protocol
--------------

Data manipulation commands:

  create key [key]...value<newline>

  read key [key]...<newline>

  delete key [key]...<newline>

  keys [key] [key]....<newline>


Other commands:

  quit

Response:
"STATUS: <status string>\nSIZE: <integer>\n<integer> long string of bytes\n\n"

--------------
On Disk
--------------

The initial and maximum sizes are hard-coded in the application along
with the units of growth and storage. Eventually, these will become
configurable options. For now, it's just simpler this way.

Roxanne's persistent storage lives in /var/roxanne by default.

    $ du -skh /var/roxanne/*
    128M  /var/roxanne/block_bitmap
     64K  /var/roxanne/db
     64M  /var/roxanne/idx
     24K  /var/roxanne/keydb
      0B  /var/roxanne/keydb_freelist


### The Block Bitmap (/var/roxanne/block_bitmap)

Each _bit_ in this file corresponds to a 4 KB block in the 'db' file.
Roxanne processes memory-map this file and use it keep track of used
and free blocks. Only one process may update the block bitmap at any
time.  The block bitmap file provides a way to very quickly find and
reserve a contiguous set of blocks to store values.The block_bitmap
file is regularly flushed to disk with msync() after each reservation
request. The size of the block bitmap file is fixed at 128MB which is
enough space to keep track of 1,073,741,824 blocks. 

### The database (/var/roxanne/db)

All values are stored in the db file.  The unit of storage in the 'db'
file is the block, and  all blocks are 4KB. 

The db file starts out small and grows as more values are added to the
the database. All values inserted into the database are stored in
contiguous blocks. This simplifies the code, and provides for fast access,
but some fragmentation will occur over time if the database serves lots
of reads and writes of varying size.

#### The hash index (/var/roxanne/idx)

Initially sized at 64 megabytes, the hash index stores the full, composite
keys for values in the db file. An entry in the index is comprised of a
key stored as a string, an integer that represents a byte-offset in the
db file, an integer representing the length in bytes of the value, and
a third integer that represents a byte-offset in the idx file for
chaining additional keys. 

Hash collisions are resolved by the separate chaining method. Each
index entry is 1024 bytes, so the hash table has 65,536 slots. When a 
collision occurs, the colliding key is appended to the end if the index
file and the key in the slot where the collision occurred has its
'next' pointer updated to be the byte-offest in the index file where
the append occurred.

#### The keydb (/var/roxanne/keydb)

Initially sized at 0 bytes, the keydb stores the composite key hierarchy.
Each level of the hierarchy is stored as a binary tree.

Each node in the hierarchy has: a key-part, a left pointer (less-than),
a right pointer (greater-than), and a next pointer that points to the
next level in the hierarchy. 

Nodes have a reference count so that duplicate key-parts don't require
additional space. Unfortunately, the database does not yet support
reclamation of keydb nodes with a reference count of 0.

##### Example

Consider the following two composite keys (values left off):

foo bar toast
foo bar jam

This set of two composite keys comprises 4 nodes in the keydb, stored
like so:

key    refcount    next-pointer  left-pointer  right-pointer
------------------------------------------------------------
foo       2           bar           NULL          NULL
bar       2           toast         NULL          NULL
toast     1           NULL          jam           NULL
jam       1           NULL          NULL          NULL


Next, add these composite keys (again, values left off).
foo bar whiskey
zen
zoo
egg

The table now looks like this:

key    refcount    next-pointer  left-pointer  right-pointer
------------------------------------------------------------
foo       3           bar           egg           zen
bar       3           toast         NULL          NULL
toast     1           NULL          jam           whiskey
jam       1           NULL          NULL          NULL
whiskey   1           NULL          NULL          NULL
zen       1           NULL          NULL          zoo
zoo       1           NULL          NULL          NULL
egg       1           NULL          NULL          NULL


Here is the composite-key space represented graphically:

         +---+foo+----+
         |     +      |
         |     |      |
         |     |      |
         v     v      v
        egg   bar    zen+--+
               +           |
               |           v
               |           v
               v          zoo
         +--+toast+--+
         |           |
         |           |
         |           |
         v           v
        jam        whiskey