Copyright ©
Abridged Kdb+ Database Manual

Arthur Whitney

1 Summary

Kdb+ is an RDBMS and a general purpose programming language:

* OLTP from 100 thousand to 1 million records per second per cpu.

* OLAP from 1 million to 100 million records per second per cpu.

* Solaris, Linux and Windows. 32bit and 64bit.

trade:([]time:`time$();sym:`symbol$();price:`float$();size:`int$()) / create `trade insert(12:34:56.789;`xx;93.5;300) / insert select sum size by sym from trade / select

The realtime OLTP+OLAP part of the database is generally in memory with an optional update log. There is no limit to the size of the OLAP historical databases on disk. Typical customers run realtime analysis on 100 million transactions per day and historical analysis on the accumulated billions of records.

2 Install

Unzip kdb+ (and k4.lic ) in QHOME (default: q ).

SOL64> unzip -d q (q/s64/q) LIN64> unzip -d q (q/l64/q) WIN32> w32.exe (WINDIR/q.exe)

Executable is q . Put QHOME/{s64|l64} on PATH.

3 Server

>q [f] [-p p] [-s s] [-t t] [-T T] [-o o] [-u u] [-w w] [-r r] [-l] f load script (*.q), file or directory -p port kdbc(/jdbc/odbc) http(html xml txt csv) -s slaves for parallel execution -t timer milliseconds -T timeout seconds -o offset hours(from GMT: affects .z.Z) -u usr:pwd file (-U allows file operations) -w workspace MB limit -r replicate from :host:port -l log updates to filesystem (-L sync) -z [0] "D"$ uses mm/dd/yyyy or dd/mm/yyyy -P [7] printdigits(0:all) -W [2] week offset(0:sat)

The simplest database is just a process which may be initialized with a script. Each process handles any number of concurrent http and kdbc (jdbc/odbc) clients. For example, start a trade.q database listening on port 5001.

>q trade.q -p 5001

View the database with a browser: http://localhost:5001

4 Client

Kdb+ returns text (html, txt, csv or xml) to http queries.

browser http://localhost:5001/?select sum size by sym from trade browser/excel http://localhost:5001/q.csv?select sum size by sym from trade excel/rtd excel (data/getexternaldata/newwebquery) http://localhost:5001/q.txt?select sum size by sym from trade console (q/l32/qcon localhost:5001) localhost:5001>select sum size by sym from trade perl (use LWP::Simple;) get("http://localhost:5001/.txt?select sum size by sym from trade")

Kdb+ returns data to java, .net, c, jdbc/odbc or q clients. The results are atom, list, dict or table.

Result=k(string | {function,args}) java (q/c/ import java.sql.*; try{c c=new c("localhost",5001); // connect Object[]x={new Time(System.currentTimeMillis()%86400000),"xx",new Double(93.5),new Integer(300)}; c.k("insert","trade",x); // insert Object r=c.k("select sum size by sym from trade"); // select }catch(Exception e){} c (q/c/c.c) #include"k.h" int main(){K x;int c=khp("localhost",5001);if(c==-1)return c; // connect x=knk(4,kt(1000*(time(0)%86400)),ks("xx"),kf(93.5),ki(300)); k(-c,"insert",ks("trade"),x,0); // insert x=k(c,"select sum size by sym from trade",0);r0(x); // select return 0;} q c:hopen`:localhost:5001 / connect c("insert";`trade;("t"$.z.Z;`xx;93.5;300)) / insert c"select sum size by sym from trade" / select

Data is faster and more flexible than text. You should get 100,000 single record selects, inserts, updates [and deletes] per second. You can insert or update a million records per second in bulk. Kdb+ is generally faster than the network.

5 Datatypes

t| size literal q sql java .net xmlschema --------------------------------------------------------------------- b 1 0b boolean Boolean boolean boolean x 1 0x0 byte Byte byte byte h 2 0h short smallint Short int16 short i 4 0 int int Integer int32 int j 8 0j long bigint Long int64 long e 4 0e real real Float single single f 8 0.0 float float Double double double c 1 " " char Character char s . ` symbol varchar String string string m 4 2000.01m month d 4 2000.01.01 date date Date date z 8 dateTtime datetime timestamp Timestamp DateTime dateTime u 4 00:00 minute v 4 00:00:00 second t 4 00:00:00.000 time time Time TimeSpan time * 4 `s$` enum

All but boolean and byte have nulls. The int, float, char and symbol literal nulls are: 0N 0n " " ` . The rest use type extensions, e.g. 0Nd . Each type also a list notation, e.g. in the following nested list:

(01b;0x00ff;0 1h;0 1;0 1j;0 1e;0 1.0;"ab";`a`b;2000.01 2000.02m;2000.01.01 2000.01.02;..)

There must be no spaces in symbol lists, e.g. `a`b`c . Symbols can have any non-zero characters (e.g. `$"a-b") but identifiers must be alphanumeric.

6 Insert and Upsert

The q

`t insert ..

is similar to the sql

insert into t ..

It will check primary key violations. The q

`t upsert .. (also t,:..)

is insert for tables and upsert (update existing and insert new) for keyed tables.

7 Select and Update

q is an extension of sql capabilities. Expressions are short and simple.

select vwap:size wavg price by sym from trade / volume weighted average price select from trade where sym=`IBM, 0<deltas price / price went up select sum qty by order.customer.nation from item / rollup of 4-way join

These all run at several million records per second. In general:

select [a] [by b] from t [where c] update [a] [by b] from t [where c]

These are similar to (but more powerful than) the sql

select [b,] [a] from t [where c] [group by b order by b] update t set [a] [where c]

In sql the where and group clauses are atomic and the select and update clauses are atomic or aggregate if grouping. In q the where and by clauses are uniform and the select and update clauses are uniform or aggregate if grouping ( by ). All clauses execute on the columns and therefore q can take advantage of order. Sql can't tell the difference. Sql repeats the group by expressions in the select and the where clause is one boolean expression. The q where clause is a cascading list of constraints which nicely obviates some complex sql correlated subqueries and also gets rid of some parentheses. q relational queries are generally half the size of the corresponding sql. Ordered and functional queries do things that are difficult in sql. See for examples. i is a special token that indicates record handle.

8 Default Column Names

Create, select and update will create names from expressions if there is no assignment. The name is last token (first token if operation is +, -, *, %, & or |) in the expression or x if this token isn't a valid name, e.g.

select count i,sum qty by order.customer.nation ..

is short for

select x:count i,qty:sum qty by nation:order.customer.nation ..

9 Parameterized Queries

String[]x={"IBM","MSFT"}; r=c.k("{[s]select last price by sym from trade where sym in s}",x);

10 Table Arithmetic

t:([x:`a`b]y:2 3) u:([x:`b`c]y:4 5) t+u / ([x:`a`b`c]y:2 7 5)

It is common to want to do arithmetic with tables. Extending arithmetic to tables replaces complicated sql coalescing and outer joins.

11 Foreign Keys

Foreign keys are useful in q, e.g. given

s:([s]name;city;..) p:([p]name;city;..) sp:([]s:`s$();p:`p$();..) / foreign key definitions


select .. from sp where / supplier city equal part city update .. from sp where / try that in sql

See for more examples.

12 Joins

Joins generally run at 10 to 100 million records per second.

See for an 8-way join

13 Q from SQL

Following is a map of how to do sql things in q. There is no reverse map since q is a much richer language than sql. With respect to CJDate "The SQL Standard"

s)create table s(s varchar(*)primary key,name varchar(*),status int,city varchar(*)) q)s:([s:`symbol$()]name:`symbol$();status:`int$();city:`symbol$()) s)create table sp(s varchar(*)references s,p varchar(*)references p,qty int) q)sp:([]s:`s$();p:`p$();qty:`int$()) s)insert into s values('s1','smith',20,'london') q)`s insert(`s1;`smith;20;`london) s)update s set status=status+1 where s='s1' q)update status:status+1 from`s where s=`s1 s)delete from s where s='s1' q)delete from`s where s=`s1 s)select * from s where s='s1' q)select from s where s=`s1 s)select s,count(*)as x from sp group by s order by s q)select count i by s from sp s)select s,sum(qty)as qty from sp,s,p where sp.s=s.s and sp.p=p.p and group s order s q)select sum qty by s from sp where s)select distinct .. q)select distinct .. s)count sum min max avg q)count sum min max avg prd first last wavg wsum .. s)+ - * / < > = <= >= <> like between-and not and or q)+ - * % < > = <= >= <> like within not & | ..

14 Stored Procedures

Server functions and procedures are written in q. Why not java? q is faster, smaller and better than java. For example, a server function to subtract a client table p from a server table P is f:{[p]P-:p}

15 Rename, Rearrange

xcol [p]rename `a`b xcol sp xcols [p]rearrange `p`s xcols sp

16 Correlated Subquery

(aggr;exp)fby exp

obviates common correlated subqueries, e.g.

select from sp where qty>(avg;qty)fby p select from sp where s=`s1,qty>(avg;qty)fby p

see also

17 Performance

Choose layout and optimizations to fit the important queries:

`s sorted `u unique `p parted `g grouped don't bother with any on fewer than one million rows. don't bother with `p on fewer than one billion rows.

think about the queries. think about disk. e.g. if you want fast access on terabytes by time and customer store it both ways.

18 Layout

Tables are:

size| small medium large -------------------------------- 32bit 1GB 32M rows unlimited 64bit 10GB 512M rows unlimited store file directory partitions query .01ms .3ms 10ms(seek) tick taq

large are usually sorted and parted so that important queries are one seek.

64bit can generally handle realtime tables around 10 times bigger than 32bit.

64bit has an advantage on medium (and large) because queries on 32bit are dominated by mapping.

19 Small

One day's worth of tick(trades, quotes, orders, ..) is a few GB and fits in RAM.

Queries are around 10 microseconds.

20 Medium

t `s#date,`g#mas, ..

dayend append (and `g#mas) is fast.

once data is cached in memory

\t select i from t where date=.. \t select i from t where mas=..

Queries are around 1 millisecond.

21 Large

trade `p#sym quote `p#sym

Queries run from disk at 10ms per date/sym/field.

The partition field is date month year or int and is virtual.

Kdb+taq is more than 2500 partitions of daily trades and quotes. Each new day is more than 1GB. [Queries on partitioned databases can run in parallel.] To set a subview -- handy for test and development:

.Q.view 2#date / view two first partitions .Q.view[] / reset

22 Limits

Each database runs in memory and/or disk map-on-demand -- possibly partitioned. There is no limit on the size of a partitioned database but on 32-bit systems the main memory OLTP portion of a database is limited to about 1GB of raw data, i.e. 1/4 of the address space. The raw data of a main memory 64bit process should be limited to about 1/2 of available RAM.

23 Partition

A kdb+ historical database is a file system directory. Small tables are simply binary files inside this directory. Medium-sized tables (<100 million rows) are splayed (a file for each column) across a subdirectory. Big tables (billions of rows) are horizontally partitioned (often by date) and then splayed across a directory. An example layout for kdb+taq is:

/data/db/ sym /enumeration of symbols mas /master table 2005.03.15/ trade/time .. quote/time .. 2005.03.16/ trade/time .. quote/time .. ..

Kdb+ databases run 24*7. After a new date is added a simple reset message is sent to the server. Small tables are in memory. The big table columns are mapped in and out on demand. To avoid swapping RAM should be at least 10* size of the largest column, e.g. if one table in one partition can have 100 million rows there should be 8GB (10*8(byte float)*100,000,000) of RAM. [Address usage is 20* size of the largest column so on 32bit systems tables shouldn't have more than 16.7 million rows.] The number of partitions, tables and columns doesn't matter. Some customers have 1000's of partitions. Some have 1000's of tables. Some have 1000's of columns in the tables.

Tables with many columns are good splayed table candidates because the bulk of these tables will always be out of memory (queries rarely require access to more than a few columns at a time). The performance characteristics of splayed tables are independent of the number of columns in the table. They depend only on the number of columns that must be in memory at any one time, which in turn depends on the specific queries being executed. Columns of Kdb+ splayed tables are stored contiguously on disk and laid out contiguously in real memory when accessed. This organization gives optimal disk access performance. The operating system caches accessed columns in real memory. Consequently, if the real memory hit-rate is high, the performance is real memory performance. (This is often the case because many database applications have a few heavily used queries that reference the same few columns). There can be one or many instances of the database. All instances point to the same data. All processes will reside on the server including a gateway should one be required. The system can be implemented with or without a gateway process. In a system without a gateway, users connect directly to a database instance and send queries, with a gateway the users connect and send queries to the gateway process which allocates the query based on which instance is free. This can be handy for running more than one query at a time but overall throughput can of course suffer to the extent that the queries are competing for disk. But this is a way to let an important user get something quick while some bigger analysis is going on in another instance.

24 Parallel

Partitioned databases can run in parallel. But if we run in parallel it is best to have parallel i/o. Each kdb+ process can easily consume 100MB per second. Therefore the fastest multi-terabyte kdb+ databases use parallel i/o with guaranteed no contention. Customers do this with DAS (direct attached storage). U320 scsi controllers are fast and inexpensive. Customers sometimes use SAN (200MB/sec total) or NAS (80MB/sec total) because the IT departments insist on it. That is ok. But this is the limiting factor. If you use NAS be sure to have at least 1Gbit ethernet.

If you use NAS there is no point in going parallel -- i.e. one kdb+ process can easily consume everything the filesystem throws at it. 4 slaves will saturate SAN. With DAS (direct attached storage) we can have one slave per disk array and ideally at least 2 disk arrays per cpu core.

If all queries access just one partition/date then there is no point in going parallel. If you have DAS and have queries that span more than one partition then the best setup is to have one or more disk arrays per slave process (thus no contention) with approximately 2 slaves processes per cpu. (one is crunching while the other is reading). But be sure to have enough RAM per process so that there is no swapping, e.g. a machine with 4cpu/32GBram/4controllers(2channels each)/8diskarray(14*146GB drives each) can have 8 slave processes on 50 million row (per partition) tables and 4 slave processes on 100 million row (per partition) tables. The o/s independent way to allocate partitions to different disk-arrays is to use a file (par.txt) that lists the different directories, e.g.

/data/db/ sym mas par.txt

where par.txt is

/0/db /1/db /2/db /3/db

and these directories are:

/0/db 2005.03.15/trade quote 2005.03.19/trade quote /1/db 2005.03.16/trade quote 2005.03.20/trade quote ..

Partitions are round-robin allocated mod N where N is the number of partitions. Round-robin is to get maximum parallelization on date ranges.

25 Logs

Log/commit strategies are none, file (-l) or synch (-L). None is good for test, readonly, readmostly, trusted, duplicated or cache db's. File (100,000 transactions per second) is useful if you trust (or duplicate) the machine. Synch is 100 times per second with SCSI disk and 10,000 times per second with SSD(solid state disk). In any case a single drive can commit more than 40MB of updates per second. A logging database is [script/][state/]log, e.g.

>q d -l / loads d.q(func), d.qdb(data), d.log(updates) \l / checkpoint d.qdb and empty d.log (d must not have "." in it)

i.e. put code in d.q in directory .u(utilities). all update messages are logged. To send a message from the console use 0, e.g. 0"v+:1" . An error after a partial update will cause rollback. d can start empty. d, d.q and d.log should all be in the same launch directory. In general only log in production. Keep the database as simple as possible.

26 Load

(types;widths)0:file / text load (types;widths)1:file / data load

The windows version comes with an odbc loader, e.g.

>q w32/odbc.k q).odbc.load`northwind.mdb / will load entire database

You should get 10MB/sec -- if the source rdbms and network are up to it. Also,` t:.odbc.tables h r:.odbc.eval[h]"..." .odbc.close h

$bcp t out t.txt -c

27 Archive

It is trivial to incrementally archive the large parallel databases: simply move the old directories away and send the reset message.

28 User

Servers can control access with a file of usr:pwd's, e.g.

>q f -p 1234 -u ../user.txt

Clients connect with:

q> c:hopen`:host:1234:usr:pwd> c c=new k.c("host",1234,"usr:pwd"); c.cs> c c=new k.c("host",1234,"usr:pwd"); c.o> int c=khpu("host",1234,"usr:pwd");

Even when there is no server -u the q client still sends username. All incoming messages reflect:

.z.w / who-socket .z.a / address .z.u / user

Clients can only do file i/o in the server directory.

29 Grid

Kdb+ is a parallel/grid programming system.

>q [f] -s n / will start n slaves threads {..}peach x / parallel execute on x, e.g.

The overhead is conditional threads - 1 microsecond - so use accordingly. Test overhead with:

\t {x}peach til 1000

Ideal parallel cpu candidate has high cpu%data, e.g.

{sum exp x?1.0}peach 2#1000000 / should be twice as fast with 2 cpu's

Peach is also good for going after many different drives at once even if you only have one cpu.