q vs python: Tables

Sep 2020

(Series) A side-by-side comparison of q/kdb+ and python for qbies (q newbies).

This cheat sheet is a list of code snippets that do the same thing in q and python -- something I wished existed when I first came to q/KDB+ from python. Note that in both languages, the code shown is sometimes not the most elegant or succinct; that's on purpose because I wanted to show the same constructs in both languages. For a comprehensive intro to q, check out Q for Mortals.

Note q code snippets are edited to have line breaks for readability (my opinion; ymmv). Multiline statements do not work at the prompt, but they do work in scripts. See docs.

Versions: q (3.6), python (3.7)

Init*

q)t:([]
    ints:100*1+til 3;
    syms:`one`two`san;
    flts:0.1*til 3
    )

q)t
ints syms flts
--------------
100  one  0
200  two  0.1
300  san  0.2

>>> df = pd.DataFrame(dict(
...     ints=100 * (1 + np.arange(3)),
...     strs=['one', 'two', 'san'],
...     flts=0.1 * np.arange(3),
...     ))

>>> df
   ints strs  flts
------------------
0   100  one   0.0
1   200  two   0.1
2   300  san   0.2

Select columns*

q)select syms, ints from t
syms ints
---------
one  100
two  200
san  300

q)`syms`ints#t
syms ints
---------
one  100
two  200
san  300

q)(enlist `ints)#t
ints
----
100
200
300

>>> df.loc[:, ['ints', 'strs']]
  strs  ints
------------
0  one   100
1  two   200
2  san   300

>>> df[['strs', 'ints']]
  strs  ints
------------
0  one   100
1  two   200
2  san   300

>>> df[['ints']]
   ints
-------
0   100
1   200
2   300

df['ints'] will return a pd.Series instead of a pd.DataFrame.

Select columns programmatically*

q)c:`flts`syms
q)?[t;();0b;c!c]
flts syms
---------
0    one
0.1  two
0.2  san

>>> c = ['flts', 'strs']
>>> df[c]
   flts strs
------------
0   0.0  one
1   0.1  two
2   0.2  san

Table to list(s)*

q)exec flts from t
0 0.1 0.2

q)t`syms
`one`two`san

q)t`ints`syms
100 200 300
`one`two`san

q)value flip t
100 200 300
one two san
0   0.1 0.2

>>> df['flts'].values
array([0. , 0.1, 0.2])

>>> df.strs
array(['one', 'two', 'san'], dtype=object)

>>> [v.values for v in df[['ints', 'strs']].to_dict('series').values()]
[array([100, 200, 300]),
 array(['one', 'two', 'san'], dtype=object)]

>>> df.values.tranpose()
array([[100, 200, 300],
       ['one', 'two', 'san'],
       [0.0, 0.1, 0.2]], dtype=object)

Gotchas: missing column*

Be careful when selecting columns like you would from a dictionary (which is what tables are). Similar to dictionaries, missing columns will return an empty list of the first column's type.

q)exec notHere from t
'notHere
  [0]  exec notHere from t
       ^

q)?[t;();0b;c!c:`ints`notHere]
'notHere
  [0]  ?[t;();0b;c!c:`ints`notHere]
       ^

q)t`notHere
`long$()

q)([]bool:010b)`notHere
`boolean$()

>>> df['not_here']
KeyError: 'not_here'








>>> df.not_here
AttributeError: 'DataFrame' object has no attribute 'not_here'

Table to dictionary of lists*

If you're doing this in python, you're probably doing something wrong.

q)flip t
ints| 100 200 300
syms| one two san
flts| 0   0.1 0.2

q)exec ints,syms from t
ints| 100 200 300
syms| one two san

q)`ints`syms#flip t
ints| 100 200 300
syms| one two san

>>> dict(zip(df.columns, df.values.T.tolist()))
{'ints': [100, 200, 300],
 'strs': ['one', 'two', 'san'],
 'flts': [0.0, 0.1, 0.2]}

>>> dt = df[['ints', 'strs']]  # temp dataframe
>>> dict(zip(dt.columns, dt.values.T.tolist()))
{'ints': [100, 200, 300],
 'strs': ['one', 'two', 'san']}

Table to dictionary of one column to another*

Make sure your key column is unique:

q)(t`ints)!(t`syms)
100| one
200| two
300| san

>>> df.set_index('ints').strs.to_dict()
{100: 'one', 200: 'two', 300: 'san'}

... or else:

q)u:([] k:`a`a`b; v:`one`two`san)

q)d:(t`k)!(t`v)

q)d
a| one
a| two  // `two is not accessible with: d[`a]
b| san

>>> du = pd.DataFrame(dict(k=list("aab"), v="one two san".split()))

>>> d = du.set_index("k").v.to_dict()

>>> d
{'a': 'two', 'b': 'san'}  # we lost 'one'

Append to a table*

// Prepare a dict, a one-row table, and a list

q)d:(`ints`syms`flts)!(777;`svn;7.7)
q)d
ints| 777
syms| `svn
flts| 7.7

q)r:enlist d
q)r
ints syms flts
--------------
777  svn  7.7

q)l:value d
q)l
777
`svn
7.7

// Start appending

q)t,d          // join with a dict
q)t,r          // join with a 1-row table
ints syms flts
--------------
100  one  0
200  two  0.1
300  san  0.2
777  svn  7.7

q)t,:l         // join-in-place with a list
q)t
ints syms flts
--------------
100  one  0
200  two  0.1
300  san  0.2
777  svn  7.7

# Prepare a dict, a one-row table, and a list

>>> d = dict(ints=777, strs='svn', flts=7.7)
>>> d
{'ints': 777, 'strs': 'svn', 'flts': 7.7}



>>> r = pd.DataFrame.from_records([d])
>>> r
   ints strs  flts
------------------
0   777  svn   7.7

>>> l = list(d.values())
>>> l
[777, 'svn', 7.7]



# Start appending

>>> df.append(d, ignore_index=True)
>>> df.append(r, ignore_index=True)
   ints strs  flts
------------------
0   100  one   0.0
1   200  two   0.1
2   300  san   0.2
3   777  svn   7.7

>>> df.loc[len(df)] = l
>>> df
   ints strs  flts
------------------
0   100  one   0.0
1   200  two   0.1
2   300  san   0.2
3   777  svn   7.7

Select rows*

q)select from t where i in 0 2
ints syms flts
--------------
100  one  0
300  san  0.2

q)?[t;enlist (<;`ints;300);0b;()]
ints syms flts
--------------
100  one  0
200  two  0.1

q)?[t;((<;`ints;300);(>;`flts;0));0b;()]
ints syms flts
--------------
200  two  0.1

>>> df.loc[[0, 2]]
   ints strs  flts
------------------
0   100  one   0.0
2   300  san   0.2

>>> df.loc[df.ints < 300]
   ints strs  flts
------------------
0   100  one   0.0
1   200  two   0.1

 >>> df.query("ints < 300 and 0 < flts")
------------------
   ints strs  flts
1   200  two   0.1

Add new columns*

q)t:update a:ints*flts, b:777 from t

q)t
ints syms flts a  b
---------------------
100  one  0    0  777
200  two  0.1  20 777
300  san  0.2  60 777

>>> df['a'] = df.ints * df.flts
>>> df['b'] = 777
>>> df
   ints strs  flts     a    b
-----------------------------
0   100  one   0.0   0.0  777
1   200  two   0.1  20.0  777
2   300  san   0.2  60.0  777

Select rows of a keyed table*

q)t:([k:`a1`a2`a3]
      v:til 3;
      w:`m`n`o)
q)t
k | v w
--| ---
a1| 0 m
a2| 1 n
a3| 2 o

q)t`a1  # returns a dictionary
v| 0
w| `m


q)t ([]k:`a1`a3)
v w
---
0 m
2 o

>>> df = pd.DataFrame(dict(v=[0, 1, 2], w=list("mno"),
                           k="a1 a2 a3".split()))
>>> df = df.set_index('k')
>>> df
    v  w
k
a1  0  m
a2  1  n
a3  2  o

>>> df.loc['a1']  # returns a pd.Series
v    0
w    m
Name: a1, dtype: object

>>> df.loc[['a1', 'a2']]
    v  w
k
a1  0  m
a2  1  n

Add columns using functions*

On the q side, the even and total functions are called with vectors (which you can check by including a show statement like even:{show x;0=x mod 2}, while the strJoin is called with "each" ' (i.e. one row at a time).

This is the same on the python side. Both even and total perform vectorized numpy operations.

q)even:{0=x mod 2}

q)strJoin:{(string x),string y}

q)total:{sum(x; y; z)}

q)t:update
    e:even ints%100,
    s:strJoin'[ints;syms],
    t:total[ints;flts;flts]
    from t

q)t
ints syms flts e s        t
-------------------------------
100  one  0    0 "100one" 100
200  two  0.1  1 "200two" 200.2
300  san  0.2  0 "300san" 300.4

>>> from itertools import starmap
>>> def even(x):
        return x % 2 == 0
>>> def join_as_str(a, b):
        return str(a) + str(b)
>>> def total(x, y, z):
        return np.sum((x, y, z), axis=0)

>>> df['e'] = even(df.ints)
>>> df['s'] = starmap(join_as_str, zip(df.ints, df.strs))
>>> df['t'] = total(df.ints, df.flts, df.flts)

>>> df
   ints strs  flts      e       s      t
----------------------------------------
0   100  one   0.0  False  100one  100.0
1   200  two   0.1   True  200two  200.2
2   300  san   0.2  False  300san  300.4

Inspect column types*

Reference on meta (link) and data types (link).

q)t
ints syms flts e s        t
-------------------------------
100  one  0    0 "100one" 100
200  two  0.1  1 "200two" 200.2
300  san  0.2  0 "300san" 300.4

q)meta t
c   | t f a
----| -----
ints| j
syms| s
flts| f
e   | b
s   | C
t   | f

>>> df
   ints strs  flts      e       s      t
----------------------------------------
0   100  one   0.0  False  100one  100.0
1   200  two   0.1   True  200two  200.2
2   300  san   0.2  False  300san  300.4

>>> df.dtypes
ints      int64
strs     object
flts    float64
e          bool
s        object
t       float64
dtype: object

Create an empty table*

q)`a`b`c!()
a b c
-----

>>> pd.DataFrame(columns=['a', 'b', 'c'])
Empty DataFrame
Columns: [a, b, c]
Index: []