(Series) A side-by-side comparison of q/kdb+ and python for qbies (q newbies).
This cheat sheet is a list of code snippets that do the same thing in
q and python -- something I wished existed when I first came to q/KDB+ from python.
Note that in both languages, the code shown is sometimes not the most elegant or
succinct; that's on purpose because I wanted to show the same constructs in both languages.
For a comprehensive intro to q, check out Q for Mortals.
Note q code snippets are edited to have line breaks for readability (my
opinion; ymmv). Multiline statements do not work at the prompt, but they do work in scripts. See docs.
Versions: q (3.6), python (3.7)
Init
q)t:([]
ints:100*1+til 3;
syms:`one`two`san;
flts:0.1*til 3
)
q)t
ints syms flts
--------------
100 one 0
200 two 0.1
300 san 0.2
|
>>> df = pd.DataFrame(dict(
... ints=100 * (1 + np.arange(3)),
... strs=['one', 'two', 'san'],
... flts=0.1 * np.arange(3),
... ))
>>> df
ints strs flts
------------------
0 100 one 0.0
1 200 two 0.1
2 300 san 0.2
|
Select columns
q)select syms, ints from t
syms ints
---------
one 100
two 200
san 300
q)`syms`ints#t
syms ints
---------
one 100
two 200
san 300
q)(enlist `ints)#t
ints
----
100
200
300
|
>>> df.loc[:, ['ints', 'strs']]
strs ints
------------
0 one 100
1 two 200
2 san 300
>>> df[['strs', 'ints']]
strs ints
------------
0 one 100
1 two 200
2 san 300
>>> df[['ints']]
ints
-------
0 100
1 200
2 300
|
df['ints'] will return a pd.Series instead of a pd.DataFrame.
Select columns programmatically
q)c:`flts`syms
q)?[t;();0b;c!c]
flts syms
---------
0 one
0.1 two
0.2 san
|
>>> c = ['flts', 'strs']
>>> df[c]
flts strs
------------
0 0.0 one
1 0.1 two
2 0.2 san
|
Table to list(s)
q)exec flts from t
0 0.1 0.2
q)t`syms
`one`two`san
q)t`ints`syms
100 200 300
`one`two`san
q)value flip t
100 200 300
one two san
0 0.1 0.2
|
>>> df['flts'].values
array([0. , 0.1, 0.2])
>>> df.strs
array(['one', 'two', 'san'], dtype=object)
>>> [v.values for v in df[['ints', 'strs']].to_dict('series').values()]
[array([100, 200, 300]),
array(['one', 'two', 'san'], dtype=object)]
>>> df.values.tranpose()
array([[100, 200, 300],
['one', 'two', 'san'],
[0.0, 0.1, 0.2]], dtype=object)
|
Gotchas: missing column
Be careful when selecting columns like you would from a dictionary (which is what tables are).
Similar to dictionaries, missing
columns will return an empty list of the first column's type.
q)exec notHere from t
'notHere
[0] exec notHere from t
^
q)?[t;();0b;c!c:`ints`notHere]
'notHere
[0] ?[t;();0b;c!c:`ints`notHere]
^
q)t`notHere
`long$()
q)([]bool:010b)`notHere
`boolean$()
|
>>> df['not_here']
KeyError: 'not_here'
>>> df.not_here
AttributeError: 'DataFrame' object has no attribute 'not_here'
|
Table to dictionary of lists
If you're doing this in python, you're probably doing something wrong.
q)flip t
ints| 100 200 300
syms| one two san
flts| 0 0.1 0.2
q)exec ints,syms from t
ints| 100 200 300
syms| one two san
q)`ints`syms#flip t
ints| 100 200 300
syms| one two san
|
>>> dict(zip(df.columns, df.values.T.tolist()))
{'ints': [100, 200, 300],
'strs': ['one', 'two', 'san'],
'flts': [0.0, 0.1, 0.2]}
>>> dt = df[['ints', 'strs']] # temp dataframe
>>> dict(zip(dt.columns, dt.values.T.tolist()))
{'ints': [100, 200, 300],
'strs': ['one', 'two', 'san']}
|
Table to dictionary of one column to another
Make sure your key column is unique:
q)(t`ints)!(t`syms)
100| one
200| two
300| san
|
>>> df.set_index('ints').strs.to_dict()
{100: 'one', 200: 'two', 300: 'san'}
|
... or else:
q)u:([] k:`a`a`b; v:`one`two`san)
q)d:(t`k)!(t`v)
q)d
a| one
a| two // `two is not accessible with: d[`a]
b| san
|
>>> du = pd.DataFrame(dict(k=list("aab"), v="one two san".split()))
>>> d = du.set_index("k").v.to_dict()
>>> d
{'a': 'two', 'b': 'san'} # we lost 'one'
|
Append to a table
// Prepare a dict, a one-row table, and a list
q)d:(`ints`syms`flts)!(777;`svn;7.7)
q)d
ints| 777
syms| `svn
flts| 7.7
q)r:enlist d
q)r
ints syms flts
--------------
777 svn 7.7
q)l:value d
q)l
777
`svn
7.7
// Start appending
q)t,d // join with a dict
q)t,r // join with a 1-row table
ints syms flts
--------------
100 one 0
200 two 0.1
300 san 0.2
777 svn 7.7
q)t,:l // join-in-place with a list
q)t
ints syms flts
--------------
100 one 0
200 two 0.1
300 san 0.2
777 svn 7.7
|
# Prepare a dict, a one-row table, and a list
>>> d = dict(ints=777, strs='svn', flts=7.7)
>>> d
{'ints': 777, 'strs': 'svn', 'flts': 7.7}
>>> r = pd.DataFrame.from_records([d])
>>> r
ints strs flts
------------------
0 777 svn 7.7
>>> l = list(d.values())
>>> l
[777, 'svn', 7.7]
# Start appending
>>> df.append(d, ignore_index=True)
>>> df.append(r, ignore_index=True)
ints strs flts
------------------
0 100 one 0.0
1 200 two 0.1
2 300 san 0.2
3 777 svn 7.7
>>> df.loc[len(df)] = l
>>> df
ints strs flts
------------------
0 100 one 0.0
1 200 two 0.1
2 300 san 0.2
3 777 svn 7.7
|
Select rows
q)select from t where i in 0 2
ints syms flts
--------------
100 one 0
300 san 0.2
q)?[t;enlist (<;`ints;300);0b;()]
ints syms flts
--------------
100 one 0
200 two 0.1
q)?[t;((<;`ints;300);(>;`flts;0));0b;()]
ints syms flts
--------------
200 two 0.1
|
>>> df.loc[[0, 2]]
ints strs flts
------------------
0 100 one 0.0
2 300 san 0.2
>>> df.loc[df.ints < 300]
ints strs flts
------------------
0 100 one 0.0
1 200 two 0.1
>>> df.query("ints < 300 and 0 < flts")
------------------
ints strs flts
1 200 two 0.1
|
Add new columns
q)t:update a:ints*flts, b:777 from t
q)t
ints syms flts a b
---------------------
100 one 0 0 777
200 two 0.1 20 777
300 san 0.2 60 777
|
>>> df['a'] = df.ints * df.flts
>>> df['b'] = 777
>>> df
ints strs flts a b
-----------------------------
0 100 one 0.0 0.0 777
1 200 two 0.1 20.0 777
2 300 san 0.2 60.0 777
|
Select rows of a keyed table
q)t:([k:`a1`a2`a3]
v:til 3;
w:`m`n`o)
q)t
k | v w
--| ---
a1| 0 m
a2| 1 n
a3| 2 o
q)t`a1 # returns a dictionary
v| 0
w| `m
q)t ([]k:`a1`a3)
v w
---
0 m
2 o
|
>>> df = pd.DataFrame(dict(v=[0, 1, 2], w=list("mno"),
k="a1 a2 a3".split()))
>>> df = df.set_index('k')
>>> df
v w
k
a1 0 m
a2 1 n
a3 2 o
>>> df.loc['a1'] # returns a pd.Series
v 0
w m
Name: a1, dtype: object
>>> df.loc[['a1', 'a2']]
v w
k
a1 0 m
a2 1 n
|
Add columns using functions
On the q side, the even and total functions are called with vectors (which you
can check by including a show statement like even:{show x;0=x
mod 2}, while the strJoin is called with "each" ' (i.e. one
row at a time).
This is the same on the python side. Both even and total perform vectorized numpy operations.
q)even:{0=x mod 2}
q)strJoin:{(string x),string y}
q)total:{sum(x; y; z)}
q)t:update
e:even ints%100,
s:strJoin'[ints;syms],
t:total[ints;flts;flts]
from t
q)t
ints syms flts e s t
-------------------------------
100 one 0 0 "100one" 100
200 two 0.1 1 "200two" 200.2
300 san 0.2 0 "300san" 300.4
|
>>> from itertools import starmap
>>> def even(x):
return x % 2 == 0
>>> def join_as_str(a, b):
return str(a) + str(b)
>>> def total(x, y, z):
return np.sum((x, y, z), axis=0)
>>> df['e'] = even(df.ints)
>>> df['s'] = starmap(join_as_str, zip(df.ints, df.strs))
>>> df['t'] = total(df.ints, df.flts, df.flts)
>>> df
ints strs flts e s t
----------------------------------------
0 100 one 0.0 False 100one 100.0
1 200 two 0.1 True 200two 200.2
2 300 san 0.2 False 300san 300.4
|
Inspect column types
Reference on meta (link) and data
types (link).
q)t
ints syms flts e s t
-------------------------------
100 one 0 0 "100one" 100
200 two 0.1 1 "200two" 200.2
300 san 0.2 0 "300san" 300.4
q)meta t
c | t f a
----| -----
ints| j
syms| s
flts| f
e | b
s | C
t | f
|
>>> df
ints strs flts e s t
----------------------------------------
0 100 one 0.0 False 100one 100.0
1 200 two 0.1 True 200two 200.2
2 300 san 0.2 False 300san 300.4
>>> df.dtypes
ints int64
strs object
flts float64
e bool
s object
t float64
dtype: object
|
Create an empty table
|
>>> pd.DataFrame(columns=['a', 'b', 'c'])
Empty DataFrame
Columns: [a, b, c]
Index: []
|