q vs python: Tables joins

(Series) A side-by-side comparison of q/kdb+ and python for qbies (q newbies).

Cover image

This cheat sheet is a list of code snippets that do the same thing in q and python -- something I wished existed when I first came to q/KDB+ from python. Note that in both languages, the code shown is sometimes not the most elegant or succinct; that's on purpose because I wanted to show the same constructs in both languages. For a comprehensive intro to q, check out Q for Mortals.

Note q code snippets are edited to have line breaks for readability (my opinion; ymmv). Multiline statements do not work at the prompt, but they do work in scripts. See docs.

Versions: q (3.6), python (3.7)

Init*

q)t1:([]k1:`a`b`c; v1:til 3)

q)t2:([]k2:`a`X`c; v2:3+til 3)

q)t1
k1 v1
-----
a  0
b  1
c  2

q)t2
k2 v2
-----
a  3
X  4
c  5
>>> df = pd.DataFrame(dict(k1=list('abc'), v1=range(3)))

>>> dg = pd.DataFrame(dict(k2=list('aXc'), v2=range(3, 6)))

>>> df
  k1  v1
--------
0  a   0
1  b   1
2  c   2

>>> dg
  k2  v2
--------
0  a   3
1  X   4
2  c   5

Join and Concatenate*

// left-join
q)t1 lj `k1 xkey `k1 xcol t2
k1 v1 v2
--------
a  0  3
b  1
c  2  5

// concat side-by-side
q)t1 ,' t2
k1 v1 k2 v2
-----------
a  0  a  3
b  1  X  4
c  2  c  5

// concat one-below-the-other
q)t1, `k1`v1 xcol t2

k1 v1
-----
a  0
b  1
c  2
a  3
X  4
c  5

// concat side-by-side with dup col names
// (second table wins)
q)t1 ,' `k1`v1 xcol t2
k1 v1
-----
a  3
X  4
c  5
# left-join
>>> df.set_index('k1').join(dg.set_index('k2')).reset_index()
  k1  v1   v2
-------------
0  a   0  3.0
1  b   1  NaN
2  c   2  5.0

# concat side-by-side
>>> pd.concat([df, dg], axis=1)
  k1  v1 k2  v2
---------------
0  a   0  a   3
1  b   1  X   4
2  c   2  c   5

# concat one-below-the-other
>>> pd.concat([df, dg.rename(columns=dict(k2="k1", v2="v1"))],
...           axis=0, ignore_index=True)
  k1  v1
--------
0  a   0
1  b   1
2  c   2
3  a   3
4  X   4
5  c   5

# concat side-by-side with dup col names
# (output has duplicated col names; don't do this)
>>> pd.concat([df, dg.rename(columns=dict(k2="k1", v2="v1"))], axis=1)
  k1  v1 k1  v1
---------------
0  a   0  a   3
1  b   1  X   4
2  c   2  c   5