Selecting columns for a list breaks multiple matches #2

@bobpaul

Maybe this can already be done and I'm just not getting it, but here's a contrived example to illustrate.

Let's say I have some output of ps aux which looks like this:

$ ps aux 
message+   792  0.0  0.0  42892  3672 ?        Ss   11:33   0:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
root       839  0.0  0.1 274488  5924 ?        Ssl  11:33   0:00 /usr/lib/accountsservice/accounts-daemon
daemon     846  0.0  0.0  26044  2064 ?        Ss   11:33   0:00 /usr/sbin/atd -f
root      1003  0.0  0.0  13376   168 ?        Ss   11:33   0:00 /sbin/mdadm --monitor --pid-file /run/mdadm/monitor.pid --daemonise --scan --syslog
bobpaul     1318  0.0  0.1  21516  5224 pts/0    Ss   11:37   0:00 -bash
bobpaul     1339  0.0  4.5 676188 183092 ?       Ssl  11:38   0:18 emacs --daemon
bobpaul     1499  0.0  0.1  21568  5504 pts/1    Ss+  11:48   0:00 -bash
bobpaul     1512  0.0  0.1  21480  5420 pts/2    Ss+  11:48   0:00 -bash
bobpaul     2635  0.0  0.0  12944   936 pts/0    R+   19:03   0:00 grep --color=auto -e daemon -e bash
bobpaul     2636  0.0  0.0  21516  2104 pts/0    D+   19:03   0:00 -bash
$ 

Now, for all lines that contain bash I want to print the 5th column. For all lines that contain daemon I want to print the 2nd column. This can be done in awk like:

$ ps aux | awk '/daemon/ { print $2 } /bash/ { print $5 }'
792
839
846
1003
21516
1339
21568
21480
2635
12944
21516
$ 
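For comparison, here is roughly what that awk program does, written as a plain Python sketch (not pyp). The names `RULES` and `select_columns` are made up for illustration, and the sample rows are copied from the ps aux output above. The point is that each pattern is tested on its own, so a line matching both patterns produces two values.

```python
import re

# Three sample rows copied from the ps aux output above.
SAMPLE = """\
root       839  0.0  0.1 274488  5924 ?        Ssl  11:33   0:00 /usr/lib/accountsservice/accounts-daemon
bobpaul     1318  0.0  0.1  21516  5224 pts/0    Ss   11:37   0:00 -bash
bobpaul     2635  0.0  0.0  12944   936 pts/0    R+   19:03   0:00 grep --color=auto -e daemon -e bash
"""

# (pattern, zero-based column) pairs; each is checked independently,
# the way awk runs every pattern-action rule against every line.
RULES = [(re.compile(r"daemon"), 1), (re.compile(r"bash"), 4)]


def select_columns(lines, rules=RULES):
    """Return the selected column for every rule that matches each line."""
    out = []
    for line in lines:
        fields = line.split()
        for pattern, col in rules:
            if pattern.search(line) and col < len(fields):
                out.append(fields[col])
    return out


if __name__ == "__main__":
    for value in select_columns(SAMPLE.splitlines()):
        print(value)
```

The grep row matches both rules, so it contributes both its 2nd and its 5th column, just as it does in the awk output.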

So I try to build the command incrementally with pyp. I start by matching both conditions, which, after a bit of messing around, I figured out I could do with 'or'. (Maybe this is already abusive.)

$ ps aux | pyp "p.re('.*daemon.*').split() or p.re('.*bash.*').split()"
[[0]message+[1]792[2]0.0[3]0.0[4]42892[5]3672[6]?[7]Ss[8]11:33[9]0:00[10]/usr/bin/dbus-daemon[11]--system[12]--address=systemd:[13]--nofork[14]--nopidfile[15]--systemd-activation]
[[0]root[1]839[2]0.0[3]0.1[4]274488[5]5924[6]?[7]Ssl[8]11:33[9]0:00[10]/usr/lib/accountsservice/accounts-daemon]
[[0]daemon[1]846[2]0.0[3]0.0[4]26044[5]2064[6]?[7]Ss[8]11:33[9]0:00[10]/usr/sbin/atd[11]-f]
[[0]root[1]1003[2]0.0[3]0.0[4]13376[5]168[6]?[7]Ss[8]11:33[9]0:00[10]/sbin/mdadm[11]--monitor[12]--pid-file[13]/run/mdadm/monitor.pid[14]--daemonise[15]--scan[16]--syslog]
[[0]bobpaul[1]1318[2]0.0[3]0.1[4]21516[5]5224[6]pts/0[7]Ss[8]11:37[9]0:00[10]-bash]
[[0]bobpaul[1]1339[2]0.0[3]4.5[4]676188[5]183092[6]?[7]Ssl[8]11:38[9]0:18[10]emacs[11]--daemon]
[[0]bobpaul[1]1499[2]0.0[3]0.1[4]21568[5]5504[6]pts/1[7]Ss+[8]11:48[9]0:00[10]-bash]
[[0]bobpaul[1]1512[2]0.0[3]0.1[4]21480[5]5420[6]pts/2[7]Ss+[8]11:48[9]0:00[10]-bash]
[[0]bobpaul[1]2635[2]0.0[3]0.0[4]12944[5]936[6]pts/0[7]R+[8]19:03[9]0:00[10]grep[11]--color=auto[12]-e[13]daemon[14]-e[15]bash]
[[0]bobpaul[1]2636[2]0.0[3]0.0[4]21516[5]2104[6]pts/0[7]D+[8]19:03[9]0:00[10]-bash]
$

Good so far. Now grab the columns (remember awk is 1-indexed, Python is 0-indexed):

$ ps aux | pyp "p.re('.*daemon.*').split()[1] or p.re('.*bash.*').split()[4]"
792
839
846
1003
1339
2635
$ 

Wait, that's not enough results. It only shows the columns for the daemon matches. I think what's happening is that the [1] selector causes the first part to evaluate to True in cases where the regex didn't match (returned None). (None[1] would raise an exception, so part of the exception handling routine must make it always return True.)

This becomes apparent if we remove the column selector from the daemon regex:

$ ps aux | pyp "p.re('.*daemon.*').split() or p.re('.*bash.*').split()[4]"
[[0]message+[1]792[2]0.0[3]0.0[4]42892[5]3672[6]?[7]Ss[8]11:33[9]0:00[10]/usr/bin/dbus-daemon[11]--system[12]--address=systemd:[13]--nofork[14]--nopidfile[15]--systemd-activation]
[[0]root[1]839[2]0.0[3]0.1[4]274488[5]5924[6]?[7]Ssl[8]11:33[9]0:00[10]/usr/lib/accountsservice/accounts-daemon]
[[0]daemon[1]846[2]0.0[3]0.0[4]26044[5]2064[6]?[7]Ss[8]11:33[9]0:00[10]/usr/sbin/atd[11]-f]
[[0]root[1]1003[2]0.0[3]0.0[4]13376[5]168[6]?[7]Ss[8]11:33[9]0:00[10]/sbin/mdadm[11]--monitor[12]--pid-file[13]/run/mdadm/monitor.pid[14]--daemonise[15]--scan[16]--syslog]
21516
[[0]bobpaul[1]1339[2]0.0[3]4.5[4]676188[5]183092[6]?[7]Ssl[8]11:38[9]0:18[10]emacs[11]--daemon]
21568
21480
[[0]bobpaul[1]2635[2]0.0[3]0.0[4]12944[5]936[6]pts/0[7]R+[8]19:03[9]0:00[10]grep[11]--color=auto[12]-e[13]daemon[14]-e[15]bash]
21516
$ 

Now it's returning both matches again, but only selecting columns on the second match.
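One possible way to model the two runs above in plain Python (this is only a guess, not pyp's actual internals): assume a non-matching p.re yields an empty string, so .split() gives an empty list. The helper `re_or_empty` is a hypothetical stand-in for p.re, not part of pyp. Under that assumption, an empty list is falsy and 'or' falls through to the right side, but indexing the empty list raises before 'or' ever gets to try the right side.

```python
import re


def re_or_empty(pattern, line):
    """Hypothetical stand-in for p.re: the line on a match, else ''."""
    return line if re.search(pattern, line) else ""


bash_line = "bobpaul     1318  0.0  0.1  21516  5224 pts/0    Ss   11:37   0:00 -bash"

# No index on the left: ''.split() is [], which is falsy, so `or`
# evaluates the right-hand side and the bash column is selected.
no_index = (
    re_or_empty("daemon", bash_line).split()
    or re_or_empty("bash", bash_line).split()[4]
)

# Index on the left: [][1] raises IndexError while the left operand is
# still being evaluated, so the right-hand side is never reached.
try:
    with_index = (
        re_or_empty("daemon", bash_line).split()[1]
        or re_or_empty("bash", bash_line).split()[4]
    )
except IndexError:
    with_index = None  # the whole expression failed for this line

print(no_index)
print(with_index)
```

If pyp drops a line whenever the expression raises, that would account for the bash-only lines disappearing in the first run and coming back in the second. It would also mean that a single 'or' expression yields at most one value per line, so it could not print both columns for the grep line the way awk does.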

Am I just approaching this problem the wrong way, or is it not currently possible to replicate the awk code that outputs a different column depending on what within the line matched?
