Suggestions for some of the Pandas solutions

Hi @Atrebas, great article comparing the two data processing tools. The concise syntax of Rdatatable is appealing. R is not my strong suit, so hopefully my time with #pydatatable will be fun. Below are excerpts of your Pandas code that I believe could be written more simply and made more performant; for some scenarios, Python's verbose syntax is unavoidable. I have added a comment to each excerpt pointing to the part of your code it refers to. For a few (`rleid`, `%inrange%`, ...) I have no idea yet. When I have some time, I will look at some code online (the rdatatable explanation wasn't too clear to me) and may revisit them. Either way, thanks for the article; it was a good read, as was the dplyr comparison. Cheers.

```python
import pandas as pd
import numpy as np
```


```python
pd.__version__
```




    '1.1.0'




```python
df = pd.DataFrame(
  {"V1" : [1, 2, 1, 2, 1, 2, 1, 2, 1],
   "V2" : [1, 2, 3, 4, 5, 6, 7, 8, 9], 
   "V3" : [0.5, 1.0, 1.5] * 3, 
   "V4" : ['A', 'B', 'C'] * 3}) 
df
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>V1</th>
      <th>V2</th>
      <th>V3</th>
      <th>V4</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>1</td>
      <td>1</td>
      <td>0.5</td>
      <td>A</td>
    </tr>
    <tr>
      <th>1</th>
      <td>2</td>
      <td>2</td>
      <td>1.0</td>
      <td>B</td>
    </tr>
    <tr>
      <th>2</th>
      <td>1</td>
      <td>3</td>
      <td>1.5</td>
      <td>C</td>
    </tr>
    <tr>
      <th>3</th>
      <td>2</td>
      <td>4</td>
      <td>0.5</td>
      <td>A</td>
    </tr>
    <tr>
      <th>4</th>
      <td>1</td>
      <td>5</td>
      <td>1.0</td>
      <td>B</td>
    </tr>
    <tr>
      <th>5</th>
      <td>2</td>
      <td>6</td>
      <td>1.5</td>
      <td>C</td>
    </tr>
    <tr>
      <th>6</th>
      <td>1</td>
      <td>7</td>
      <td>0.5</td>
      <td>A</td>
    </tr>
    <tr>
      <th>7</th>
      <td>2</td>
      <td>8</td>
      <td>1.0</td>
      <td>B</td>
    </tr>
    <tr>
      <th>8</th>
      <td>1</td>
      <td>9</td>
      <td>1.5</td>
      <td>C</td>
    </tr>
  </tbody>
</table>
</div>




```python
#Discard rows using negative indices
df.loc[~df.index.isin(range(2,7))] #no need for the list constructor
#or df.query("not index.between(2,6)", engine='python')
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>V1</th>
      <th>V2</th>
      <th>V3</th>
      <th>V4</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>1</td>
      <td>1</td>
      <td>0.5</td>
      <td>A</td>
    </tr>
    <tr>
      <th>1</th>
      <td>2</td>
      <td>2</td>
      <td>1.0</td>
      <td>B</td>
    </tr>
    <tr>
      <th>7</th>
      <td>2</td>
      <td>8</td>
      <td>1.0</td>
      <td>B</td>
    </tr>
    <tr>
      <th>8</th>
      <td>1</td>
      <td>9</td>
      <td>1.5</td>
      <td>C</td>
    </tr>
  </tbody>
</table>
</div>




```python
#Filter rows using multiple conditions
df.loc[(df.V1==1) & (df.V4=="A")]
# or df.query("V1==1 and V4=='A' ")
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>V1</th>
      <th>V2</th>
      <th>V3</th>
      <th>V4</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>1</td>
      <td>1</td>
      <td>0.5</td>
      <td>A</td>
    </tr>
    <tr>
      <th>6</th>
      <td>1</td>
      <td>7</td>
      <td>0.5</td>
      <td>A</td>
    </tr>
  </tbody>
</table>
</div>




```python
#other filters
#pass a tuple to match several prefixes, e.g. startswith(("B", "C"))
df.loc[df.V4.str.startswith("B")]
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>V1</th>
      <th>V2</th>
      <th>V3</th>
      <th>V4</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>1</th>
      <td>2</td>
      <td>2</td>
      <td>1.0</td>
      <td>B</td>
    </tr>
    <tr>
      <th>4</th>
      <td>1</td>
      <td>5</td>
      <td>1.0</td>
      <td>B</td>
    </tr>
    <tr>
      <th>7</th>
      <td>2</td>
      <td>8</td>
      <td>1.0</td>
      <td>B</td>
    </tr>
  </tbody>
</table>
</div>




```python
#DT[V2 %inrange% list(-1:1, 1:3)]
#don't understand this yet, will look into it
```
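
On `%inrange%`: as far as I can tell from the data.table documentation, `x %inrange% list(lower, upper)` keeps the values of `x` that fall inside at least one of the closed intervals `[lower[i], upper[i]]`. With `list(-1:1, 1:3)` the intervals would be [-1, 1], [0, 2] and [1, 3]. Below is a minimal sketch of that reading in Pandas. The helper name `in_any_range` is my own, not a Pandas function, and the block rebuilds the original `df` so it runs on its own.

```python
import numpy as np
import pandas as pd

# Rebuild the original frame so the example is self-contained
df = pd.DataFrame(
    {"V1": [1, 2, 1, 2, 1, 2, 1, 2, 1],
     "V2": [1, 2, 3, 4, 5, 6, 7, 8, 9],
     "V3": [0.5, 1.0, 1.5] * 3,
     "V4": ["A", "B", "C"] * 3})

def in_any_range(values, lower, upper):
    """Hypothetical helper: True where a value lies in any closed interval [lower[i], upper[i]]."""
    vals = np.asarray(values)[:, None]  # shape (n, 1), broadcasts against the bounds
    lower, upper = np.asarray(lower), np.asarray(upper)
    return ((vals >= lower) & (vals <= upper)).any(axis=1)

# Intervals [-1, 1], [0, 2], [1, 3], mirroring my reading of list(-1:1, 1:3)
mask = in_any_range(df["V2"], lower=[-1, 0, 1], upper=[1, 2, 3])
result = df.loc[mask]
```

If that reading is right, `result` holds the rows whose `V2` is 1, 2 or 3.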


```python
#Select one column using an index (not recommended)
df.iloc[:,[2]] #DF.iloc[:, 2].to_frame() is unnecessary
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>V3</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>0.5</td>
    </tr>
    <tr>
      <th>1</th>
      <td>1.0</td>
    </tr>
    <tr>
      <th>2</th>
      <td>1.5</td>
    </tr>
    <tr>
      <th>3</th>
      <td>0.5</td>
    </tr>
    <tr>
      <th>4</th>
      <td>1.0</td>
    </tr>
    <tr>
      <th>5</th>
      <td>1.5</td>
    </tr>
    <tr>
      <th>6</th>
      <td>0.5</td>
    </tr>
    <tr>
      <th>7</th>
      <td>1.0</td>
    </tr>
    <tr>
      <th>8</th>
      <td>1.5</td>
    </tr>
  </tbody>
</table>
</div>




```python
cols = ['V2', 'V3']
df.loc[:, cols]
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>V2</th>
      <th>V3</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>1</td>
      <td>0.5</td>
    </tr>
    <tr>
      <th>1</th>
      <td>2</td>
      <td>1.0</td>
    </tr>
    <tr>
      <th>2</th>
      <td>3</td>
      <td>1.5</td>
    </tr>
    <tr>
      <th>3</th>
      <td>4</td>
      <td>0.5</td>
    </tr>
    <tr>
      <th>4</th>
      <td>5</td>
      <td>1.0</td>
    </tr>
    <tr>
      <th>5</th>
      <td>6</td>
      <td>1.5</td>
    </tr>
    <tr>
      <th>6</th>
      <td>7</td>
      <td>0.5</td>
    </tr>
    <tr>
      <th>7</th>
      <td>8</td>
      <td>1.0</td>
    </tr>
    <tr>
      <th>8</th>
      <td>9</td>
      <td>1.5</td>
    </tr>
  </tbody>
</table>
</div>




```python
#summarise one column
df.loc[:, ["V1"]].agg(['sum']) # unnecessary : DF[['V1']].sum().to_frame(name = 'sumV1')
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>V1</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>sum</th>
      <td>13</td>
    </tr>
  </tbody>
</table>
</div>




```python
#Summarise several columns
#Pandas 1.1.0 supports named aggregation, new_name=(column, func), directly on a DataFrame
#previously this was available only for aggregations after groupby
#not as pretty as rdatatable:
#Pandas places the aggregation names in the index
df.agg(sumV1=("V1",'sum'), sdv3=("V3","std"))
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>V1</th>
      <th>V3</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>sumV1</th>
      <td>13.0</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>sdv3</th>
      <td>NaN</td>
      <td>0.433013</td>
    </tr>
  </tbody>
</table>
</div>
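
If a single-row result closer to the data.table output is preferred, one option is to compute the scalars directly and wrap them in a one-row frame. A small sketch, rebuilding the two relevant columns so it runs standalone; the names `sumV1` and `sdV3` are only labels:

```python
import pandas as pd

# Rebuild the two columns used in the summary
df = pd.DataFrame(
    {"V1": [1, 2, 1, 2, 1, 2, 1, 2, 1],
     "V3": [0.5, 1.0, 1.5] * 3})

# One row, one column per statistic, no NaN padding
summary = pd.DataFrame(
    {"sumV1": [df["V1"].sum()],
     "sdV3": [df["V3"].std()]})
```

This avoids the NaN-filled cells, at the cost of naming each column by hand.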




```python
#Summarise a subset of rows
df.loc[:3,"V1"].sum() #the leading 0 in 0:3 is redundant; .loc includes the end label 3
```




    6
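
One thing to keep in mind here: `.loc` slices by label and includes the end label, while `.iloc` slices by position and excludes the end. On a default integer index, `loc[:3]` and `iloc[:4]` therefore select the same four rows. A quick sketch on a rebuilt copy of the `V1` column:

```python
import pandas as pd

v1 = pd.Series([1, 2, 1, 2, 1, 2, 1, 2, 1], name="V1")

by_label = v1.loc[:3].sum()      # labels 0, 1, 2, 3 (end label included)
by_position = v1.iloc[:4].sum()  # positions 0, 1, 2, 3 (end position excluded)
first_three = v1.iloc[:3].sum()  # positions 0, 1, 2 only
```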




```python
df.loc[0,'V3'] #df.head(1).V3 not necessary and inefficient
```




    0.5




```python
df.at[df.index[-1], 'V3'] #df.tail(1).V3 not necessary and inefficient
#the at method is handy, and a bit more performant than loc, when only a scalar is needed
```




    1.5




```python
#trying to keep up with changes from here
df.loc[:, "V1"] = df.loc[:, "V1"] ** 2
```


```python
df = df.assign(v5 = np.log(df.V1))
```


```python
df = df.assign(v6 = np.sqrt(df.V1), v7 = 'X')
```


```python
#Create one column and remove the others
#no need to create another dataframe
df.loc[:, ["V3"]].add(1).rename(columns={"V3":"V8"})
# or df.loc[:, ["V3"]].add(1).set_axis(["V8"], axis = 1)
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>V8</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>1.5</td>
    </tr>
    <tr>
      <th>1</th>
      <td>2.0</td>
    </tr>
    <tr>
      <th>2</th>
      <td>2.5</td>
    </tr>
    <tr>
      <th>3</th>
      <td>1.5</td>
    </tr>
    <tr>
      <th>4</th>
      <td>2.0</td>
    </tr>
    <tr>
      <th>5</th>
      <td>2.5</td>
    </tr>
    <tr>
      <th>6</th>
      <td>1.5</td>
    </tr>
    <tr>
      <th>7</th>
      <td>2.0</td>
    </tr>
    <tr>
      <th>8</th>
      <td>2.5</td>
    </tr>
  </tbody>
</table>
</div>




```python
del df["v5"]
```


```python
df = df.drop(columns=["v6", "v7"]) #the positional axis argument (drop(..., 1)) was removed in later Pandas versions
```


```python
cols = 'V3'
del df[cols]
```


```python
#Replace values for rows matching a condition
df.loc[df.loc[:, "V2"] < 4, "V2"] = 0
```


```python
df
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>V1</th>
      <th>V2</th>
      <th>V4</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>1</td>
      <td>0</td>
      <td>A</td>
    </tr>
    <tr>
      <th>1</th>
      <td>4</td>
      <td>0</td>
      <td>B</td>
    </tr>
    <tr>
      <th>2</th>
      <td>1</td>
      <td>0</td>
      <td>C</td>
    </tr>
    <tr>
      <th>3</th>
      <td>4</td>
      <td>4</td>
      <td>A</td>
    </tr>
    <tr>
      <th>4</th>
      <td>1</td>
      <td>5</td>
      <td>B</td>
    </tr>
    <tr>
      <th>5</th>
      <td>4</td>
      <td>6</td>
      <td>C</td>
    </tr>
    <tr>
      <th>6</th>
      <td>1</td>
      <td>7</td>
      <td>A</td>
    </tr>
    <tr>
      <th>7</th>
      <td>4</td>
      <td>8</td>
      <td>B</td>
    </tr>
    <tr>
      <th>8</th>
      <td>1</td>
      <td>9</td>
      <td>C</td>
    </tr>
  </tbody>
</table>
</div>




```python
#by group
df.groupby(["V4"],as_index=False).agg(sumV2=("V2","sum")) #no need for to_frame and reset_index
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>V4</th>
      <th>sumV2</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>A</td>
      <td>11</td>
    </tr>
    <tr>
      <th>1</th>
      <td>B</td>
      <td>13</td>
    </tr>
    <tr>
      <th>2</th>
      <td>C</td>
      <td>15</td>
    </tr>
  </tbody>
</table>
</div>




```python
#several groups
df.groupby(["V4","V1"], as_index = False).agg(sumV2=("V2","sum")) # again, no need for to_frame and reset_index
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>V4</th>
      <th>V1</th>
      <th>sumV2</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>A</td>
      <td>1</td>
      <td>7</td>
    </tr>
    <tr>
      <th>1</th>
      <td>A</td>
      <td>4</td>
      <td>4</td>
    </tr>
    <tr>
      <th>2</th>
      <td>B</td>
      <td>1</td>
      <td>5</td>
    </tr>
    <tr>
      <th>3</th>
      <td>B</td>
      <td>4</td>
      <td>8</td>
    </tr>
    <tr>
      <th>4</th>
      <td>C</td>
      <td>1</td>
      <td>9</td>
    </tr>
    <tr>
      <th>5</th>
      <td>C</td>
      <td>4</td>
      <td>6</td>
    </tr>
  </tbody>
</table>
</div>




```python
#Calling function in by

#you could apply the function before grouping
df.assign(V4 = df.V4.str.lower()).groupby(["V4"], as_index=False).agg(sumV1=("V1","sum"))

# or df.groupby(df.V4.str.lower()).agg(sumV1=("V1","sum")).reset_index()
# reset_index becomes inevitable here
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>V4</th>
      <th>sumV1</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>a</td>
      <td>6</td>
    </tr>
    <tr>
      <th>1</th>
      <td>b</td>
      <td>9</td>
    </tr>
    <tr>
      <th>2</th>
      <td>c</td>
      <td>6</td>
    </tr>
  </tbody>
</table>
</div>




```python
#Assigning column name in by
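#no direct equivalent inside Pandas' groupby
#create the named column before grouping
df.assign(abc = lambda x: x.V4.str.lower()).groupby(["abc"], as_index=False).agg(sumV1=("V1","sum"))
# or rename the grouping Series: df.groupby(df.V4.str.lower().rename("abc")).agg(sumV1=("V1","sum")).reset_index()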
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>abc</th>
      <th>sumV1</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>a</td>
      <td>6</td>
    </tr>
    <tr>
      <th>1</th>
      <td>b</td>
      <td>9</td>
    </tr>
    <tr>
      <th>2</th>
      <td>c</td>
      <td>6</td>
    </tr>
  </tbody>
</table>
</div>




```python
#Using a condition in by
#just trying to keep track here
df.groupby(df.V4=="A").V1.sum()
```




    V4
    False    15
    True      6
    Name: V1, dtype: int64




```python
#on a subset of rows
df.iloc[:5].groupby("V4").agg(sumV1=("V1",'sum'))
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>sumV1</th>
    </tr>
    <tr>
      <th>V4</th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>A</th>
      <td>5</td>
    </tr>
    <tr>
      <th>B</th>
      <td>5</td>
    </tr>
    <tr>
      <th>C</th>
      <td>1</td>
    </tr>
  </tbody>
</table>
</div>




```python
#Add a column with number of observations for each group
df.assign(n = lambda x: x.groupby("V1").V4.transform("count"))
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>V1</th>
      <th>V2</th>
      <th>V4</th>
      <th>n</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>1</td>
      <td>0</td>
      <td>A</td>
      <td>5</td>
    </tr>
    <tr>
      <th>1</th>
      <td>4</td>
      <td>0</td>
      <td>B</td>
      <td>4</td>
    </tr>
    <tr>
      <th>2</th>
      <td>1</td>
      <td>0</td>
      <td>C</td>
      <td>5</td>
    </tr>
    <tr>
      <th>3</th>
      <td>4</td>
      <td>4</td>
      <td>A</td>
      <td>4</td>
    </tr>
    <tr>
      <th>4</th>
      <td>1</td>
      <td>5</td>
      <td>B</td>
      <td>5</td>
    </tr>
    <tr>
      <th>5</th>
      <td>4</td>
      <td>6</td>
      <td>C</td>
      <td>4</td>
    </tr>
    <tr>
      <th>6</th>
      <td>1</td>
      <td>7</td>
      <td>A</td>
      <td>5</td>
    </tr>
    <tr>
      <th>7</th>
      <td>4</td>
      <td>8</td>
      <td>B</td>
      <td>4</td>
    </tr>
    <tr>
      <th>8</th>
      <td>1</td>
      <td>9</td>
      <td>C</td>
      <td>5</td>
    </tr>
  </tbody>
</table>
</div>




```python
#Summarise all the columns
#better to use built-in methods over apply where possible,
#as they are optimised/vectorised
#apply loops in Python: handy in some situations, but relatively slow in a lot of cases
df.agg(["max"])
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>V1</th>
      <th>V2</th>
      <th>V4</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>max</th>
      <td>4</td>
      <td>9</td>
      <td>C</td>
    </tr>
  </tbody>
</table>
</div>
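
To illustrate the point about built-ins, the sketch below checks that the built-in reduction and the `apply` route return the same values on a small rebuilt frame; only the route differs. No timings are claimed here, since they depend on data size and Pandas version.

```python
import pandas as pd

# Small rebuilt frame with the same column types as above
df = pd.DataFrame(
    {"V1": [1, 2, 1, 2, 1, 2, 1, 2, 1],
     "V2": [1, 2, 3, 4, 5, 6, 7, 8, 9],
     "V4": ["A", "B", "C"] * 3})

via_builtin = df.max()                        # built-in reduction
via_apply = df.apply(lambda col: col.max())   # same result, one Python call per column
```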




```python
#Summarise several columns
df.loc[:, ["V1","V2"]].agg(["mean"]) #again, no need for apply
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>V1</th>
      <th>V2</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>mean</th>
      <td>2.333333</td>
      <td>4.333333</td>
    </tr>
  </tbody>
</table>
</div>




```python
# Summarise several columns by group
# unpacking a dictionary comprehension suffices here
df.groupby("V4").agg(**{f"{col}_mean":(col, "mean") for col in ["V1","V2"]})
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>V1_mean</th>
      <th>V2_mean</th>
    </tr>
    <tr>
      <th>V4</th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>A</th>
      <td>2</td>
      <td>3.666667</td>
    </tr>
    <tr>
      <th>B</th>
      <td>3</td>
      <td>4.333333</td>
    </tr>
    <tr>
      <th>C</th>
      <td>2</td>
      <td>5.000000</td>
    </tr>
  </tbody>
</table>
</div>




```python
#Summarise using a condition
df.select_dtypes(include="number").agg(["mean"])
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>V1</th>
      <th>V2</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>mean</th>
      <td>2.333333</td>
      <td>4.333333</td>
    </tr>
  </tbody>
</table>
</div>




```python
#Modify all the columns
#reverse on the 0 axis
#apply not needed 
df.iloc[::-1]
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>V1</th>
      <th>V2</th>
      <th>V4</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>8</th>
      <td>1</td>
      <td>9</td>
      <td>C</td>
    </tr>
    <tr>
      <th>7</th>
      <td>4</td>
      <td>8</td>
      <td>B</td>
    </tr>
    <tr>
      <th>6</th>
      <td>1</td>
      <td>7</td>
      <td>A</td>
    </tr>
    <tr>
      <th>5</th>
      <td>4</td>
      <td>6</td>
      <td>C</td>
    </tr>
    <tr>
      <th>4</th>
      <td>1</td>
      <td>5</td>
      <td>B</td>
    </tr>
    <tr>
      <th>3</th>
      <td>4</td>
      <td>4</td>
      <td>A</td>
    </tr>
    <tr>
      <th>2</th>
      <td>1</td>
      <td>0</td>
      <td>C</td>
    </tr>
    <tr>
      <th>1</th>
      <td>4</td>
      <td>0</td>
      <td>B</td>
    </tr>
    <tr>
      <th>0</th>
      <td>1</td>
      <td>0</td>
      <td>A</td>
    </tr>
  </tbody>
</table>
</div>




```python
#Modify several columns (dropping the others)
df.filter(["V1","V2"]).agg(np.sqrt)
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>V1</th>
      <th>V2</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>1.0</td>
      <td>0.000000</td>
    </tr>
    <tr>
      <th>1</th>
      <td>2.0</td>
      <td>0.000000</td>
    </tr>
    <tr>
      <th>2</th>
      <td>1.0</td>
      <td>0.000000</td>
    </tr>
    <tr>
      <th>3</th>
      <td>2.0</td>
      <td>2.000000</td>
    </tr>
    <tr>
      <th>4</th>
      <td>1.0</td>
      <td>2.236068</td>
    </tr>
    <tr>
      <th>5</th>
      <td>2.0</td>
      <td>2.449490</td>
    </tr>
    <tr>
      <th>6</th>
      <td>1.0</td>
      <td>2.645751</td>
    </tr>
    <tr>
      <th>7</th>
      <td>2.0</td>
      <td>2.828427</td>
    </tr>
    <tr>
      <th>8</th>
      <td>1.0</td>
      <td>3.000000</td>
    </tr>
  </tbody>
</table>
</div>




```python
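#Modify several columns (dropping the others)
#note: '[^V4]' is a character class (any character other than V or 4), so it excludes V4 here only by coincidence
df.filter(regex=r'^(?!V4$)').agg(np.exp)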
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>V1</th>
      <th>V2</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>2.718282</td>
      <td>1.000000</td>
    </tr>
    <tr>
      <th>1</th>
      <td>54.598150</td>
      <td>1.000000</td>
    </tr>
    <tr>
      <th>2</th>
      <td>2.718282</td>
      <td>1.000000</td>
    </tr>
    <tr>
      <th>3</th>
      <td>54.598150</td>
      <td>54.598150</td>
    </tr>
    <tr>
      <th>4</th>
      <td>2.718282</td>
      <td>148.413159</td>
    </tr>
    <tr>
      <th>5</th>
      <td>54.598150</td>
      <td>403.428793</td>
    </tr>
    <tr>
      <th>6</th>
      <td>2.718282</td>
      <td>1096.633158</td>
    </tr>
    <tr>
      <th>7</th>
      <td>54.598150</td>
      <td>2980.957987</td>
    </tr>
    <tr>
      <th>8</th>
      <td>2.718282</td>
      <td>8103.083928</td>
    </tr>
  </tbody>
</table>
</div>
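
A note on excluding a column by regex: a character class such as `[^V4]` matches any single character that is neither `V` nor `4`, so it can drop more than the one column intended. The sketch below uses a made-up frame with an extra `V44` column (my own addition, purely for illustration) to compare it with a negative lookahead and with a plain `drop`:

```python
import pandas as pd

# Hypothetical frame: V44 is added only to expose the difference
wide = pd.DataFrame({"V1": [1], "V4": ["A"], "V44": [2.0]})

# Character class: keeps names containing a character other than 'V' or '4'
char_class = wide.filter(regex="[^V4]").columns.tolist()

# Negative lookahead: keeps every name that is not exactly 'V4'
lookahead = wide.filter(regex=r"^(?!V4$)").columns.tolist()

# Often the simplest option when the column name is known
dropped = wide.drop(columns="V4").columns.tolist()
```

Here the character class also discards `V44`, while the other two keep it.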




```python
#Modify several columns (keeping the others)
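#assigning with df[[...]] replaces the columns outright; newer Pandas versions may warn when .loc has to turn an int column into float
df[["V1","V2"]] = df.filter(["V1","V2"]).agg(np.sqrt)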
df
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>V1</th>
      <th>V2</th>
      <th>V4</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>1.0</td>
      <td>0.000000</td>
      <td>A</td>
    </tr>
    <tr>
      <th>1</th>
      <td>2.0</td>
      <td>0.000000</td>
      <td>B</td>
    </tr>
    <tr>
      <th>2</th>
      <td>1.0</td>
      <td>0.000000</td>
      <td>C</td>
    </tr>
    <tr>
      <th>3</th>
      <td>2.0</td>
      <td>2.000000</td>
      <td>A</td>
    </tr>
    <tr>
      <th>4</th>
      <td>1.0</td>
      <td>2.236068</td>
      <td>B</td>
    </tr>
    <tr>
      <th>5</th>
      <td>2.0</td>
      <td>2.449490</td>
      <td>C</td>
    </tr>
    <tr>
      <th>6</th>
      <td>1.0</td>
      <td>2.645751</td>
      <td>A</td>
    </tr>
    <tr>
      <th>7</th>
      <td>2.0</td>
      <td>2.828427</td>
      <td>B</td>
    </tr>
    <tr>
      <th>8</th>
      <td>1.0</td>
      <td>3.000000</td>
      <td>C</td>
    </tr>
  </tbody>
</table>
</div>




```python
#Modify several columns (keeping the others)
cols = df.columns.difference(["V4"])
df[cols] = df[cols].pow(2) #built-in pow, no lambda needed
df
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>V1</th>
      <th>V2</th>
      <th>V4</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>1.0</td>
      <td>0.0</td>
      <td>A</td>
    </tr>
    <tr>
      <th>1</th>
      <td>4.0</td>
      <td>0.0</td>
      <td>B</td>
    </tr>
    <tr>
      <th>2</th>
      <td>1.0</td>
      <td>0.0</td>
      <td>C</td>
    </tr>
    <tr>
      <th>3</th>
      <td>4.0</td>
      <td>4.0</td>
      <td>A</td>
    </tr>
    <tr>
      <th>4</th>
      <td>1.0</td>
      <td>5.0</td>
      <td>B</td>
    </tr>
    <tr>
      <th>5</th>
      <td>4.0</td>
      <td>6.0</td>
      <td>C</td>
    </tr>
    <tr>
      <th>6</th>
      <td>1.0</td>
      <td>7.0</td>
      <td>A</td>
    </tr>
    <tr>
      <th>7</th>
      <td>4.0</td>
      <td>8.0</td>
      <td>B</td>
    </tr>
    <tr>
      <th>8</th>
      <td>1.0</td>
      <td>9.0</td>
      <td>C</td>
    </tr>
  </tbody>
</table>
</div>




```python
#Modify columns using a condition (dropping the others)
#no need for intermediate cols variable
df.select_dtypes("number").sub(1)
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>V1</th>
      <th>V2</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>0.0</td>
      <td>-1.0</td>
    </tr>
    <tr>
      <th>1</th>
      <td>3.0</td>
      <td>-1.0</td>
    </tr>
    <tr>
      <th>2</th>
      <td>0.0</td>
      <td>-1.0</td>
    </tr>
    <tr>
      <th>3</th>
      <td>3.0</td>
      <td>3.0</td>
    </tr>
    <tr>
      <th>4</th>
      <td>0.0</td>
      <td>4.0</td>
    </tr>
    <tr>
      <th>5</th>
      <td>3.0</td>
      <td>5.0</td>
    </tr>
    <tr>
      <th>6</th>
      <td>0.0</td>
      <td>6.0</td>
    </tr>
    <tr>
      <th>7</th>
      <td>3.0</td>
      <td>7.0</td>
    </tr>
    <tr>
      <th>8</th>
      <td>0.0</td>
      <td>8.0</td>
    </tr>
  </tbody>
</table>
</div>




```python
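#Modify columns using a condition (keeping the others)
#astype(int) truncates, so a float such as 5.999... becomes 5 (see V2 in row 5 of the output); round first if that matters
#df[cols] = ... replaces the columns, so the new dtype sticks across Pandas versions
df[cols] = df[cols].astype(int)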
df
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>V1</th>
      <th>V2</th>
      <th>V4</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>1</td>
      <td>0</td>
      <td>A</td>
    </tr>
    <tr>
      <th>1</th>
      <td>4</td>
      <td>0</td>
      <td>B</td>
    </tr>
    <tr>
      <th>2</th>
      <td>1</td>
      <td>0</td>
      <td>C</td>
    </tr>
    <tr>
      <th>3</th>
      <td>4</td>
      <td>4</td>
      <td>A</td>
    </tr>
    <tr>
      <th>4</th>
      <td>1</td>
      <td>5</td>
      <td>B</td>
    </tr>
    <tr>
      <th>5</th>
      <td>4</td>
      <td>5</td>
      <td>C</td>
    </tr>
    <tr>
      <th>6</th>
      <td>1</td>
      <td>7</td>
      <td>A</td>
    </tr>
    <tr>
      <th>7</th>
      <td>4</td>
      <td>8</td>
      <td>B</td>
    </tr>
    <tr>
      <th>8</th>
      <td>1</td>
      <td>9</td>
      <td>C</td>
    </tr>
  </tbody>
</table>
</div>
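
A possible variation, sketched on a made-up frame: `astype` also accepts a column-to-dtype mapping and returns a new frame, so only the listed columns are converted and the rest pass through unchanged. The name `demo` and its values are assumptions, not the notebook data.

```python
import pandas as pd

# hypothetical frame with float columns that hold whole numbers
demo = pd.DataFrame({"V1": [1.0, 4.0], "V2": [0.0, 4.0], "V4": ["A", "B"]})

# convert only the listed columns; V4 is left untouched
converted = demo.astype({"V1": "int64", "V2": "int64"})
print(converted.dtypes)
```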




```python
#first two rows of each V4 group, with V2 overwritten by "X"
df.groupby("V4").head(2).assign(V2="X").sort_values("V4")
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>V1</th>
      <th>V2</th>
      <th>V4</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>1</td>
      <td>X</td>
      <td>A</td>
    </tr>
    <tr>
      <th>3</th>
      <td>4</td>
      <td>X</td>
      <td>A</td>
    </tr>
    <tr>
      <th>1</th>
      <td>4</td>
      <td>X</td>
      <td>B</td>
    </tr>
    <tr>
      <th>4</th>
      <td>1</td>
      <td>X</td>
      <td>B</td>
    </tr>
    <tr>
      <th>2</th>
      <td>1</td>
      <td>X</td>
      <td>C</td>
    </tr>
    <tr>
      <th>5</th>
      <td>4</td>
      <td>X</td>
      <td>C</td>
    </tr>
  </tbody>
</table>
</div>
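
To make the chain above easier to follow, here is a minimal sketch on a made-up frame (`demo` is an assumption). It splits the steps so each intermediate result can be inspected: `head(2)` keeps the first two rows of every group with their original index labels, and the sort afterwards only regroups them.

```python
import pandas as pd

# hypothetical frame: groups A and B are interleaved
demo = pd.DataFrame({"V1": [1, 4, 1, 4, 1, 4],
                     "V4": ["A", "B", "A", "B", "A", "B"]})

# head(2) keeps the first two rows of each group, in original row order
first_two = demo.groupby("V4").head(2)

# assign returns a copy with the new column; a stable sort keeps
# the within-group order while bringing equal V4 values together
result = first_two.assign(V2="X").sort_values("V4", kind="mergesort")
print(result)
```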




```python
#Use multiple expressions (with DT[,{j}])
#not sure what this does; will study it some more
```


```python
#Expression chaining using DT[][] (recommended)
#the string "sum" dispatches to the built-in groupby sum
df.groupby(['V4'], as_index=False).agg(V1sum=("V1","sum")).query("V1sum > 5")
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>V4</th>
      <th>V1sum</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>A</td>
      <td>6</td>
    </tr>
    <tr>
      <th>1</th>
      <td>B</td>
      <td>9</td>
    </tr>
    <tr>
      <th>2</th>
      <td>C</td>
      <td>6</td>
    </tr>
  </tbody>
</table>
</div>
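
A small sketch of the same chaining pattern on made-up data (`demo` is an assumption), with values chosen so the filter actually drops a group. The named aggregation `V1sum=("V1", "sum")` creates the column that `query` then filters on.

```python
import pandas as pd

# hypothetical frame: group A sums to 3, group B sums to 6
demo = pd.DataFrame({"V1": [1, 2, 1, 2, 1, 2],
                     "V4": ["A", "B", "A", "B", "A", "B"]})

# named aggregation: output column "V1sum" is the sum of V1 within each V4 group
summed = demo.groupby("V4", as_index=False).agg(V1sum=("V1", "sum"))

# filter on the aggregated column in the same style of chain
result = summed.query("V1sum > 5")
print(result)
```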




```python
#Set keys/indices
df.set_index('V4', drop = False, inplace = True)
df.sort_index(inplace = True)
df
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>V1</th>
      <th>V2</th>
      <th>V4</th>
    </tr>
    <tr>
      <th>V4</th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>A</th>
      <td>1</td>
      <td>0</td>
      <td>A</td>
    </tr>
    <tr>
      <th>A</th>
      <td>4</td>
      <td>4</td>
      <td>A</td>
    </tr>
    <tr>
      <th>A</th>
      <td>1</td>
      <td>7</td>
      <td>A</td>
    </tr>
    <tr>
      <th>B</th>
      <td>4</td>
      <td>0</td>
      <td>B</td>
    </tr>
    <tr>
      <th>B</th>
      <td>1</td>
      <td>5</td>
      <td>B</td>
    </tr>
    <tr>
      <th>B</th>
      <td>4</td>
      <td>8</td>
      <td>B</td>
    </tr>
    <tr>
      <th>C</th>
      <td>1</td>
      <td>0</td>
      <td>C</td>
    </tr>
    <tr>
      <th>C</th>
      <td>4</td>
      <td>5</td>
      <td>C</td>
    </tr>
    <tr>
      <th>C</th>
      <td>1</td>
      <td>9</td>
      <td>C</td>
    </tr>
  </tbody>
</table>
</div>
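
If you would rather not mutate the frame, the two steps can also be chained, since both methods return a new frame by default. A minimal sketch, assuming a made-up frame `demo`:

```python
import pandas as pd

# hypothetical frame with V4 out of order
demo = pd.DataFrame({"V1": [1, 4, 1], "V4": ["C", "A", "B"]})

# drop=False keeps V4 as a regular column as well as using it for the index
keyed = demo.set_index("V4", drop=False).sort_index()
print(keyed)
```

Here `demo` itself is left as it was, which can make it easier to rerun cells out of order.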




```python
#Apply a function on the matching rows
#one .loc call picks rows and column together; usually faster than
#df.loc[['A', 'C']].V1.sum(), which builds the full row subset first
df.loc[['A', 'C'],'V1'].sum()
```




    12
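
The two spellings give the same number; the difference is only in how much data is pulled out along the way. A minimal sketch on a made-up frame (`demo` and its values are assumptions):

```python
import pandas as pd

# hypothetical frame with a non-unique index, as in the cell above
demo = pd.DataFrame({"V1": [1, 4, 1, 4], "V2": [0, 4, 7, 0]},
                    index=pd.Index(["A", "A", "B", "C"], name="V4"))

# rows and column selected in a single .loc call
one_step = demo.loc[["A", "C"], "V1"].sum()

# same result, but the row subset (all columns) is built first, then V1 is pulled out
two_step = demo.loc[["A", "C"]].V1.sum()

print(one_step, two_step)
```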




```python
#keeping pace with the material
#Modify values for matching rows
df.loc['A', 'V1'] = 0
```


```python
#Use keys in by
df.query("index != 'B'").groupby(level=0).agg({"V1":"sum"})

# or df.loc[~(df.index=="B")].groupby(level=0).agg({"V1":"sum"})
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>V1</th>
    </tr>
    <tr>
      <th>V4</th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>A</th>
      <td>0</td>
    </tr>
    <tr>
      <th>C</th>
      <td>6</td>
    </tr>
  </tbody>
</table>
</div>
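
Here is the boolean-mask variant as a self-contained sketch, on a made-up frame (`demo` is an assumption). `groupby(level=0)` groups on the index itself, so no column name is needed.

```python
import pandas as pd

# hypothetical frame keyed by V4
demo = pd.DataFrame({"V1": [0, 4, 1, 4]},
                    index=pd.Index(["A", "B", "C", "C"], name="V4"))

# drop the rows whose index label is "B", then sum V1 per index label
result = demo.loc[demo.index != "B"].groupby(level=0).agg({"V1": "sum"})
print(result)
```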




```python
#Set keys/indices for multiple columns
df.set_index(['V4', 'V1'], drop = False, inplace = True)
df.sort_index(inplace = True)

df
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th></th>
      <th>V1</th>
      <th>V2</th>
      <th>V4</th>
    </tr>
    <tr>
      <th>V4</th>
      <th>V1</th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th rowspan="3" valign="top">A</th>
      <th>0</th>
      <td>0</td>
      <td>0</td>
      <td>A</td>
    </tr>
    <tr>
      <th>0</th>
      <td>0</td>
      <td>4</td>
      <td>A</td>
    </tr>
    <tr>
      <th>0</th>
      <td>0</td>
      <td>7</td>
      <td>A</td>
    </tr>
    <tr>
      <th rowspan="3" valign="top">B</th>
      <th>1</th>
      <td>1</td>
      <td>5</td>
      <td>B</td>
    </tr>
    <tr>
      <th>4</th>
      <td>4</td>
      <td>0</td>
      <td>B</td>
    </tr>
    <tr>
      <th>4</th>
      <td>4</td>
      <td>8</td>
      <td>B</td>
    </tr>
    <tr>
      <th rowspan="3" valign="top">C</th>
      <th>1</th>
      <td>1</td>
      <td>0</td>
      <td>C</td>
    </tr>
    <tr>
      <th>1</th>
      <td>1</td>
      <td>9</td>
      <td>C</td>
    </tr>
    <tr>
      <th>4</th>
      <td>4</td>
      <td>5</td>
      <td>C</td>
    </tr>
  </tbody>
</table>
</div>




```python
#Subset using multiple keys/indices
df.loc[("C",1)]
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th></th>
      <th>V1</th>
      <th>V2</th>
      <th>V4</th>
    </tr>
    <tr>
      <th>V4</th>
      <th>V1</th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th rowspan="2" valign="top">C</th>
      <th>1</th>
      <td>1</td>
      <td>0</td>
      <td>C</td>
    </tr>
    <tr>
      <th>1</th>
      <td>1</td>
      <td>9</td>
      <td>C</td>
    </tr>
  </tbody>
</table>
</div>




```python
#Subset using multiple keys/indices (several values for the first key)
df.loc[(("B","C"),1),:]

#query can be much clearer:
#df.query("V4 in ('B','C') and V1==1")
#here V4 and V1 name both the index levels and the columns;
#when that happens query gives the columns precedence,
#so rename the index levels if you want to refer to them separately
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th></th>
      <th>V1</th>
      <th>V2</th>
      <th>V4</th>
    </tr>
    <tr>
      <th>V4</th>
      <th>V1</th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>B</th>
      <th>1</th>
      <td>1</td>
      <td>5</td>
      <td>B</td>
    </tr>
    <tr>
      <th rowspan="2" valign="top">C</th>
      <th>1</th>
      <td>1</td>
      <td>0</td>
      <td>C</td>
    </tr>
    <tr>
      <th>1</th>
      <td>1</td>
      <td>9</td>
      <td>C</td>
    </tr>
  </tbody>
</table>
</div>




```python
df.query("V4 in ('B','C') and V1==1")
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th></th>
      <th>V1</th>
      <th>V2</th>
      <th>V4</th>
    </tr>
    <tr>
      <th>V4</th>
      <th>V1</th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>B</th>
      <th>1</th>
      <td>1</td>
      <td>5</td>
      <td>B</td>
    </tr>
    <tr>
      <th rowspan="2" valign="top">C</th>
      <th>1</th>
      <td>1</td>
      <td>0</td>
      <td>C</td>
    </tr>
    <tr>
      <th>1</th>
      <td>1</td>
      <td>9</td>
      <td>C</td>
    </tr>
  </tbody>
</table>
</div>
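
A minimal sketch of the renaming idea, on a made-up frame (`demo` and the `key_` prefix are assumptions, not something from the article). Once the index levels have their own names, `query` can refer to them without any ambiguity with the columns.

```python
import pandas as pd

# hypothetical frame
demo = pd.DataFrame({"V1": [1, 4, 1], "V2": [5, 0, 9], "V4": ["B", "B", "C"]})

# keep the columns and give the index levels their own names
keyed = (demo.set_index(["V4", "V1"], drop=False)
             .rename_axis(["key_V4", "key_V1"]))

# key_V4 and key_V1 refer to the index levels, not to the columns
result = keyed.query("key_V4 in ('B', 'C') and key_V1 == 1")
print(result)
```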




```python
#remove keys/indices
df.reset_index(inplace = True, drop = True)
```


```python
df
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>V1</th>
      <th>V2</th>
      <th>V4</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>0</td>
      <td>0</td>
      <td>A</td>
    </tr>
    <tr>
      <th>1</th>
      <td>0</td>
      <td>4</td>
      <td>A</td>
    </tr>
    <tr>
      <th>2</th>
      <td>0</td>
      <td>7</td>
      <td>A</td>
    </tr>
    <tr>
      <th>3</th>
      <td>1</td>
      <td>5</td>
      <td>B</td>
    </tr>
    <tr>
      <th>4</th>
      <td>4</td>
      <td>0</td>
      <td>B</td>
    </tr>
    <tr>
      <th>5</th>
      <td>4</td>
      <td>8</td>
      <td>B</td>
    </tr>
    <tr>
      <th>6</th>
      <td>1</td>
      <td>0</td>
      <td>C</td>
    </tr>
    <tr>
      <th>7</th>
      <td>1</td>
      <td>9</td>
      <td>C</td>
    </tr>
    <tr>
      <th>8</th>
      <td>4</td>
      <td>5</td>
      <td>C</td>
    </tr>
  </tbody>
</table>
</div>




```python
#trying to keep pace
df.iloc[0, 1] = 3
df
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>V1</th>
      <th>V2</th>
      <th>V4</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>0</td>
      <td>3</td>
      <td>A</td>
    </tr>
    <tr>
      <th>1</th>
      <td>0</td>
      <td>4</td>
      <td>A</td>
    </tr>
    <tr>
      <th>2</th>
      <td>0</td>
      <td>7</td>
      <td>A</td>
    </tr>
    <tr>
      <th>3</th>
      <td>1</td>
      <td>5</td>
      <td>B</td>
    </tr>
    <tr>
      <th>4</th>
      <td>4</td>
      <td>0</td>
      <td>B</td>
    </tr>
    <tr>
      <th>5</th>
      <td>4</td>
      <td>8</td>
      <td>B</td>
    </tr>
    <tr>
      <th>6</th>
      <td>1</td>
      <td>0</td>
      <td>C</td>
    </tr>
    <tr>
      <th>7</th>
      <td>1</td>
      <td>9</td>
      <td>C</td>
    </tr>
    <tr>
      <th>8</th>
      <td>4</td>
      <td>5</td>
      <td>C</td>
    </tr>
  </tbody>
</table>
</div>




```python
df.sort_values(['V4','V1'], ascending = [True, False], inplace = True)
```


```python
df.rename(columns = {'V2':'v2'}, inplace = True)
#rename by position: work on a list copy rather than on df.columns.values
cols = df.columns.tolist()
cols[1] = 'V2'
df.columns = cols
```


```python
df = df[['V4', 'V1', 'V2']]
```


```python
df
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>V4</th>
      <th>V1</th>
      <th>V2</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>A</td>
      <td>0</td>
      <td>3</td>
    </tr>
    <tr>
      <th>1</th>
      <td>A</td>
      <td>0</td>
      <td>4</td>
    </tr>
    <tr>
      <th>2</th>
      <td>A</td>
      <td>0</td>
      <td>7</td>
    </tr>
    <tr>
      <th>4</th>
      <td>B</td>
      <td>4</td>
      <td>0</td>
    </tr>
    <tr>
      <th>5</th>
      <td>B</td>
      <td>4</td>
      <td>8</td>
    </tr>
    <tr>
      <th>3</th>
      <td>B</td>
      <td>1</td>
      <td>5</td>
    </tr>
    <tr>
      <th>8</th>
      <td>C</td>
      <td>4</td>
      <td>5</td>
    </tr>
    <tr>
      <th>6</th>
      <td>C</td>
      <td>1</td>
      <td>0</td>
    </tr>
    <tr>
      <th>7</th>
      <td>C</td>
      <td>1</td>
      <td>9</td>
    </tr>
  </tbody>
</table>
</div>
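
A hedged sketch of renaming a column by position through `rename`, then reordering, on a made-up frame (`demo` is an assumption). Looking the label up with `demo.columns[1]` keeps the rename positional while still going through the public API.

```python
import pandas as pd

# hypothetical frame whose second column has a lower-case label
demo = pd.DataFrame({"V1": [0, 4], "v2": [3, 0], "V4": ["A", "B"]})

# rename the column at position 1 by looking its current label up first
demo = demo.rename(columns={demo.columns[1]: "V2"})

# reorder by listing the columns in the wanted order
demo = demo[["V4", "V1", "V2"]]
print(demo.columns.tolist())
```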




```python
#Get row number of first (and last) observation by group
pd.DataFrame(df.groupby("V4").indices).melt() #returns all the indices
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>variable</th>
      <th>value</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>A</td>
      <td>0</td>
    </tr>
    <tr>
      <th>1</th>
      <td>A</td>
      <td>1</td>
    </tr>
    <tr>
      <th>2</th>
      <td>A</td>
      <td>2</td>
    </tr>
    <tr>
      <th>3</th>
      <td>B</td>
      <td>3</td>
    </tr>
    <tr>
      <th>4</th>
      <td>B</td>
      <td>4</td>
    </tr>
    <tr>
      <th>5</th>
      <td>B</td>
      <td>5</td>
    </tr>
    <tr>
      <th>6</th>
      <td>C</td>
      <td>6</td>
    </tr>
    <tr>
      <th>7</th>
      <td>C</td>
      <td>7</td>
    </tr>
    <tr>
      <th>8</th>
      <td>C</td>
      <td>8</td>
    </tr>
  </tbody>
</table>
</div>




```python
pd.DataFrame(df.groupby("V4").indices).loc[1] #get the second row per group
```




    A    1
    B    4
    C    7
    Name: 1, dtype: int64




```python
pd.DataFrame(df.groupby("V4").indices).iloc[[0,-1]].melt() #get first and last indices per group
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>variable</th>
      <th>value</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>A</td>
      <td>0</td>
    </tr>
    <tr>
      <th>1</th>
      <td>A</td>
      <td>2</td>
    </tr>
    <tr>
      <th>2</th>
      <td>B</td>
      <td>3</td>
    </tr>
    <tr>
      <th>3</th>
      <td>B</td>
      <td>5</td>
    </tr>
    <tr>
      <th>4</th>
      <td>C</td>
      <td>6</td>
    </tr>
    <tr>
      <th>5</th>
      <td>C</td>
      <td>8</td>
    </tr>
  </tbody>
</table>
</div>
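
One caveat: wrapping `.indices` in `pd.DataFrame` works here because every group has three rows; with groups of different sizes the constructor raises, since the arrays no longer line up as columns. A minimal sketch that handles unequal groups by working on the dict directly (`demo` and `first_last` are made-up names):

```python
import pandas as pd

# hypothetical frame with groups of size 3, 1 and 2
demo = pd.DataFrame({"V4": ["A", "A", "A", "B", "C", "C"]})

# .indices maps each group label to an array of row positions
positions = demo.groupby("V4").indices

# take the first and last position of every group, whatever its size
first_last = {key: (int(pos[0]), int(pos[-1])) for key, pos in positions.items()}
print(first_last)
```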




```python
df.to_csv("test.csv", index = False)
```


```python
#drop columns when reading csv
pd.read_csv("test.csv", usecols = lambda x: x != "V4")

# or pd.read_csv("test.csv", usecols = lambda x: x not in ["V4"])
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>V1</th>
      <th>V2</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>0</td>
      <td>3</td>
    </tr>
    <tr>
      <th>1</th>
      <td>0</td>
      <td>4</td>
    </tr>
    <tr>
      <th>2</th>
      <td>0</td>
      <td>7</td>
    </tr>
    <tr>
      <th>3</th>
      <td>4</td>
      <td>0</td>
    </tr>
    <tr>
      <th>4</th>
      <td>4</td>
      <td>8</td>
    </tr>
    <tr>
      <th>5</th>
      <td>1</td>
      <td>5</td>
    </tr>
    <tr>
      <th>6</th>
      <td>4</td>
      <td>5</td>
    </tr>
    <tr>
      <th>7</th>
      <td>1</td>
      <td>0</td>
    </tr>
    <tr>
      <th>8</th>
      <td>1</td>
      <td>9</td>
    </tr>
  </tbody>
</table>
</div>
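
The same `usecols` callable can be tried without writing a file. In this sketch the CSV text is made up and read from an in-memory buffer; the callable is evaluated against each column name, and only the columns for which it returns `True` are parsed.

```python
import io
import pandas as pd

# in-memory stand-in for test.csv (contents are made up)
csv_text = "V4,V1,V2\nA,0,3\nB,4,0\n"

# keep every column except V4
subset = pd.read_csv(io.StringIO(csv_text), usecols=lambda name: name != "V4")
print(subset)
```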




```python
#melt data from wide to long
mdf = df.melt(id_vars = "V4", value_vars=("V1","V2"))
mdf
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>V4</th>
      <th>variable</th>
      <th>value</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>A</td>
      <td>V1</td>
      <td>0</td>
    </tr>
    <tr>
      <th>1</th>
      <td>A</td>
      <td>V1</td>
      <td>0</td>
    </tr>
    <tr>
      <th>2</th>
      <td>A</td>
      <td>V1</td>
      <td>0</td>
    </tr>
    <tr>
      <th>3</th>
      <td>B</td>
      <td>V1</td>
      <td>4</td>
    </tr>
    <tr>
      <th>4</th>
      <td>B</td>
      <td>V1</td>
      <td>4</td>
    </tr>
    <tr>
      <th>5</th>
      <td>B</td>
      <td>V1</td>
      <td>1</td>
    </tr>
    <tr>
      <th>6</th>
      <td>C</td>
      <td>V1</td>
      <td>4</td>
    </tr>
    <tr>
      <th>7</th>
      <td>C</td>
      <td>V1</td>
      <td>1</td>
    </tr>
    <tr>
      <th>8</th>
      <td>C</td>
      <td>V1</td>
      <td>1</td>
    </tr>
    <tr>
      <th>9</th>
      <td>A</td>
      <td>V2</td>
      <td>3</td>
    </tr>
    <tr>
      <th>10</th>
      <td>A</td>
      <td>V2</td>
      <td>4</td>
    </tr>
    <tr>
      <th>11</th>
      <td>A</td>
      <td>V2</td>
      <td>7</td>
    </tr>
    <tr>
      <th>12</th>
      <td>B</td>
      <td>V2</td>
      <td>0</td>
    </tr>
    <tr>
      <th>13</th>
      <td>B</td>
      <td>V2</td>
      <td>8</td>
    </tr>
    <tr>
      <th>14</th>
      <td>B</td>
      <td>V2</td>
      <td>5</td>
    </tr>
    <tr>
      <th>15</th>
      <td>C</td>
      <td>V2</td>
      <td>5</td>
    </tr>
    <tr>
      <th>16</th>
      <td>C</td>
      <td>V2</td>
      <td>0</td>
    </tr>
    <tr>
      <th>17</th>
      <td>C</td>
      <td>V2</td>
      <td>9</td>
    </tr>
  </tbody>
</table>
</div>
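
A smaller version of the same reshape, to make the row layout easier to see. The frame `demo` is made up; `var_name` and `value_name` are spelled out with their default values only to show where the output column names come from.

```python
import pandas as pd

# hypothetical two-row frame
demo = pd.DataFrame({"V4": ["A", "B"], "V1": [0, 4], "V2": [3, 8]})

# one output row per (original row, melted column) pair
long = demo.melt(id_vars="V4", value_vars=["V1", "V2"],
                 var_name="variable", value_name="value")
print(long)
```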




```python
#cast data from long to wide
pd.crosstab(mdf.V4, mdf.variable)
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th>variable</th>
      <th>V1</th>
      <th>V2</th>
    </tr>
    <tr>
      <th>V4</th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>A</th>
      <td>3</td>
      <td>3</td>
    </tr>
    <tr>
      <th>B</th>
      <td>3</td>
      <td>3</td>
    </tr>
    <tr>
      <th>C</th>
      <td>3</td>
      <td>3</td>
    </tr>
  </tbody>
</table>
</div>




```python
#cast data from long to wide, summing value within each cell
pd.crosstab(mdf.V4, mdf.variable, values = mdf.value, aggfunc='sum')
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th>variable</th>
      <th>V1</th>
      <th>V2</th>
    </tr>
    <tr>
      <th>V4</th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>A</th>
      <td>0</td>
      <td>14</td>
    </tr>
    <tr>
      <th>B</th>
      <td>9</td>
      <td>13</td>
    </tr>
    <tr>
      <th>C</th>
      <td>6</td>
      <td>14</td>
    </tr>
  </tbody>
</table>
</div>
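
For comparison, the same long-to-wide sum can be written with `pivot_table`, which takes column names instead of Series. This is a sketch on a made-up long frame (`long` and its values are assumptions), with `crosstab` run alongside to show the two agree.

```python
import pandas as pd

# hypothetical long-format frame
long = pd.DataFrame({"V4": ["A", "A", "B", "B"],
                     "variable": ["V1", "V2", "V1", "V2"],
                     "value": [0, 3, 4, 8]})

# long-to-wide reshape with pivot_table
wide = long.pivot_table(index="V4", columns="variable", values="value", aggfunc="sum")

# the crosstab spelling of the same reshape
ct = pd.crosstab(long.V4, long.variable, values=long.value, aggfunc="sum")
print(wide)
```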




```python
#cast data from long to wide, counting rows by whether value > 5
pd.crosstab(mdf.V4, mdf.value > 5)
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th>value</th>
      <th>False</th>
      <th>True</th>
    </tr>
    <tr>
      <th>V4</th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>A</th>
      <td>5</td>
      <td>1</td>
    </tr>
    <tr>
      <th>B</th>
      <td>5</td>
      <td>1</td>
    </tr>
    <tr>
      <th>C</th>
      <td>5</td>
      <td>1</td>
    </tr>
  </tbody>
</table>
</div>





