sort_naturally
import pandas_flavor as pf
import pandas as pd
import janitor  # importing janitor registers DataFrame methods such as sort_naturally
Let's say we have a pandas DataFrame that contains wells that we need to sort alphanumerically.
data = {
"Well": ["A21", "A3", "A21", "B2", "B51", "B12"],
"Value": [1, 2, 13, 3, 4, 7],
}
df = pd.DataFrame(data)
df
| | Well | Value |
|---|---|---|
| 0 | A21 | 1 |
| 1 | A3 | 2 |
| 2 | A21 | 13 |
| 3 | B2 | 3 |
| 4 | B51 | 4 |
| 5 | B12 | 7 |
A human would sort it in the order:
A3, A21, A21, B2, B12, B51
However, the default sort in pandas compares strings character by character, so it does not produce that order:
df.sort_values("Well")
| | Well | Value |
|---|---|---|
| 0 | A21 | 1 |
| 2 | A21 | 13 |
| 1 | A3 | 2 |
| 5 | B12 | 7 |
| 3 | B2 | 3 |
| 4 | B51 | 4 |
Lexicographic sorting doesn't get us where we want. A21 shouldn't come before A3, and B12 shouldn't come before B2. How might we fix this?
df.sort_naturally("Well")
| | Well | Value |
|---|---|---|
| 1 | A3 | 2 |
| 0 | A21 | 1 |
| 2 | A21 | 13 |
| 3 | B2 | 3 |
| 5 | B12 | 7 |
| 4 | B51 | 4 |
Now we're in sorting bliss! :)
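To see what "natural" ordering means in practice, here is a minimal sketch that reproduces the same row order with plain pandas and the standard library. It is an illustration only and not necessarily how `sort_naturally` is implemented; the helper name `natural_key` is hypothetical. The idea is to split each label into text and digit runs and compare the digit runs as integers.

```python
import re

import pandas as pd

df = pd.DataFrame(
    {
        "Well": ["A21", "A3", "A21", "B2", "B51", "B12"],
        "Value": [1, 2, 13, 3, 4, 7],
    }
)


def natural_key(label):
    """Hypothetical helper: split a label into text and integer chunks.

    re.split with a capturing group alternates text, digits, text, ...
    so chunks at the same position always have the same type.
    """
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", label)]


# Compute the row positions in natural order; sorted() is stable,
# so duplicate labels keep their original relative order.
order = sorted(range(len(df)), key=lambda i: natural_key(df["Well"].iloc[i]))
result = df.iloc[order]
print(result)
```

Because the digit runs are compared as numbers, `3` sorts before `21` and `2` before `12`, which is exactly where character-by-character comparison goes wrong.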