Pandas is an open source library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
We first need to import the pandas module, optionally with as pd (or whatever name you prefer) to declare an alias.
import pandas as pd
The following modules will also be used throughout this class.
import numpy as np
import matplotlib.pyplot as plt
In this lecture, we will access data files that have been uploaded to Google Drive. To do this, we first have to mount Google Drive.
Run the following cell. You will be presented with a link and asked to enter an authorization code.
Click on the link and log in again with the INHA account you are working with. You will then be given the authorization code.
Copy the authorization code and paste it into the blank.
Note that this step is necessary ONLY when you work in the Google Colab environment.
import os
from google.colab import drive
drive.mount('/content/drive')
os.chdir('/content/drive/My Drive/Colab Notebooks/ase3001')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
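Since the mounting step only applies on Colab, the cell can be guarded so that the same notebook also runs on a local machine. The following is a minimal sketch; checking sys.modules for google.colab is a commonly used heuristic rather than an official API, and the flag name IN_COLAB is made up for this example.

```python
import os
import sys

# Heuristic (an assumption, not an official API): the Colab kernel has
# google.colab loaded already, so its presence in sys.modules signals Colab.
IN_COLAB = 'google.colab' in sys.modules

if IN_COLAB:
    # Only reached on Colab: mount the drive and move to the class folder.
    from google.colab import drive
    drive.mount('/content/drive')
    os.chdir('/content/drive/My Drive/Colab Notebooks/ase3001')
else:
    # Elsewhere, keep the current directory and read files from there.
    print('Not on Colab; working directory is', os.getcwd())
```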
Data frames are the central concept in pandas. In essence, a data frame is a table with labeled rows and columns. Data frames can be created from multiple sources, e.g. CSV files, Excel files, and JSON.
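To make the idea of labeled rows and columns concrete, here is a small sketch that builds a data frame by hand from a dictionary. The column names, row labels, and values are made up for illustration only.

```python
import pandas as pd

# Hypothetical toy data: each dictionary key becomes a column label,
# and the index argument supplies the row labels.
toy = pd.DataFrame(
    {'Time': [0.00, 0.01, 0.02], 'Speed': [250.0, 250.1, 250.3]},
    index=['a', 'b', 'c'],
)

print(toy.shape)              # (3, 2): three rows, two columns
print(list(toy.columns))      # the column labels
print(toy.loc['b', 'Speed'])  # look up one cell by row and column label
```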
csv files
df = pd.read_csv('kfxsim.csv', delimiter=',')
Let's first check the size of the data frame and look at its first 10 rows.
print(df.shape) # show the size of the dataframe
df.head(10) # show the first 10 rows
(2001, 21)
 | Time | Position_N | Position_E | Position_D | Flight path angle | Heading angle | Total angle of attack | Bank angle | Ground speed | Lateral acceleration (command) | ... | Bank angle (command) | Bank angle (response) | Drag force | Thrust force | Gravitational acceleration | Yaw | Pitch | Roll | Elevation | Azimuth |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.00 | 0.000000 | 0.000000 | -1900.000000 | 0.000000 | 2.000000 | 0.000000 | 0.000000 | 250.000000 | 2.549291 | ... | -2.870026 | 0.000000 | 95310.78504 | 294359.3750 | 9.813746 | 2.000000 | 0.000000 | 0.000000 | 10.764371 | -2.000000 |
1 | 0.01 | 2.499138 | 0.087272 | -1899.999509 | -0.022467 | 2.000000 | 0.011393 | -0.013670 | 250.132068 | 2.549291 | ... | -2.863572 | -0.013670 | 95411.51663 | 292512.7173 | 9.813746 | 1.999997 | -0.011074 | -0.013670 | 10.755464 | -2.003018 |
2 | 0.02 | 4.999589 | 0.174590 | -1899.998039 | -0.044817 | 2.000000 | 0.044679 | -0.052123 | 250.262882 | 2.549291 | ... | -2.857154 | -0.052123 | 95511.35308 | 290680.6389 | 9.813746 | 1.999959 | -0.000138 | -0.052123 | 10.767726 | -2.010860 |
3 | 0.03 | 7.501341 | 0.261953 | -1899.995597 | -0.066948 | 1.999999 | 0.098546 | -0.111704 | 250.392451 | 2.549291 | ... | -2.850798 | -0.111704 | 95610.30042 | 288863.0420 | 9.813746 | 1.999807 | 0.031598 | -0.111704 | 10.800058 | -2.022806 |
4 | 0.04 | 10.004380 | 0.349361 | -1899.992192 | -0.088768 | 1.999998 | 0.171725 | -0.189039 | 250.520784 | 2.549291 | ... | -2.844532 | -0.189039 | 95708.36436 | 287059.8356 | 9.813746 | 1.999431 | 0.082956 | -0.189039 | 10.851390 | -2.038191 |
5 | 0.05 | 12.508694 | 0.436813 | -1899.987839 | -0.110189 | 1.999994 | 0.262988 | -0.281028 | 250.647891 | 2.549291 | ... | -2.838376 | -0.281028 | 95805.55029 | 285270.9365 | 9.813746 | 1.998704 | 0.152796 | -0.281028 | 10.920673 | -2.056411 |
6 | 0.06 | 15.014271 | 0.524309 | -1899.982557 | -0.131129 | 1.999985 | 0.371148 | -0.384842 | 250.773781 | 2.549291 | ... | -2.832350 | -0.384842 | 95901.86324 | 283496.2695 | 9.813746 | 1.997492 | 0.240010 | -0.384844 | 11.006891 | -2.076919 |
7 | 0.07 | 17.521097 | 0.611849 | -1899.976368 | -0.151512 | 1.999969 | 0.495060 | -0.497913 | 250.898461 | 2.549291 | ... | -2.826471 | -0.497913 | 95997.30790 | 281735.7675 | 9.813746 | 1.995667 | 0.343529 | -0.497920 | 11.109052 | -2.099221 |
8 | 0.08 | 20.029161 | 0.699430 | -1899.969297 | -0.171268 | 1.999943 | 0.633618 | -0.617921 | 251.021940 | 2.549291 | ... | -2.820754 | -0.617921 | 96091.88863 | 279989.3716 | 9.813746 | 1.993110 | 0.462313 | -0.617939 | 11.226194 | -2.122873 |
9 | 0.09 | 22.538451 | 0.787053 | -1899.961371 | -0.190330 | 1.999904 | 0.785756 | -0.742790 | 251.144224 | 2.549291 | ... | -2.815211 | -0.742790 | 96185.60944 | 278257.0309 | 9.813746 | 1.989717 | 0.595360 | -0.742826 | 11.357385 | -2.147483 |
10 rows × 21 columns
What does the dataframe contain? You can list all the headers to get the idea.
list(df)
['Time', 'Position_N', 'Position_E', 'Position_D', 'Flight path angle', 'Heading angle', 'Total angle of attack', 'Bank angle', 'Ground speed', 'Lateral acceleration (command)', 'Lateral acceleration (response)', 'Bank angle (command)', 'Bank angle (response)', 'Drag force', 'Thrust force', 'Gravitational acceleration', 'Yaw', 'Pitch', 'Roll', 'Elevation', 'Azimuth']
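As a small sketch of how the headers can then be used, the example below lists the column names of a stand-in data frame and selects several columns at once by passing a list of names. The frame sim and its values are hypothetical; only the column names are borrowed from the list above.

```python
import pandas as pd

# Hypothetical stand-in for the flight data; the values are made up.
sim = pd.DataFrame({
    'Time': [0.00, 0.01, 0.02],
    'Yaw': [2.0, 1.9, 1.8],
    'Pitch': [0.0, -0.1, 0.0],
})

headers = list(sim)             # same result as list(sim.columns)
angles = sim[['Yaw', 'Pitch']]  # a list of names selects several columns

print(headers)
print(angles.shape)
```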
You can choose some of them and draw plots.
plt.figure()
# Label each curve so that plt.legend() has entries to display.
plt.plot(df['Time'], df['Yaw'], label='Yaw')
plt.plot(df['Time'], df['Pitch'], label='Pitch')
plt.plot(df['Time'], df['Roll'], label='Roll')
plt.grid(True)
plt.legend()
plt.xlabel("Time (s)")
plt.ylabel("Euler angles (deg)")
plt.title("Euler angles")
Text(0.5, 1.0, 'Euler angles')
When the data file you are interested in is not on your local machine but on the Internet, you can pass its URL to pd.read_csv directly.
df = pd.read_csv('https://web.stanford.edu/~hastie/Papers/LARS/diabetes.data', delimiter='\t')
df
 | AGE | SEX | BMI | BP | S1 | S2 | S3 | S4 | S5 | S6 | Y |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 59 | 2 | 32.1 | 101.00 | 157 | 93.2 | 38.0 | 4.00 | 4.8598 | 87 | 151 |
1 | 48 | 1 | 21.6 | 87.00 | 183 | 103.2 | 70.0 | 3.00 | 3.8918 | 69 | 75 |
2 | 72 | 2 | 30.5 | 93.00 | 156 | 93.6 | 41.0 | 4.00 | 4.6728 | 85 | 141 |
3 | 24 | 1 | 25.3 | 84.00 | 198 | 131.4 | 40.0 | 5.00 | 4.8903 | 89 | 206 |
4 | 50 | 1 | 23.0 | 101.00 | 192 | 125.4 | 52.0 | 4.00 | 4.2905 | 80 | 135 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
437 | 60 | 2 | 28.2 | 112.00 | 185 | 113.8 | 42.0 | 4.00 | 4.9836 | 93 | 178 |
438 | 47 | 2 | 24.9 | 75.00 | 225 | 166.0 | 42.0 | 5.00 | 4.4427 | 102 | 104 |
439 | 60 | 2 | 24.9 | 99.67 | 162 | 106.6 | 43.0 | 3.77 | 4.1271 | 95 | 132 |
440 | 36 | 1 | 30.0 | 95.00 | 201 | 125.2 | 42.0 | 4.79 | 5.1299 | 85 | 220 |
441 | 36 | 1 | 19.6 | 71.00 | 250 | 133.2 | 97.0 | 3.00 | 4.5951 | 92 | 57 |
442 rows × 11 columns
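The delimiter argument tells read_csv how the fields are separated; here the file is tab-separated. The sketch below shows the same call on a tiny in-memory sample, so it can be tried without network access. The sample text reuses three columns from the first rows shown above, and the names text and sample are made up for this example.

```python
import io
import pandas as pd

# A three-row, tab-separated sample in the same layout as the file above.
text = "AGE\tSEX\tBMI\n59\t2\t32.1\n48\t1\t21.6\n72\t2\t30.5\n"

# io.StringIO lets read_csv treat the string like an open file.
sample = pd.read_csv(io.StringIO(text), delimiter='\t')

print(sample.shape)
print(sample.dtypes)
```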
These data consist of observations on 442 patients. The response of interest, Y in the last column, is a quantitative measure of disease progression one year after baseline.
There are ten baseline variables: age, sex, body-mass index (BMI), average blood pressure (BP), and six blood serum measurements (S1-S6).
For example, the youngest patient's record can be found with
df.loc[df['AGE'].idxmin()]
 | 26 |
---|---|
AGE | 19.0000 |
SEX | 1.0000 |
BMI | 19.2000 |
BP | 87.0000 |
S1 | 124.0000 |
S2 | 54.0000 |
S3 | 57.0000 |
S4 | 2.0000 |
S5 | 4.1744 |
S6 | 90.0000 |
Y | 137.0000 |
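The lookup above works in two steps: idxmin() returns the row label of the smallest value, and loc then fetches the row with that label. A minimal sketch on a hypothetical four-row frame (the name patients, the labels, and the values are made up) shows the steps separately.

```python
import pandas as pd

# Hypothetical mini data set with non-consecutive row labels.
patients = pd.DataFrame(
    {'AGE': [59, 48, 19, 24], 'BMI': [32.1, 21.6, 19.2, 25.3]},
    index=[10, 11, 26, 27],
)

youngest_label = patients['AGE'].idxmin()  # row label of the minimum age
youngest = patients.loc[youngest_label]    # the full row, as a Series

print(youngest_label)
print(youngest)
```

Note that idxmin() returns a label, not a position, which is why it pairs with loc rather than iloc.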
Likewise, the average BMI over all patients is
df['BMI'].mean()
26.37579185520362
or
np.mean(df['BMI'])
26.37579185520362
The relationship between blood pressure and body mass index can be visualized with a scatter plot.
plt.figure(figsize=(12,8))
plt.plot(df['BP'], df['BMI'], 'o', alpha=0.5)
plt.xlabel('BP')
plt.ylabel('BMI')
plt.grid(True)
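A scatter plot shows the relationship visually; if a single number is wanted, the Pearson correlation coefficient can be computed as well. The sketch below is one possible way to do so, using the BP and BMI values of the first five patients shown earlier as a small stand-in (the names pairs, r_pandas, and r_numpy are made up for this example).

```python
import numpy as np
import pandas as pd

# BP and BMI of the first five patients, used here as a small stand-in.
pairs = pd.DataFrame({
    'BP': [101.0, 87.0, 93.0, 84.0, 101.0],
    'BMI': [32.1, 21.6, 30.5, 25.3, 23.0],
})

# Pearson correlation, computed two equivalent ways.
r_pandas = pairs['BP'].corr(pairs['BMI'])
r_numpy = np.corrcoef(pairs['BP'], pairs['BMI'])[0, 1]

print(r_pandas, r_numpy)
```

On the full data set, the same call would be df['BP'].corr(df['BMI']).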
json files
json files can also be easily imported.
df = pd.read_json('http://jonghank.github.io/ee370/files/speeches.json')
df
 | speeches | parties |
---|---|---|
0 | Mr. President, I wanted to follow up the remar... | R |
1 | Mr. President, I rise today to address a recen... | D |
2 | Mr. President, I rise to draw a line — a line ... | D |
3 | Mr. President, I rise to condemn in the strong... | D |
4 | Mr. President, I thank my colleague and my fri... | R |
... | ... | ... |
3114 | Mr. President, I wish to speak about two separ... | R |
3115 | Mr. President, I am here to talk about two sep... | R |
3116 | Mr. President, tomorrow I will be visiting the... | R |
3117 | Madam President, there is an ongoing debate in... | R |
3118 | I thank the Senator from Arizona.\nI want to p... | R |
3119 rows × 2 columns
This json file contains 3119 speeches by politicians from two parties, Republicans (R) and Democrats (D).
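To check how the speeches are split between the two parties, value_counts() can be applied to the parties column. The sketch below demonstrates this on a hypothetical three-record JSON string; the record layout (orient='records') is an assumption chosen for the example and is not necessarily how speeches.json is stored.

```python
import io
import pandas as pd

# Hypothetical miniature of the speech data, written as a list of records.
text = (
    '[{"speeches": "First speech.", "parties": "R"},'
    ' {"speeches": "Second speech.", "parties": "D"},'
    ' {"speeches": "Third speech.", "parties": "D"}]'
)

toy = pd.read_json(io.StringIO(text), orient='records')
counts = toy['parties'].value_counts()  # number of speeches per party

print(len(toy))
print(counts)
```

On the real data, df['parties'].value_counts() would give the number of speeches per party.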
The first speech and its party label look like this:
print(f"parties: {df['parties'][0]}")
print(f"speeches: {df['speeches'][0]}")
parties: R speeches: Mr. President, I wanted to follow up the remarks of my senior Senator from Pennsylvania [Mr. Specter], and talk about the problems that we are having in Pennsylvania today. The first thing I wanted to do was make sure the record is very clear in my use of the word \"liberal.\" I suggested that FEMA be more liberal than what they have been to date, as of early this morning, in declaring counties in Pennsylvania eligible for individual assistance, for emergency disaster relief funds. I think that was an appropriate call given the fact that the Governor of Pennsylvania, who knows a little bit about the Emergency Relief Act that is in place here because he helped write it several years ago and knows it cover to cover, declared 58 of Pennsylvania's 67 counties disaster areas and was seeking Federal grant recognition for, if not all, certainly a great majority of those counties. Senator Specter, I know, has been traveling the State extensively, as have I. We have seen the tremendous damage done by this heavy snowfall and subsequent quick melting and floods and then freezing again, causing ice jams and horrible damage on our Commonwealth's rivers and streams. We do believe that several more counties should be included in the list that are eligible for individual assistance, and obviously the process will commence to determine whether those counties and municipalities will be eligible for public assistance, for reimbursing municipalities and counties for the cost of cleanup and dealing with the problems of this horrible storm. I understand that the senior Senator has already talked about how today James Lee Witt, the head of FEMA, has been up to the State of Pennsylvania and he has added to the list of 6 counties an additional 19 counties, bringing to 25 the number of counties that will now be eligible for some assistance. We were in Harrisburg this morning. 
I know he mentioned we saw some of That has really made this disaster a lot different because Harrisburg was hit back in 1972 with very severe flooding as a result of Hurricane Agnes. In fact, the mayor and others have been telling us that while the flood levels were not as high as Hurricane Agnes, although in some areas they were almost as high, the damage, they believe, actually will be more because of the ice. Literally, Senator Specter and I were walking around an area that was 5 feet underwater just 24 hours before, and sitting there all over the place were boulders of ice almost my size and probably bigger, with trees frozen to them. It was really a rather gruesome picture. You could actually see the water level because on the houses and the fences and on the trees you could see where the ice had frozen around the tree, around the houses, sort of jutting out from the houses. So you could pretty well tell everywhere where the water levels had risen to. We were through that area and saw the damage that the ice had caused to streets and to houses, the buckling effect of having water there and then freezing and then unfreezing. It looks almost like an earthquake on some of the roads; they are just sort of warped, with big sinkholes and things like that as a result of this freezing and thawing and freezing again and the amount of water pressure. In fact, Senator Specter and I met with Mayor Reed of Harrisburg, whom I have to commend; he has done a tremendous job in rallying the troops in Harrisburg, one of our hardest hit cities, and is doing an outstanding job personally. He is someone whom I have known for quite some time and know he puts every ounce of his person in his job. I am sure he has not slept for days. He met us in boots and blue jeans and looked like he had not been able to get into his house, probably even to eat a meal, in a few days. He has really just been on the go. 
They had a horrible fire in this area I was talking about that was 5 feet under water. They had, unfortunately, a fire break out last night that destroyed four historic town homes. And luckily no one was injured. The area was evacuated obviously and no one was injured as a resident. But several of the firefighters, they had to cut their way through the ice and wade through water, waist high at that time, and fight the fire without obviously any fire hoses. They had to string them literally blocks to get fire hoses there. My understanding is that a dozen firefighters were carried from the scene with hypothermia — a horrible situation. I know Mayor Reed was there the entire time working on it. He showed us the Walnut Street bridge, which is the oldest — I am not going to get this right — it is the oldest of some type of bridge having to do with metal construction. That bridge was expected to collapse during the 1972 flood when actually the river went up over the platform of the bridge. In this case it was several feet below it. But a section of the bridge — you may have seen on television — was knocked away. The reason was not because of the water flow. Again, it was the ice jams. An ice jam had a large amount of ice collected at this one abutment, and eventually with all the pressure it was knocked over, was knocked into the river. They expect another one of those pillars to fall relatively soon. So there has been a severe amount of damage. Senator Specter and I are very concerned about the Federal response to the damage across Pennsylvania. We believe that in some instances the response was delayed. I know the President would like to see all the people and communities that have been severely hurt by this storm to get the kind of assistance that they need to begin to clean up and rebuild their lives. I am hopeful that we can move forward. As Senator Specter said, initially only six counties were listed as qualifying for this assistance. 
One of the counties that did not qualify originally, and did not qualify until this afternoon, was a county where there were 6 people known dead, 75 people missing from an area that was a large housing development that was literally just swept away. Water rose rapidly. People were given no warning. The consequences were terrible. Yet that county was not listed originally on the disaster list, which amazed many of us and frankly was very discouraging. I had occasion to talk to people up in Williamsport, Lycoming County. And they were very discouraged. Somehow they were suffering to this degree, and in fact accounted, from my understanding, for over half the deaths related to this storm in the Northeast, and yet were not listed as a county eligible for disaster assistance. That caused some legitimate uneasiness to where actually their needs and concerns were being paid attention to. I am happy to report they were listed in the second round. There are other counties that we need to look at that I believe have legitimate needs to be met. Hopefully we can do that, we can do that expeditiously. I want to join Senator Specter in congratulating Secretary Pena and Director Witt for being up in Pennsylvania today to survey the damage, to see the extent of what seemed to be just a flood. I remind you the compounding effect of the ice is something I do not think anyone recognized. I was in Lancaster County, which unfortunately has yet to be declared a disaster county. I was in Marietta which was flooded, at least the parts nearest the river were flooded. Their big concern right now is the freezing that is going on. They were flooded. They have something like a dike. It is actually a railroad track that runs between the river and the town that is very high up and serves like a dike. But they got flooded through their storm sewers, and the water reaching its level filled up both sides of the dike. Now they are concerned with the storm sewers. 
Because of the very cold temperatures, they are now frozen. If they get any more rain, which is anticipated tomorrow, or any other precipitation, they will have the same problem all over again. Many counties and many cities, they have that same problem with either frozen surface areas that prevent water from draining or the infrastructure underneath the ground itself containing ice and frozen debris is going to cause a real problem with drainage. So we are not out of the woods yet. There is unfortunately still a lot of snow on the ground. The possibility exists, with the warm weather today, we could even see some more problems. So I want to congratulate Governor Ridge and Lt. Gov. Mark Schweiker for their tremendous role in responding to this emergency. They have been all over the State, have been very aggressive in trying to seek aid, and have also been very aggressive in trying to help municipalities trying to deal with the problems that have beset them. I think we have seen a very good effort on the part of locally elected officials, and the Governor and Lieutenant Governor. I think — at least I hope that we can be proud of the Federal role that is being played in Pennsylvania. I think we are coming along a little slowly, but maybe today with some fly-arounds and other things that are going on, we can impress upon officials here in Washington and in the regional office that this is a true emergency, a disaster that needs to be attended to, and the Federal Government has a role to play in helping those individuals and municipalities that were affected by it. Mr. President, I suggest the absence of a quorum.
The code below loads a png image and stores it in a three-dimensional array, ae_building.
from PIL import Image
import requests
X = Image.open(requests.get("https://jonghank.github.io/ase3001/files/aerospace_building.png", stream=True).raw)
ae_building = np.array(X)/255
plt.figure(dpi=100)
plt.imshow(ae_building)
plt.axis('off')
plt.title('Original color image')
ae_building.shape
(640, 976, 4)
The first two dimensions are the image height and width in pixels (640 and 976), and the last dimension holds the four channels: R, G, B, and alpha.
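The layout of such an array can be explored on a tiny synthetic image. In the sketch below, the array rgba is a made-up 2-by-3 pixel image, not the building photo; it shows how the shape unpacks and how slicing the last axis separates the color channels from the alpha channel.

```python
import numpy as np

# Hypothetical 2x3 RGBA image: pure red, fully opaque.
rgba = np.zeros((2, 3, 4))
rgba[:, :, 0] = 1.0  # R channel
rgba[:, :, 3] = 1.0  # alpha channel (1.0 means opaque)

height, width, n_channels = rgba.shape
rgb = rgba[:, :, :3]   # keep only R, G, B
alpha = rgba[:, :, 3]  # the transparency channel on its own

print(height, width, n_channels)
print(rgb.shape, alpha.shape)
```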
The following transforms the original color image into a grayscale one. A simple way to generate a grayscale image from the RGB channels is to assign, for each pixel,
$$ \text{Gray} = 0.299 R + 0.587 G + 0.114 B $$
R_channel = ae_building[:,:,0]
G_channel = ae_building[:,:,1]
B_channel = ae_building[:,:,2]
ae_grayscale = 0.299*R_channel + 0.587*G_channel + 0.114*B_channel
plt.figure(dpi=100)
plt.imshow(ae_grayscale, cmap='gray')
plt.axis('off')
plt.title('Grayscale image')
Text(0.5, 1.0, 'Grayscale image')
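The same weighted sum can also be written as a single matrix product over the channel axis. The following sketch checks, on a small random stand-in image (the array img is hypothetical, not the building photo), that this gives the same result as the channel-by-channel formula.

```python
import numpy as np

weights = np.array([0.299, 0.587, 0.114])  # R, G, B weights from the formula

# Hypothetical 4x5 RGBA image with values in [0, 1).
rng = np.random.default_rng(0)
img = rng.random((4, 5, 4))

# Drop the alpha channel, then contract the channel axis with the weights.
gray = img[:, :, :3] @ weights

# Channel-by-channel version, for comparison.
reference = 0.299*img[:, :, 0] + 0.587*img[:, :, 1] + 0.114*img[:, :, 2]

print(gray.shape)
print(np.allclose(gray, reference))
```

Because the three weights sum to one, the grayscale values stay in the same range as the input channels.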