One important aspect of using Python is being able to read and write files. To do this you use the built-in open function:
open(filename-in-quotes, mode)
In place of mode you can specify:
'r', to open a file that you want to read.
'w', to open a file that you want to write (starting a new file).
'a', to open a file that you want to append to (adding to an existing file).
So, to open a file with filename file1.txt in "read" mode:
open('file1.txt', 'r')
To open a file, file2.txt, in "write" mode:
open('file2.txt', 'w')
CAREFUL: this will create a new file called file2.txt. If you had an existing file named file2.txt, it will be overwritten.
To open a file, file3.txt, in "append" mode:
open('file3.txt', 'a')
If you don't specify the mode, the default is 'r'.
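One way to confirm the default is to look at the mode attribute of the file object. The following is a minimal sketch: it works in a temporary directory so that no existing files are touched, the file name default_mode_demo.txt is made up for this example, and it uses close, which is covered later in this section.

```python
import os
import tempfile

# Work in a throwaway directory so we cannot overwrite anything important
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, 'default_mode_demo.txt')  # hypothetical file name

# Create an empty file so that there is something to open for reading
f = open(demo_path, 'w')
f.close()

# Open the same file without specifying a mode
f = open(demo_path)
default_mode = f.mode  # the mode the file object was opened with
f.close()

print(default_mode)
```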
When we open a file in "read" mode, we will also want to be able to access what has been read.
To start with, we will open the file and assign the resulting file object to a new variable, f (you could name it whatever you want):
f = open('datafiles/file1.txt', 'r')
The file object, f, is an iterator: we can iterate through it (using a for loop) to obtain each line in the file in turn. We can then perform some operation on each line of the file.
For example, if we just want to print each line in the file:
for line in f:
    print(line)
Open the file, file1.txt, in a text editor and check that the file indeed contains the lines we printed above.
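You may notice a blank line after each printed line. Each line read from a file still ends with its newline character, and print adds another one. The sketch below shows one way to see and remove that trailing newline. It uses a small throwaway file whose name and contents are made up for this example, and it relies on write and close, which are covered later in this section.

```python
import os
import tempfile

# Create a small throwaway file to read (name and contents are made up)
demo_path = os.path.join(tempfile.mkdtemp(), 'lines_demo.txt')
f = open(demo_path, 'w')
f.write('alpha\nbeta\n')
f.close()

f = open(demo_path, 'r')
stripped_lines = []
for line in f:
    # repr shows the otherwise invisible '\n' at the end of each line
    print(repr(line))
    # rstrip('\n') returns a copy of the line without the trailing newline
    stripped_lines.append(line.rstrip('\n'))
f.close()

print(stripped_lines)
```

If you only want tidier output, print(line, end='') avoids adding the second newline.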
Once we have iterated through the file object, f, it is exhausted (the read position is at the end of the file), so if we do the same thing again, nothing will be printed:
for line in f:
    print(line)
If we want to read the lines again, we can close the file object, using f.close(), and reopen the file:
f.close()
f = open("datafiles/file1.txt", "r")
Now we can read the lines in the file again:
for line in f:
    print(line)
Let's close the file object again, so we don't have it hanging around!
f.close()
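Closing and reopening is one way to start again from the top of the file. A possible alternative, while the file object is still open, is the seek method: f.seek(0) moves the read position back to the start. The sketch below assumes a small throwaway file (the name seek_demo.txt and its contents are made up) and collects the lines from three passes so that they can be compared.

```python
import os
import tempfile

# Create a small throwaway file (name and contents are made up)
demo_path = os.path.join(tempfile.mkdtemp(), 'seek_demo.txt')
f = open(demo_path, 'w')
f.write('one\ntwo\n')
f.close()

f = open(demo_path, 'r')

# First pass: we get every line
first_pass = []
for line in f:
    first_pass.append(line)

# Second pass: the file object is exhausted, so the loop body never runs
second_pass = []
for line in f:
    second_pass.append(line)

# Move the read position back to the start of the file
f.seek(0)

# Third pass: the lines are available again
third_pass = []
for line in f:
    third_pass.append(line)

f.close()

print(first_pass)
print(second_pass)
print(third_pass)
```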
If you only want to read through the contents of the file once, this is fine, but it is quite a nuisance if you want to refer back to the lines in the file later on.
Instead, we can convert the file object to a list, which gives us a list of the lines in the file that we can go back to whenever we need:
# First open the file object
file_object = open("datafiles/file1.txt", "r")
# Then we convert it to a list
file_list = list(file_object)
# Let's print the contents of the list
print(file_list)
# Finally we have to make sure that the file object is closed
file_object.close()
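Note that, unlike the file object, we can loop over the list as many times as we like, even after the file has been closed:
for line in file_list:
    print(line)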
You can also use the readlines() method:
# First open the file object
file_object = open("datafiles/file1.txt","r")
# Then we use the readlines method to get all the lines
file_list2 = file_object.readlines()
# We close the file object for safety
file_object.close()
# We print the extracted list
print(file_list2)
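The two approaches should give the same result: a list of strings, one per line, each still ending in its newline character. The sketch below checks this on a throwaway file; the file name and its contents are made up for this example and are not the contents of file1.txt.

```python
import os
import tempfile

# Create a small throwaway file (name and contents are made up)
demo_path = os.path.join(tempfile.mkdtemp(), 'readlines_demo.txt')
f = open(demo_path, 'w')
f.write('header\nAAA\nBBB\n')
f.close()

# Approach 1: convert the file object to a list
f = open(demo_path, 'r')
via_list = list(f)
f.close()

# Approach 2: use the readlines method
f = open(demo_path, 'r')
via_readlines = f.readlines()
f.close()

# Both lists hold the same lines, newline characters included
print(via_list)
print(via_list == via_readlines)
```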
Let's put that into practice with an exercise:
a) Print out just the first line in the file
# Exercise 1 a)
b) Print out all except the first line in the file
# Exercise 1 b)
c) Change the third peptide sequence from 'PLSDMASI' to 'PLSEMASI'
# Exercise 1 c)
d) Add a new peptide sequence (YYVHNKSERFT) to the end of file_list
# Exercise 1 d)
e) Now print all the peptide sequences again (without the first line)
# Exercise 1 e)
To write the updated list of peptides to a new file we would need to open a new file, using "write" mode, and then use the write method.
The following example opens a new file, file2.txt (as mentioned above, we need to be careful that this file doesn't already exist, otherwise it will be overwritten), then writes two lines to the new file. Notice that the argument to the write method is a single string. We make use of the newline character \n as part of the string to make separate lines.
file_object_2 = open('file2.txt', 'w')
file_object_2.write('In this new file\nthis is the second line\n')
Note: the write method returns the number of characters written to the file.
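To see that return value for yourself, you can assign it to a variable. This is a minimal sketch using a throwaway file; the file name and the variable n_chars are made up for this example.

```python
import os
import tempfile

# Write to a throwaway file so that file2.txt is left alone (name is made up)
demo_path = os.path.join(tempfile.mkdtemp(), 'write_count_demo.txt')

f = open(demo_path, 'w')
# write returns how many characters it wrote, including the '\n'
n_chars = f.write('In this new file\n')
f.close()

print(n_chars)
print(n_chars == len('In this new file\n'))
```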
If we want to write more to the file, we can do so by using the write method again:
file_object_2.write('and this is the third line\n')
Once we have finished writing to the file, we should close it, using close as before.
file_object_2.close()
We can now no longer write to file_object_2:
# NBVAL_RAISES_EXCEPTION
## Note: ignore the above comment, this exists to allow us to test the notebook
file_object_2.write('another line\n')
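In current versions of Python, writing to a closed file raises a ValueError. The sketch below shows one way to check whether a file object is closed (its closed attribute) and to catch the error rather than have the program stop. It uses try and except, which are not covered in this section, and a throwaway file whose name is made up for this example.

```python
import os
import tempfile

# Use a throwaway file so that file2.txt is left alone (name is made up)
demo_path = os.path.join(tempfile.mkdtemp(), 'closed_demo.txt')

f = open(demo_path, 'w')
f.write('a line\n')
f.close()

# The closed attribute tells us whether the file object has been closed
print(f.closed)

error_message = None
try:
    # This fails because the file object is already closed
    f.write('another line\n')
except ValueError as err:
    error_message = str(err)

print(error_message)
```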
Open file2.txt in a text editor to check that the lines we wrote to the file are there.
If we now want to append a fourth line to file2.txt, we can open the file again, this time in "append" mode.
The following opens file2.txt in append mode, writes a fourth line to that file, then closes the file object.
file_object_3 = open('file2.txt', 'a')
file_object_3.write('this line will be appended\n')
file_object_3.close()
You can check file2.txt again (by opening it in a text editor).
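The difference between "append" and "write" mode is easiest to see side by side. The sketch below appends to a throwaway file and then reopens the same file in "write" mode, which discards what was there before. The file name and the variable names are made up for this example.

```python
import os
import tempfile

# Use a throwaway file so that file2.txt is left alone (name is made up)
demo_path = os.path.join(tempfile.mkdtemp(), 'append_vs_write_demo.txt')

# Start with one line
f = open(demo_path, 'w')
f.write('original line\n')
f.close()

# 'a' keeps the existing contents and adds to the end
f = open(demo_path, 'a')
f.write('appended line\n')
f.close()

f = open(demo_path, 'r')
after_append = list(f)
f.close()

# 'w' empties the file before writing
f = open(demo_path, 'w')
f.write('replacement line\n')
f.close()

f = open(demo_path, 'r')
after_write = list(f)
f.close()

print(after_append)
print(after_write)
```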
Going back to our file of peptide sequences (file1.txt): write the updated list of peptide sequence lines to a new file called peptides.txt. Check your answer by reading from peptides.txt.
# Exercise 2
If you don't want to have to remember to close the file when you're done with it, you can work with your file inside a with statement. Your file will be automatically closed at the end of the with statement. The technical term you may see for this type of usage is using a file as a "context manager".
with open('file3.txt', 'w') as out_file:
    # within this statement, the file is open
    out_file.write('test')
If we try to write outside the context manager, Python will give us an error, as the file is no longer open:
# NBVAL_RAISES_EXCEPTION
## Note: ignore the above comment, this exists to allow us to test the notebook
with open('file3.txt', 'w') as out_file:
    # within this statement, the file is open
    out_file.write('test')
# now the file is closed
out_file.write('test')
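The with statement works in the same way for reading. The sketch below, which uses a throwaway file with a made-up name, writes a line inside one with statement, reads it back inside another, and uses the closed attribute to confirm that the file is open inside the statement and closed after it.

```python
import os
import tempfile

# Use a throwaway file so that file3.txt is left alone (name is made up)
demo_path = os.path.join(tempfile.mkdtemp(), 'with_demo.txt')

with open(demo_path, 'w') as out_file:
    out_file.write('test\n')
    # Inside the with statement the file is still open
    open_inside = not out_file.closed

# After the with statement the file has been closed for us
closed_after = out_file.closed

with open(demo_path, 'r') as in_file:
    lines = list(in_file)

print(open_inside)
print(closed_after)
print(lines)
```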
In this section we have learned the following:
open('filename', 'mode').
The with statement.