Python file operations (1)

Why?

A few days ago, I needed to parse a text file of around 7000 lines, clean each line and store the refined text in a new file. I will not show the exact file because it contains confidential data that a friend handed over for the parsing job. Instead, we will work through similar examples.

Coding time

That's it for the intro to the problem. Let's get right into coding! Take a look at the contents of the file below:

Ahmed:Ali xxxxxxxxxxxx
Ibrahim:said xxxxxxxxxxxx
John:Alex xxxxxxxxxxxx
Lionel:Messi xxxxxxxxxxxx
Davis:Rodrigues xxxxxxxxxxxx

As you can see, each line ends with the same extra trailing text. If you count it out, it is exactly 12 characters long. In a real-life scenario, this could be something like a date. Now imagine that the file has around 7000 lines and you need to remove these 12 characters from every one of them. That would be quite a hefty task to do manually, but of course, we have Python at our disposal.

To start off, create a new Python file and open the file containing the values you need to clean:

linesAll = open("infos.txt", "r")

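Above, we are using the open() function, which opens the file and returns it to us as a file object. Here we pass open() two parameters: first the file name, then the mode. A bare file name like "infos.txt" is looked up relative to the current working directory (the folder you run the script from, which is often, but not always, the folder containing your .py file); if the file lives somewhere else, pass its path instead. See the syntax below: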

open("file name","mode")
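
To see what open() actually hands back, here is a small sketch. It first writes a two-line sample file into a throwaway temporary folder (an assumption made only so the example runs anywhere without touching your real files), then opens it in read mode and inspects the returned object.

```python
import io
import os
import tempfile

# Use a throwaway folder so the example never touches your real files
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "infos.txt")

# Create a small sample file first (the content mirrors the example above)
sample = open(path, "w")
sample.write("Ahmed:Ali xxxxxxxxxxxx\nIbrahim:said xxxxxxxxxxxx\n")
sample.close()

linesAll = open(path, "r")  # the same call as in the tutorial, with a full path
opened_name = linesAll.name  # the path we passed in
opened_mode = linesAll.mode  # the mode we passed in
print(type(linesAll))                          # <class '_io.TextIOWrapper'>
print(isinstance(linesAll, io.TextIOWrapper))  # True
print(opened_name == path, opened_mode)        # True r
linesAll.close()
```

In other words, a text-mode open() gives you an io.TextIOWrapper object, and methods such as read() and close() used later in this tutorial are called on that object.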

Let's delve a bit more into these modes. I won't explain them at length, as they are mostly self-explanatory:

Mode  Description
r     Read. This is the default mode, so you don't really need to specify it. An error is raised if the file doesn't exist.
a     Append. Opens a file for appending, or creates it if it doesn't exist.
w     Write. Opens a file for writing, or creates it if it doesn't exist.
x     Create. Creates the specified file; an error is raised if the file already exists.

There are more parameters, but we will look at them in a later tutorial. The difference between the w and a modes is that w overwrites the file (its existing contents are erased as soon as it is opened), while a starts at the end of the file and adds to the existing contents.
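
A quick way to convince yourself how the modes behave is to try them one after another on the same file. The sketch below uses a hypothetical file named modes_demo.txt inside a temporary folder; it creates the file with "x", adds to it with "a", overwrites it with "w", and shows that "x" and "r" raise errors in the situations described in the table.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "modes_demo.txt")  # hypothetical file name

# "x": create a new file; this works only because the file does not exist yet
f = open(path, "x")
f.write("first line\n")
f.close()

# Opening the same file with "x" again fails, since it now exists
x_failed = False
try:
    open(path, "x")
except FileExistsError:
    x_failed = True

# "a": keep the existing contents and add to the end
f = open(path, "a")
f.write("appended line\n")
f.close()

f = open(path)  # no mode given, so the default "r" is used
after_append = f.read()
f.close()

# "w": erase the existing contents, then write
f = open(path, "w")
f.write("overwritten\n")
f.close()

f = open(path, "r")
after_write = f.read()
f.close()

# "r" on a file that does not exist raises an error instead of creating it
r_failed = False
try:
    open(os.path.join(workdir, "missing.txt"), "r")
except FileNotFoundError:
    r_failed = True

print(repr(after_append))  # 'first line\nappended line\n'
print(repr(after_write))   # 'overwritten\n'
print(x_failed, r_failed)  # True True
```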

We will now move on to the actual manipulation. As mentioned before, the trailing text contains 12 characters, so our approach will be as follows:

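  • Read the whole file and split it into a list (Python's version of an array), one element per line:
lines_arr = linesAll.read().split('\n')
  • Now that we have a list, we will use a for loop to access each element one by one, trim the last 12 characters away and write the result to a new file:
_refined = open("refined.txt", "w")  # output file; "refined.txt" is just an example name
for line in lines_arr:
    if not line:  # skip empty entries, e.g. the one left after the file's final newline
        continue
    line_length = len(line)  # get the length
    new_line_length = line_length - 12  # length without the trailing 12 characters
    refined_line = line[0:new_line_length]
    print(refined_line)
    _refined.write(refined_line + "\n")  # "\n" starts a new line after each refined line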

Our code cleans the extra 12 characters off each line. However, there are some important things to keep in mind. Once we are done working with a file, we need to close it properly to free up the resources it was using. To close a file in Python, we call its close() method:

linesAll.close()
_refined.close()  # close the output file too, so everything written is flushed to disk
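
Every file object has a closed attribute, which makes it easy to check that close() did its job. The sketch below (again writing a small sample file into a temporary folder first) shows the attribute flipping from False to True, and that trying to read from a closed file raises a ValueError.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "infos.txt")
sample = open(path, "w")
sample.write("Ahmed:Ali xxxxxxxxxxxx\n")
sample.close()

linesAll = open(path, "r")
was_open = not linesAll.closed
print(linesAll.closed)  # False: the file is still open
linesAll.close()
print(linesAll.closed)  # True: the file's resources have been released

# Any further reading on a closed file raises ValueError
read_after_close_failed = False
try:
    linesAll.read()
except ValueError:
    read_after_close_failed = True
print(read_after_close_failed)  # True
```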

However, if an exception occurs while we are working with the file, our program will exit without closing it. To prevent such mishaps, we can use a try...finally block, which ensures that the file is closed even if an exception is raised. We will explain try...finally blocks in a future tutorial.

try:
    # All the other code relating to file operations goes here
    ...
finally:
    linesAll.close()  # close the file, even if an exception was raised
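
To see the finally clause at work, the sketch below deliberately raises an error while the file is open. The RuntimeError is a made-up stand-in for whatever might go wrong during real processing, and the outer try...except exists only so the demo keeps running afterwards; the point is that the inner finally closes the file before the error travels on.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "infos.txt")
sample = open(path, "w")
sample.write("Ahmed:Ali xxxxxxxxxxxx\n")
sample.close()

caught = None
linesAll = open(path, "r")
try:  # outer try only keeps this demo running after the error
    try:
        content = linesAll.read()
        raise RuntimeError("simulated failure while cleaning")  # stand-in for a real bug
    finally:
        linesAll.close()  # runs before the exception leaves the inner block
except RuntimeError as err:
    caught = str(err)

print(caught)           # simulated failure while cleaning
print(linesAll.closed)  # True
```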

Remember that I said the open() function returns a file object. In our next tutorial, we will look at more of that object's methods. So far we have only used read(), write() and close(), but file objects have a few more methods that we haven't covered yet. So stay tuned!