Using Gzip for Storage Optimisation in Large CSV Data Sets
Working with CSV files can be a hassle, especially when the files are large. One way to make the process easier is to compress the files using gzip, which can significantly reduce the file size.
In this post, I’ll show you how to work with CSV.gzip files using Python and how you can decompress them through the command line interface so they can be opened in an application such as Excel.
Working with CSV.gzip files in Python
First, you’ll need to import the gzip module and the csv module. You can do this by running the following code:
import gzip
import csv
Next, you’ll need to open the gzipped CSV file. You can do this using the gzip.open() function, which works much like the built-in open() function but decompresses the file automatically as it is read. Here’s an example:
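# 'rt' = read in text mode; newline='' lets the csv module handle line endings
with gzip.open('data.csv.gz', 'rt', newline='') as f:
    reader = csv.reader(f)
    # Each row comes back as a list of strings
    for row in reader:
        print(row)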
In this example, we’re using the with statement to open the file data.csv.gz in read mode, so the file is closed automatically when the block ends. The rt mode means “read, text mode”: gzip.open() decompresses the file and decodes the bytes into strings, whereas the default mode, rb, returns raw bytes. The csv.reader() function then wraps the decompressed file and returns a reader object that can be iterated over to read the rows of the CSV file, each as a list of strings.
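If the file has a header row, csv.DictReader can be more convenient than csv.reader, because each row comes back as a dictionary keyed by column name. Here’s a minimal sketch of that approach. The helper name read_gzipped_csv and the UTF-8 encoding are assumptions made for illustration, so adjust them to suit your data:

```python
import csv
import gzip


def read_gzipped_csv(path, encoding="utf-8"):
    """Yield each row of a gzipped CSV file as a dict keyed by the header row.

    read_gzipped_csv is a hypothetical helper name, and UTF-8 is an assumed
    encoding; neither comes from the original example.
    """
    # newline="" lets the csv module handle line endings itself
    with gzip.open(path, "rt", newline="", encoding=encoding) as f:
        for row in csv.DictReader(f):
            yield row
```

Because the function is a generator, rows are read one at a time rather than loaded into memory all at once, which helps with large files. Values are still strings, so convert numeric columns yourself, for example with float(row['Price']) if your file has a Price column.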
It is also possible to write data to a CSV.gzip file. You can do this by using the gzip.open() function in write mode. Here’s an example:
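# 'wt' = write in text mode; newline='' lets the csv module handle line endings
with gzip.open('data.csv.gz', 'wt', newline='') as f:
    writer = csv.writer(f)
    # Header row first, then one writerow() call per data row
    writer.writerow(['Ticker', 'Price', 'P/E Ratio'])
    writer.writerow(['TSLA', 143.00, 44.33])
    writer.writerow(['AAPL', 140.30, 23.32])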
In this example, we’re using the with statement to open the file data.csv.gz in write mode. The wt mode means “write, text mode”: gzip.open() accepts strings, then encodes and compresses them as they are written to disk. Note that write mode replaces any existing file with the same name. The csv.writer() function returns a writer object, and each call to its writerow() method writes one row of the CSV file.
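If your rows are already dictionaries, csv.DictWriter can write the header and the rows for you. Here’s a minimal sketch under that assumption. The helper name write_gzipped_csv and the UTF-8 encoding are illustrative choices, not part of the example above:

```python
import csv
import gzip


def write_gzipped_csv(path, fieldnames, rows, encoding="utf-8"):
    """Write an iterable of dicts to a gzipped CSV file with a header row.

    write_gzipped_csv is a hypothetical helper name, and UTF-8 is an
    assumed encoding.
    """
    # "wt" replaces any existing file at path
    with gzip.open(path, "wt", newline="", encoding=encoding) as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()    # column names taken from fieldnames
        writer.writerows(rows)  # one CSV row per dict
```

You would call it with a path, the column names in the order you want them, and a list of dictionaries whose keys match those column names.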
Working with CSV.gzip files in Python is a great way to save space and make your data processing tasks more efficient. With the gzip and csv modules, you can easily read and write compressed CSV files with minimal code.
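How much space you actually save depends on your data, so it can be worth measuring on one of your own files. Here’s a rough sketch that compresses an existing CSV file and reports both sizes. The helper name gzip_file is hypothetical, and the sketch assumes the uncompressed file already exists on disk:

```python
import gzip
import os
import shutil


def gzip_file(src_path, dst_path):
    """Compress src_path into dst_path.

    Returns (original_size, compressed_size) in bytes. gzip_file is a
    hypothetical helper name used for illustration.
    """
    # Copy in binary mode so the bytes are compressed exactly as they are
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # streams in chunks, so large files are fine
    return os.path.getsize(src_path), os.path.getsize(dst_path)
```

Comparing the two numbers it returns shows the saving for that particular file. The source file is left untouched.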
How to decompress a CSV.gzip file using the CLI
You can decompress a CSV.gzip file using the command line interface (CLI) by using the gunzip command. The gunzip command is used to decompress files that have been compressed with the gzip command. Here’s an example of how to use the gunzip command to decompress a CSV.gzip file:
gunzip data.csv.gz
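This command will decompress the file data.csv.gz and create a new file named data.csv. By default, gunzip removes the original data.csv.gz once it has finished; on recent versions of gzip you can pass the -k option (gunzip -k data.csv.gz) to keep it. You can then open the data.csv file in Excel.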
Alternatively, you can use the zcat command:
zcat data.csv.gz > data.csv
This command decompresses data.csv.gz and writes the result to standard output, and the > operator redirects that output into a new file named data.csv. The original data.csv.gz is left in place.
If you don’t have the gunzip or zcat command installed, you can install it using your package manager, such as apt or yum.
Once the command is run, you will have the decompressed file data.csv, which you can open in Excel and work with as you would any other CSV file.
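If neither command is available on your machine, the same decompression can be done with Python’s standard library. Here’s a minimal sketch. The helper name gunzip_file is an assumption for illustration, and unlike the gunzip command it leaves the compressed file where it is:

```python
import gzip
import shutil


def gunzip_file(src_path, dst_path):
    """Decompress src_path (a .gz file) into dst_path, keeping src_path.

    gunzip_file is a hypothetical helper name used for illustration.
    """
    # Binary mode on both sides so the bytes are copied through unchanged
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # streams in chunks, so large files are fine
```

Calling gunzip_file('data.csv.gz', 'data.csv') would give you a data.csv that you can open in Excel.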