Fast.AI Lesson 1: data exploration
So we got the notebook running and downloaded the zip file.
import shutil
import urllib.request
with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file:
    shutil.copyfileobj(response, out_file)  # stream the download to disk
The notebook skips the step of unpacking and exploring the data. I like using less to explore the contents of the zip file, since it is easier to scroll up and down.
less dogscats.zip
unzip -l dogscats.zip
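If you would rather stay inside the notebook, the same listing can be sketched in Python with the standard zipfile module. The helper name list_zip and its limit parameter are my own, not part of the fastai notebook.

```python
import zipfile

def list_zip(path, limit=20):
    """Return (name, size_in_bytes) for the first `limit` entries of a zip archive."""
    with zipfile.ZipFile(path) as zf:
        return [(info.filename, info.file_size) for info in zf.infolist()[:limit]]

# Usage (assumes dogscats.zip sits in the current directory):
# for name, size in list_zip("dogscats.zip"):
#     print(f"{size:>10}  {name}")
```

Capping the output with limit keeps the cell readable, since the archive holds thousands of images.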
Since we will be using various datasets during the fastai course, we will create a data directory to store multiple datasets.
#cwd = ~/fastai/courses/dl1
mkdir data
#unzip to this dir
unzip dogscats.zip -d data/
#we will use tree to look at the files
#the AWS AMI uses yum instead of apt-get
sudo yum install tree
tree data
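For anyone who cannot install tree, here is a rough Python equivalent of the unzip and tree steps. It is only a sketch: extract_zip and dir_tree are names I made up, and dir_tree prints directories only (closer to tree -d), since listing every image would flood the output.

```python
import os
import zipfile

def extract_zip(zip_path, dest="data"):
    """Extract the whole archive into dest, creating the directory if needed."""
    os.makedirs(dest, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)

def dir_tree(root, max_depth=3):
    """Return indented directory names under root, roughly like `tree -d`."""
    root = os.path.normpath(root)
    base_depth = root.count(os.sep)
    lines = []
    for dirpath, dirnames, _ in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        depth = dirpath.count(os.sep) - base_depth
        if depth >= max_depth:
            dirnames[:] = []  # list this directory but do not descend further
        lines.append("  " * depth + os.path.basename(dirpath))
    return lines

# Usage (assumes dogscats.zip is in the current directory):
# extract_zip("dogscats.zip", "data")
# print("\n".join(dir_tree("data")))
```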
Thankfully the zip file was packaged with the data already split into sample, test, train and validation sets. This helps fast-track the group training exercise; data scientists often don't have this luxury.
#quick file count to see the breakdown of the data (or explore on your own)
find data -type f | wc -l
find data/dogscats/ -type f | wc -l
find data/dogscats/sample/ -type f | wc -l
find data/dogscats/test1/ -type f | wc -l
find data/dogscats/valid/ -type f | wc -l
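The same counts can be collected from Python in one go. This is a minimal sketch with hypothetical helper names (count_files, count_by_subdir). Note that os.walk also counts symlinks to files, so the numbers could differ slightly from find -type f if the dataset contains any.

```python
import os

def count_files(root):
    """Count files under root recursively, roughly like `find root -type f | wc -l`."""
    return sum(len(files) for _, _, files in os.walk(root))

def count_by_subdir(root):
    """Map each immediate subdirectory of root to its recursive file count."""
    counts = {}
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if os.path.isdir(path):
            counts[name] = count_files(path)
    return counts

# Usage (assumes the archive was unpacked into data/ as above):
# print(count_files("data"))
# print(count_by_subdir("data/dogscats"))
```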
And now the notebook can see the files, same as in the video.
Now we can run the analysis. I added code to record the duration.
(run on an AWS p2.xlarge)
import timeit
start_time = timeit.default_timer()
# ... run the training cell(s) from the notebook here ...
print("elapsed time :", timeit.default_timer() - start_time)
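If you end up timing several cells, the start/stop pair can be wrapped in a small context manager. This is just a sketch of one way to do it; timed is a hypothetical helper, not something the fastai library provides.

```python
import timeit
from contextlib import contextmanager

@contextmanager
def timed(label="elapsed time"):
    """Time the enclosed block and print the duration in seconds."""
    start = timeit.default_timer()
    result = {}
    try:
        yield result
    finally:
        # Recorded even if the block raises, so failed runs are timed too.
        result["seconds"] = timeit.default_timer() - start
        print(f"{label}: {result['seconds']:.2f} s")

# Usage:
# with timed("training") as t:
#     ...  # run the training code from the notebook here
# t["seconds"] then holds the duration for later comparison
```

In a Jupyter notebook, the %%time cell magic is a lighter alternative when you only need the number printed once.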
The rest of the notebook runs with minimal problems.