Fast.AI Lesson 1: data exploration
So we got the notebook running and downloaded the zip file.
import shutil
import urllib.request

with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file:
    shutil.copyfileobj(response, out_file)
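A small refinement worth considering: the dogscats zip is large, so re-running the notebook shouldn't re-download it. A minimal sketch of a skip-if-present wrapper (the function name `download_if_missing` is my own, not from the lesson):

```python
import shutil
import urllib.request
from pathlib import Path

def download_if_missing(url, file_name):
    """Download url to file_name, skipping the fetch if the file is already there."""
    path = Path(file_name)
    if path.exists():
        return path  # already downloaded; avoid re-fetching a large zip
    with urllib.request.urlopen(url) as response, open(path, 'wb') as out_file:
        shutil.copyfileobj(response, out_file)
    return path
```

Calling it twice with the same file name only hits the network once.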
The notebook skipped a step in unpacking and exploring the data. I like using less to explore the contents of the zip file; it's easier to scroll up and down.
less dogscats.zip
unzip -l dogscats.zip
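If you'd rather stay inside the notebook, Python's zipfile module can list the archive contents without extracting anything. A minimal sketch (the helper name `peek_zip` is mine):

```python
import zipfile

def peek_zip(path, limit=5):
    """Print the first few entries in a zip archive and return the total count."""
    with zipfile.ZipFile(path) as zf:
        names = zf.namelist()
        for name in names[:limit]:
            print(name)
        return len(names)
```

For example, `peek_zip('dogscats.zip')` prints the first five paths inside the archive.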
Since we will be using various datasets during the fastai course, we will create a data directory to store multiple datasets.
# cwd: ~/fastai/courses/dl1
mkdir data
# unzip into this dir
unzip dogscats.zip -d data/
# we will use tree to look at the files
# (the AWS AMI uses yum instead of apt-get)
sudo yum install tree
tree data
Thankfully the zip file came packaged with a split into sample, test, train, and validation sets. This fast-tracks the group training exercise; data scientists often don’t have this luxury.
# quick file count to see the breakup of the data (or explore on your own)
find data -type f | wc -l
find data/dogscats/ -type f | wc -l
find data/dogscats/sample/ -type f | wc -l
find data/dogscats/test1/ -type f | wc -l
find data/dogscats/valid/ -type f | wc -l
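The same counts can be gathered in one pass from Python, which is handy if you want them as data rather than terminal output. A sketch using pathlib, assuming the data/dogscats layout above (the helper name `count_files_by_split` is mine):

```python
from collections import Counter
from pathlib import Path

def count_files_by_split(root):
    """Count files under each top-level subdirectory of root (e.g. train/valid/sample)."""
    root = Path(root)
    counts = Counter()
    for f in root.rglob('*'):
        if f.is_file():
            split = f.relative_to(root).parts[0]
            counts[split] += 1
    return counts
```

`count_files_by_split('data/dogscats')` returns a Counter keyed by split name, so you can eyeball the train/valid ratio directly.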
And now the notebook can see the files, same as in the fast.ai video.
os.listdir(PATH)
We can run the analysis. I added code to record the duration.
(run on an AWS p2.xlarge)
import timeit

start_time = timeit.default_timer()
# ... the training code from the notebook runs here ...
print("elapsed time :", timeit.default_timer() - start_time)
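If you end up timing several cells, a context manager keeps the start/stop boilerplate in one place. A minimal sketch (the name `timed` is my own, not from the notebook):

```python
import timeit
from contextlib import contextmanager

@contextmanager
def timed(label="elapsed time"):
    """Print how long the enclosed block took, even if it raises."""
    start = timeit.default_timer()
    try:
        yield
    finally:
        print(f"{label} : {timeit.default_timer() - start:.2f}s")
```

Then any cell's body can be wrapped in `with timed("fit"):` to get the same elapsed-time printout.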
The rest of the notebook runs with minimal problems.