FastAI : Reduce your spend on AWS.
AWS has tutorials on spot instances. Spot instances are cheaper than on-demand access because their price is set by a bidding process. The saving varies with region, date and time.
Much of data science involves lengthy work to code, debug and train models. Most of the coding can be done on low-powered hardware, even laptops, while model training requires more powerful hardware.
One method to reduce AWS costs is to resize the AWS instance.
From the console, go to EC2 instances, select the AWS instance and stop it.
Wait for the AWS instance to stop. While it is stopping, ‘change instance type’ is unavailable.
Once it has stopped, ‘change instance type’ becomes available.
Selecting it opens a popup where the new instance type is chosen.
Some digging in AWS documentation found these references.
- Amazon EC2 Instance Types
- Selecting the Instance Type for DLAMI
- Recommended CPU Instances
- AWS Simple Monthly Calculator (good for more detailed budgeting)
- ec2instances.info (preferred for this simple exercise)
Looking at the spreadsheet we see p2.xlarge at USD0.90/hr and c5.large at USD0.085/hr.
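To get a feel for what that price gap means, here is a rough sketch of the arithmetic. The two hourly prices are the ones quoted above; the split of 30 coding hours and 5 training hours is a made-up example week, not a measurement.

```python
# Hourly prices quoted above (USD); actual prices vary by region and date.
P2_XLARGE_PER_HOUR = 0.90
C5_LARGE_PER_HOUR = 0.085


def cost(hours_coding, hours_training, coding_rate, training_rate):
    """Total cost when coding and training hours are billed at separate rates."""
    return hours_coding * coding_rate + hours_training * training_rate


# Hypothetical week: 30 hours of coding/debugging, 5 hours of training.
all_gpu = cost(30, 5, P2_XLARGE_PER_HOUR, P2_XLARGE_PER_HOUR)
resized = cost(30, 5, C5_LARGE_PER_HOUR, P2_XLARGE_PER_HOUR)

print(f"GPU instance throughout: USD {all_gpu:.2f}")
print(f"Resized while coding:    USD {resized:.2f}")
print(f"Saving:                  USD {all_gpu - resized:.2f}")
```

Under those assumed hours the week costs about USD 31.50 on p2.xlarge throughout, versus about USD 7.05 when the coding hours run on c5.large.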
The instance now shows instance type c5.large, and we can restart it.
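The same stop, change type, start sequence can probably be scripted instead of clicked through. The sketch below assumes the boto3 library and configured AWS credentials, neither of which is part of the walkthrough above; the function name resize_instance and the instance ID in the usage comment are placeholders.

```python
def resize_instance(ec2, instance_id, new_type):
    """Stop an instance, change its type, and start it again.

    `ec2` is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    # The instance type can only be changed while the instance is stopped.
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        InstanceType={"Value": new_type},
    )

    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])


# Example usage (needs boto3 and AWS credentials; the ID is a placeholder):
#   import boto3
#   resize_instance(boto3.client("ec2"), "i-0123456789abcdef0", "c5.large")
```

Passing the client in as an argument keeps the function easy to try against a stub before pointing it at a real account.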
NB: record your instance ID and use tags wisely, otherwise bad mistakes happen. AWS also has an option in IAM user management to restrict the ability to delete instances, and an option to send alerts if instances are stopped or deleted.
After restarting the instance, ssh in and start Jupyter Notebook. The public IP address usually changes after a stop and start unless an Elastic IP is attached, so check it in the console first.
ssh -i /path/to/your/password.pem -L 9999:127.0.0.1:8888 ec2-user@xxx.xxx.xxx.xxx
jupyter notebook
#then open in browser
https://localhost:9999
Some blocks of code will run on the ‘smaller’ AWS instance. Others may experience this error.
Weird and interesting things can happen with runtimes too. While we expect longer runtimes on ‘lighter’ hardware configurations, deep learning can produce different runtimes for the same activity on the same hardware. This variance can overlap between high and low hardware configurations.
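One way to see that variance is to time the same step several times and compare the spread rather than a single number. This is a minimal sketch using only the standard library; workload is a stand-in for whatever training or prediction call is being compared, and time_runs is a helper name invented here.

```python
import statistics
import time


def time_runs(fn, repeats=5):
    """Call fn() `repeats` times and return the wall-clock duration of each call."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def workload():
    # Stand-in for a training step; replace with the real call being compared.
    sum(i * i for i in range(200_000))


durations = time_runs(workload)
print(f"min    {min(durations):.4f}s")
print(f"median {statistics.median(durations):.4f}s")
print(f"max    {max(durations):.4f}s")
```

Running this on both instance types and comparing the min to max ranges shows whether the two configurations really differ for that step or whether their ranges overlap.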
Obviously a few steps in the deep learning process will take much longer than others. We can save our models to files and reload these models to avoid having to run the entire load data > build model > train model > predict from model sequence.
learn.save('224_lastlayer')
learn.load('224_lastlayer')
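A common pattern around save and load is "train only if no saved model exists yet". The sketch below shows that idea in plain Python, with the training and loading steps passed in as functions; load_or_train and its return values are invented for illustration and are not part of the fast.ai library.

```python
import os


def load_or_train(model_file, train_fn, load_fn):
    """Run load_fn() if model_file exists, otherwise run train_fn().

    Returns "loaded" or "trained" so the caller can tell which path ran.
    """
    if os.path.exists(model_file):
        load_fn()
        return "loaded"
    train_fn()
    return "trained"
```

In a notebook, train_fn would wrap the training call followed by learn.save('224_lastlayer'), and load_fn would wrap learn.load('224_lastlayer'). The exact filename that learn.save writes is not shown here, so check the models directory before relying on a particular path.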
In the lesson 1 notebook, the model files are stored in
/home/ec2-user/fastai/courses/dl1/data/dogscats/models
This is the result of the fast.ai libraries creating a standardised directory structure for building/training models.
data = ImageClassifierData.from_paths(PATH, tfms=tfms)
learn = ConvLearner.pretrained(arch, data, precompute=True)
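To check which saved models are available after switching instance types, the models directory can be listed directly. This is a small standard-library sketch; list_saved_models is a helper name made up here, and it assumes only that the models live in a models folder under the data path mentioned above.

```python
from pathlib import Path


def list_saved_models(data_dir):
    """Return sorted names of files in <data_dir>/models; empty list if absent."""
    models_dir = Path(data_dir) / "models"
    if not models_dir.is_dir():
        return []
    return sorted(p.name for p in models_dir.iterdir() if p.is_file())


# Data path taken from the lesson 1 layout described above.
print(list_saved_models("/home/ec2-user/fastai/courses/dl1/data/dogscats"))
```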