We're going to use the Universal Data Tool cats and dogs dataset, a set of labeled images of cats and dogs created from COCO. You don't need to download it for this guide, because we'll load it directly from our notebook.
Don't like cats and dogs? You can also use any other classification dataset! Maybe try classifying bears vs. cats!
Import the CSV Into a Pandas DataFrame
We can begin by importing the fastai library and pandas, then loading our udt.csv file.
from fastai.vision import *
import pandas as pd
url_to_csv = "https://raw.githubusercontent.com/UniversalDataTool/udt-dataset-cats-and-dogs/master/coco_dogs_and_cats.udt.csv"
udt_csv = pd.read_csv(url_to_csv)
You can use the udt.json format too; tables are just a nice way to visualize the data!
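If you go the udt.json route, the same samples can be read with Python's standard library. This is a minimal sketch, assuming (not confirmed by this guide) that the JSON file has a top-level `samples` list whose entries carry the same `imageUrl` and `annotation` fields as the CSV columns. The tiny inline document, its example.com URLs, and the `urls_by_label` helper are all made up for illustration.

```python
import json

# Hypothetical stand-in for a udt.json file. Its layout is ASSUMED to mirror
# the CSV's "imageUrl" and "annotation" columns.
UDT_JSON = """
{
  "samples": [
    {"imageUrl": "https://example.com/a.jpg", "annotation": "dog"},
    {"imageUrl": "https://example.com/b.jpg", "annotation": "cat"},
    {"imageUrl": "https://example.com/c.jpg", "annotation": "dog"}
  ]
}
"""

def urls_by_label(udt_text):
    """Group image URLs by their annotation label (hypothetical helper)."""
    dataset = json.loads(udt_text)
    groups = {}
    for sample in dataset.get("samples", []):
        label = sample.get("annotation")
        if label is None:
            continue  # skip samples that haven't been labeled yet
        groups.setdefault(label, []).append(sample["imageUrl"])
    return groups

groups = urls_by_label(UDT_JSON)
print(groups["dog"])  # ['https://example.com/a.jpg', 'https://example.com/c.jpg']
```

Grouping into a dict keyed by label means the same code works unchanged for a bears vs. cats dataset or one with more than two classes.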
Download Images
# Get the rows of our CSV that hold sample data
# (regex=False makes "." match a literal dot)
samples = udt_csv[udt_csv["path"].str.contains("samples.", regex=False)]
# Create two CSVs that contain only
# our dog image URLs and our cat image URLs
dog_samples = samples[samples["annotation"] == "dog"]
cat_samples = samples[samples["annotation"] == "cat"]
dog_samples.to_csv("dog_urls.csv", columns=["imageUrl"], header=False, index=False)
cat_samples.to_csv("cat_urls.csv", columns=["imageUrl"], header=False, index=False)
# Now we can download the images!
download_images("dog_urls.csv", "images/dog", max_pics=500)
download_images("cat_urls.csv", "images/cat", max_pics=500)
# Make sure every image is readable (unreadable ones are deleted)
# and downscale any image whose width or height exceeds 500 pixels
verify_images("images/dog", delete=True, max_size=500)
verify_images("images/cat", delete=True, max_size=500)
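To see what the filtering and splitting above actually do, here is a minimal standard-library sketch of the same steps. It keeps only rows whose `path` starts with `samples.` (a literal prefix check, a slightly stricter choice than the original `str.contains`), groups `imageUrl` by `annotation`, and writes one URL per line, which is the layout the two `to_csv` calls produce. The inline CSV rows, their `path` values, and the `write_url_lists` helper are invented for illustration; the real file has more columns.

```python
import csv
import io
import os
import tempfile

# Invented rows using the three columns the pandas code relies on.
UDT_CSV = """path,imageUrl,annotation
interface.type,,
samples.0,https://example.com/1.jpg,dog
samples.1,https://example.com/2.jpg,cat
samples.2,https://example.com/3.jpg,dog
"""

def write_url_lists(csv_text, out_dir):
    """Write <label>_urls.csv files with one image URL per line.

    Returns a dict mapping each label to the path of the file it wrote.
    """
    by_label = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Literal prefix check: keep only rows that describe samples
        if not row["path"].startswith("samples."):
            continue
        by_label.setdefault(row["annotation"], []).append(row["imageUrl"])

    written = {}
    for label, urls in by_label.items():
        out_path = os.path.join(out_dir, f"{label}_urls.csv")
        with open(out_path, "w", newline="") as f:
            f.write("\n".join(urls) + "\n")
        written[label] = out_path
    return written

with tempfile.TemporaryDirectory() as tmp:
    files = write_url_lists(UDT_CSV, tmp)
    with open(files["dog"]) as f:
        print(f.read().split())  # ['https://example.com/1.jpg', 'https://example.com/3.jpg']
```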
Create an ImageDataBunch
From here on, this is a standard fastai workflow. We can create an ImageDataBunch from our images directory, which uses the dog and cat subfolders as the labels.
data = ImageDataBunch.from_folder("images", train=".", valid_pct=0.2,
                                  ds_tfms=get_transforms(), size=224,
                                  num_workers=4).normalize(imagenet_stats)
# Let's take a look at the data
data.show_batch(rows=3, figsize=(7,8))
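With `train="."` and `valid_pct=0.2`, fastai takes the class names from the subfolders of `images` and holds out a random 20% of the images for validation. The standard-library sketch below imitates that behavior so you can see which files end up where. It is not fastai's implementation: the hypothetical `label_and_split` helper, its `.jpg` filter, and the fixed seed are assumptions made for illustration.

```python
import os
import random
import tempfile

def label_and_split(root, valid_pct=0.2, seed=42):
    """Label images by parent folder name and randomly hold out valid_pct.

    Returns (train, valid) lists of (file_path, label) pairs.
    """
    items = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".jpg"):  # assumed extension filter
                label = os.path.basename(dirpath)  # e.g. "dog" or "cat"
                items.append((os.path.join(dirpath, name), label))
    items.sort()  # deterministic order before shuffling
    random.Random(seed).shuffle(items)
    n_valid = int(len(items) * valid_pct)
    return items[n_valid:], items[:n_valid]

# Build a throwaway images/ tree filled with empty placeholder files
with tempfile.TemporaryDirectory() as root:
    for label in ("dog", "cat"):
        os.makedirs(os.path.join(root, label))
        for i in range(5):
            open(os.path.join(root, label, f"{i}.jpg"), "w").close()
    train, valid = label_and_split(root)
    print(len(train), len(valid))  # 8 2
```

Sorting before the seeded shuffle keeps the split reproducible regardless of the order in which the filesystem lists files.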
Train a Model
We can now train a model! This is just a simple one, so don't forget to fine-tune it!
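A common fastai v1 fine-tuning pattern is to call `learn.unfreeze()` and then train again with `max_lr=slice(low, high)`, which gives each layer group its own learning rate. As far as we can tell, fastai v1 spreads a slice with both ends set geometrically across the layer groups (via its `even_mults` helper). The sketch below reimplements that spacing in plain Python so you can see the rate each group would get; the `spread_lrs` name and the choice of three layer groups are assumptions for illustration.

```python
def spread_lrs(low, high, n_groups):
    """Geometrically space learning rates from low to high across n_groups.

    The earliest (most general) layers get the smallest rate and the
    newly added head gets the largest.
    """
    if n_groups == 1:
        return [high]
    step = (high / low) ** (1 / (n_groups - 1))
    return [low * step ** i for i in range(n_groups)]

rates = spread_lrs(1e-6, 1e-4, n_groups=3)
print([f"{r:.0e}" for r in rates])  # ['1e-06', '1e-05', '1e-04']
```

Geometric rather than linear spacing means each group's rate differs from its neighbor's by the same factor (10× here), so the pretrained early layers change far less than the head.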