Rock, Paper, Scissors ML in Python
Here’s a breakdown of the thinking process to generate the Python script and explanation for training a Rock-Paper-Scissors hand shape classifier:
Understand the Goal: To train a model in Python to recognize three hand shapes (Rock, Paper, Scissors). This is fundamentally an image classification task. The core goal is to map image data (representing hand shapes) to one of three labels.
Identify Key Components of an ML Project:
Data: Need images of Rock, Paper, and Scissors. How to get them? Where to store them? How to load them?
Preprocessing: Images need to be standardized (size, color channels, normalization).
Model Architecture: What kind of model is suitable for image classification? Convolutional Neural Networks (CNNs) are the standard choice. Which specific architecture? Start simple (a few Conv/Pool layers) or use a pre-trained model (Transfer Learning)? Given the simplicity (3 classes, distinct shapes), a custom CNN is feasible and good for learning. Transfer learning might be overkill but is an option.
Training Loop: Need to feed data to the model, calculate loss, compute gradients, update weights (using an optimizer).
Evaluation: How to measure performance? Accuracy is a good starting metric for balanced classes. Need a separate test/validation set.
Libraries: What Python libraries are needed?
tensorflow or pytorch for building and training the neural network. TensorFlow/Keras is often considered slightly more beginner-friendly for standard CNNs.
numpy for numerical operations (handling image arrays).
matplotlib for visualizing data or results (optional but helpful).
os or pathlib for interacting with the file system (finding image files).
PIL (Pillow) or cv2 (OpenCV) for loading and manipulating images. TensorFlow’s image_dataset_from_directory handles much of this conveniently.
Data Strategy:
Collection: Realistically, they’ll need to collect images or use an existing dataset. Since collecting and labeling is a significant task, suggest using a readily available dataset first (like the popular “Rock Paper Scissors Dataset” on Kaggle or TensorFlow Datasets). If they must collect, outline the steps (capture images, organize into folders per class). Self-correction: Explicitly state that the provided script assumes data is already organized.
Organization: The standard way is to have a main data directory with subdirectories for each class: data/rock/, data/paper/, data/scissors/. This structure works well with tools like image_dataset_from_directory.
Splitting: Need training and validation sets to monitor overfitting. image_dataset_from_directory can handle this split.
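Before training, it can help to confirm that the folder layout matches what the loader expects. The sketch below counts image files in each class folder. The function name count_images_per_class, the extension set, and the 'data' path are assumptions made for illustration and are not part of any library.

```python
from pathlib import Path

# Extensions treated as images in this sketch (an assumption).
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".gif"}


def count_images_per_class(data_dir):
    """Return {class_name: image_count} for each subdirectory of data_dir."""
    counts = {}
    for class_dir in sorted(Path(data_dir).iterdir()):
        if not class_dir.is_dir():
            continue  # ignore stray files next to the class folders
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts


if __name__ == "__main__":
    data_dir = Path("data")  # hypothetical path; change to your dataset folder
    if data_dir.exists():
        for name, n in count_images_per_class(data_dir).items():
            print(f"{name}: {n} images")
    else:
        print(f"'{data_dir}' not found")
```

Roughly equal counts per class also support the earlier choice of accuracy as the metric.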
Preprocessing Steps:
Resizing: All images must be the same size for the CNN input layer. Choose a reasonable size (e.g., 150×150 or 224×224).
Color Channels: Decide if using color (3 channels) or grayscale (1 channel). Color is standard unless there’s a reason not to.
Normalization: Pixel values (0-255) should be scaled, typically to [0, 1] or [-1, 1]. This helps training stability. TensorFlow’s Rescaling layer is perfect for this.
Data Augmentation (Optional but Recommended): To improve robustness and prevent overfitting, randomly flip, rotate, zoom, etc., the training images. TensorFlow’s Sequential model with layers like RandomFlip, RandomRotation is ideal. Decision: Include basic augmentation.
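The two scalings mentioned under Normalization come down to simple arithmetic. A minimal NumPy sketch, where the tiny array is made up and stands in for an image:

```python
import numpy as np

# A made-up 2x2 single-channel "image" with 8-bit pixel values.
pixels = np.array([[0, 64], [128, 255]], dtype=np.float32)

# Scale to [0, 1]: the same arithmetic as Rescaling(1./255).
unit_range = pixels / 255.0

# Scale to [-1, 1]: the same arithmetic as Rescaling(1./127.5, offset=-1).
signed_range = pixels / 127.5 - 1.0

print(unit_range.min(), unit_range.max())
print(signed_range.min(), signed_range.max())
```

Whichever range is chosen, the same scaling has to be applied at prediction time; placing the Rescaling layer inside the model takes care of that.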
Model Architecture (CNN using Keras/TensorFlow):
Input Layer: Define the input shape (image height, image width, channels).
Convolutional Layers (Conv2D): Extract features. Use multiple layers, increasing the number of filters typically. Use ReLU activation.
Pooling Layers (MaxPooling2D): Downsample feature maps, reduce computation, provide some translation invariance.
Flatten Layer: Convert the 2D feature maps into a 1D vector for the dense layers.
Dense Layers: Perform classification based on the extracted features. One or two hidden dense layers with ReLU activation.
Output Layer: Dense layer with the number of units equal to the number of classes (3). Use softmax activation for multi-class classification to get probabilities.
Data Augmentation Layer(s): Place these before the Rescaling layer or right after the input layer.
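To get a feel for how such a stack transforms the input, the shape and parameter arithmetic can be worked through by hand. The sketch below assumes a 150×150 RGB input, three blocks of a 3×3 'same' convolution followed by 2×2 max-pooling with 16, 32 and 64 filters, and one hidden dense layer of 128 units. These numbers are illustrative choices, and the helper names are hypothetical.

```python
def pooled_size(size, pool=2):
    """Output length of a max-pool with window and stride `pool`, no padding."""
    return (size - pool) // pool + 1


def conv_params(kernel, in_channels, filters):
    """Weights plus biases of a Conv2D layer with a square kernel."""
    return kernel * kernel * in_channels * filters + filters


size, channels = 150, 3            # assumed input: 150x150 RGB
total_params = 0
for filters in (16, 32, 64):       # assumed filter counts per block
    total_params += conv_params(3, channels, filters)
    size = pooled_size(size)       # 'same' convolution keeps the size; pooling halves it
    channels = filters
    print(f"after block with {filters} filters: {size}x{size}x{channels}")

flat = size * size * channels
dense_params = flat * 128 + 128    # hidden Dense layer with 128 units (assumed)
output_params = 128 * 3 + 3        # 3-way softmax output layer
total_params += dense_params + output_params
print("flattened features:", flat)
print("total trainable parameters:", total_params)
```

Under these assumptions nearly all parameters sit in the first dense layer, so the image size and the number of pooling steps affect model size far more than the filter counts do. Augmentation and rescaling layers add no trainable parameters.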
Training Process:
Compile the Model: Specify the optimizer (adam is a good default), the loss function (sparse_categorical_crossentropy because labels are integers 0, 1, 2, not one-hot encoded), and metrics (accuracy).
Load Data: Use tf.keras.utils.image_dataset_from_directory. Specify validation_split and subset. Configure batch size. Use .cache() and .prefetch() for performance.
Fit the Model: Call model.fit(), passing training data, validation data, and the number of epochs. Store the history.
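What the softmax output and the sparse categorical cross-entropy loss compute for a single example can be sketched in plain Python. The scores are made up. This shows the arithmetic only and is not Keras's implementation, which also averages the loss over each batch.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)                        # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_categorical_crossentropy(label, probs):
    """Loss for one example: negative log of the probability of the true class."""
    return -math.log(probs[label])


probs = softmax([2.0, 0.5, -1.0])             # made-up scores for three classes
loss_if_class_0 = sparse_categorical_crossentropy(0, probs)
loss_if_class_2 = sparse_categorical_crossentropy(2, probs)

print([round(p, 3) for p in probs])
print(loss_if_class_0)   # true class got the highest probability: small loss
print(loss_if_class_2)   # true class got the lowest probability: large loss
```

The label is passed as an integer index, which is why the "sparse" variant fits integer labels 0, 1, 2 without one-hot encoding.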
Evaluation and Prediction:
Evaluate: Call model.evaluate() on the validation/test dataset to get final loss and accuracy.
Predict: Show how to load a new, unseen image, preprocess it correctly (resize, rescale, add batch dimension), and use model.predict() to get probabilities. Map the highest probability index back to the class name.
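Mapping the highest probability back to a class name is a small step that is easy to get wrong if the class order is assumed rather than checked. A minimal sketch with made-up probabilities follows; decode_prediction is a hypothetical helper name.

```python
def decode_prediction(probabilities, class_names):
    """Return (class_name, confidence) for the highest-probability class."""
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return class_names[best_index], probabilities[best_index]


# image_dataset_from_directory orders class names alphanumerically by folder name,
# so index 0 is "paper" here, not "rock".
class_names = ["paper", "rock", "scissors"]
batch_output = [[0.05, 0.15, 0.80]]   # made-up model output for a batch of one image

label, confidence = decode_prediction(batch_output[0], class_names)
print(f"{label} ({confidence:.0%})")
```

Reading the order from the dataset's class_names attribute, rather than hard-coding it, keeps training and prediction consistent.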
Code Structure and Explanation:
Imports: List all necessary libraries.
Configuration: Define constants (image size, batch size, epochs, data directory).
Data Loading/Preprocessing: Show the image_dataset_from_directory usage and explain the parameters.
Data Augmentation: Define the augmentation layers.
Model Definition: Build the CNN layer by layer using tf.keras.Sequential. Include comments explaining each layer’s purpose.
Model Compilation: Show model.compile().
Training: Show model.fit().
Evaluation: Show model.evaluate().
Visualization (Optional): Plot accuracy/loss curves using matplotlib.
Prediction Example: Provide a clear snippet for predicting a single image.
Important Notes: Add crucial caveats about needing data, potential for overfitting, hyperparameter tuning, etc. Address the “language model” misconception gently.
Refinement:
Read through the code and explanation. Is it clear? Logical?
Are the variable names descriptive?
Are the comments helpful?
Is the output format clear (e.g., prediction probabilities)?
Add instructions on how to run the script (install libraries, prepare data).
Ensure the prediction part correctly handles the batch dimension (tf.expand_dims).
Make sure the class names (rock, paper, scissors) are consistently used.
Okay, let’s outline how you can create an AI model in Python to classify images of hand shapes (Rock, Paper, Scissors).
You mentioned “language model,” but this task is actually an image classification problem. Language models deal with text and sequences, while this task needs a model that recognizes visual patterns in images. Convolutional Neural Networks (CNNs) are the standard approach for this.
Here’s a breakdown of the steps and a Python script using TensorFlow/Keras, a popular deep learning library:
1. Project Setup and Dependencies:
- Install Libraries: You’ll need TensorFlow, NumPy, and optionally Matplotlib (for plotting).
    pip install tensorflow numpy matplotlib Pillow
(Pillow is used by Keras utilities for image loading)
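To check that the installation worked before running a long script, the import names can be looked up without actually importing the packages. This is a small sketch using only the standard library; the helper name missing_packages is hypothetical.

```python
import importlib.util

# pip name -> import name. They differ in one case: Pillow is imported as PIL.
REQUIRED = {
    "tensorflow": "tensorflow",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "Pillow": "PIL",
}


def missing_packages(required):
    """Return the pip names whose import name cannot be found."""
    return [
        pip_name
        for pip_name, import_name in required.items()
        if importlib.util.find_spec(import_name) is None
    ]


missing = missing_packages(REQUIRED)
print("Missing:", ", ".join(missing) if missing else "none")
```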
2. Data Preparation:
- Gather Images: This is the most crucial step. You need a dataset of images clearly showing Rock, Paper, and Scissors hand shapes.
  - Option A (Recommended for starting): Use an existing dataset. A popular one is the “Rock Paper Scissors Dataset” available on Kaggle or directly through TensorFlow Datasets (tensorflow-datasets).
  - Option B: Create your own dataset. Take many pictures of each hand shape under varying lighting conditions, backgrounds, and angles. This is more work but gives you custom data.
- Organize Data: Structure your image files into directories like this:
    dataset/
    ├── rock/
    │   ├── rock01.png
    │   ├── rock02.png
    │   └── ...
    ├── paper/
    │   ├── paper01.png
    │   ├── paper02.png
    │   └── ...
    └── scissors/
        ├── scissors01.png
        ├── scissors02.png
        └── ...
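- Split Data: The data is divided into training and validation sets. The training set is used to teach the model, and the validation set is used to check its performance on unseen data during training so that overfitting can be detected. A common split is 80% training, 20% validation. You do not need to move files by hand: the script in step 3 performs this split through the validation_split argument.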
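The idea behind a seeded 80/20 split can be sketched in a few lines of plain Python. This illustrates the concept only and is not how image_dataset_from_directory is implemented internally; split_indices is a hypothetical helper.

```python
import random


def split_indices(n_items, validation_split=0.2, seed=123):
    """Shuffle indices 0..n_items-1 with a fixed seed and split them into two lists."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)       # same seed gives the same order every run
    n_val = int(n_items * validation_split)
    return indices[n_val:], indices[:n_val]    # (training, validation)


train_idx, val_idx = split_indices(10)
print(len(train_idx), "training /", len(val_idx), "validation")
```

A fixed seed matters because the training and validation subsets are requested in two separate calls; using the same seed for both keeps the two subsets from overlapping.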
3. Python Script (using TensorFlow/Keras):
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
import numpy as np
import matplotlib.pyplot as plt # Optional for plotting
import os
import pathlib # More modern way to handle paths
# --- Configuration ---
DATA_DIR = pathlib.Path('dataset') # CHANGE THIS to your dataset directory path
if not DATA_DIR.exists():
    # This script does not download anything itself. Download the dataset manually,
    # e.g. from https://www.kaggle.com/drgfreeman/rockpaperscissors, and unpack it
    # so that 'dataset' contains one subdirectory per class.
    print(f"Error: Dataset directory '{DATA_DIR}' not found.")
    print("Please download the Rock Paper Scissors dataset and place it here,")
    print("or ensure the path is correct.")
    raise SystemExit(1)  # Stop early if the data is not found
IMAGE_HEIGHT = 150
IMAGE_WIDTH = 150
BATCH_SIZE = 32 # How many images to process at once
EPOCHS = 15 # How many times to go through the entire training dataset
VALIDATION_SPLIT = 0.2 # Use 20% of data for validation
# --- Load Data ---
print("Loading training data...")
train_ds = tf.keras.utils.image_dataset_from_directory(
    DATA_DIR,
    validation_split=VALIDATION_SPLIT,
    subset="training",
    seed=123,  # Use a seed for reproducibility
    image_size=(IMAGE_HEIGHT, IMAGE_WIDTH),
    batch_size=BATCH_SIZE)
print("Loading validation data...")
val_ds = tf.keras.utils.image_dataset_from_directory(
    DATA_DIR,
    validation_split=VALIDATION_SPLIT,
    subset="validation",
    seed=123,  # Same seed as above so the two subsets do not overlap
    image_size=(IMAGE_HEIGHT, IMAGE_WIDTH),
    batch_size=BATCH_SIZE)
# Get class names (automatically inferred from directory names)
class_names = train_ds.class_names
num_classes = len(class_names)
print(f"Found classes: {class_names}")
# --- Configure Dataset for Performance ---
# Caching keeps data in memory after first load, prefetching overlaps data loading/processing
AUTOTUNE = tf.data.AUTOTUNE
train_ds = train_ds.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)
# --- Optional: Data Augmentation ---
# Helps prevent overfitting by artificially expanding the dataset
data_augmentation = Sequential(
    [
        layers.RandomFlip("horizontal", input_shape=(IMAGE_HEIGHT, IMAGE_WIDTH, 3)),
        layers.RandomRotation(0.1),
        layers.RandomZoom(0.1),
        # Add more augmentations if needed (e.g., RandomContrast)
    ],
    name="data_augmentation"
)
# --- Build the CNN Model ---
# Pixel values are scaled from [0, 255] to [0, 1] by the Rescaling layer inside the model
model = Sequential([
    keras.Input(shape=(IMAGE_HEIGHT, IMAGE_WIDTH, 3)),  # Declare the input shape once
    data_augmentation,  # Apply augmentation first (active only during training)
    layers.Rescaling(1./255),  # Rescale pixel values
    # Convolutional Block 1
    layers.Conv2D(16, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    # Convolutional Block 2
    layers.Conv2D(32, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    # Convolutional Block 3
    layers.Conv2D(64, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    # Optional: Add Dropout to reduce overfitting
    layers.Dropout(0.2),
    # Flatten the results to feed into dense layers
    layers.Flatten(),
    # Dense Layer
    layers.Dense(128, activation='relu'),
    # Output Layer (3 classes, softmax for probability distribution)
    layers.Dense(num_classes, activation='softmax')  # Use softmax for multi-class
])
# --- Compile the Model ---
model.compile(
    optimizer='adam',  # Adam is a good general-purpose optimizer
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),  # Use this loss for integer labels
    metrics=['accuracy'])  # Track accuracy during training
# --- Print Model Summary ---
model.build(input_shape=(None, IMAGE_HEIGHT, IMAGE_WIDTH, 3)) # Build model to show summary
model.summary()
# --- Train the Model ---
print("\nStarting Training...")
history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=EPOCHS
)
print("Training Finished.")
# --- Evaluate the Model ---
print("\nEvaluating Model...")
loss, accuracy = model.evaluate(val_ds, verbose=0) # Use validation set for evaluation here
print(f'\nValidation Loss: {loss:.4f}')
print(f'Validation Accuracy: {accuracy*100:.2f}%')
# --- Optional: Plot Training History ---
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']  # Keras prefixes validation metrics with 'val_'
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs_range = range(len(acc))  # One point per completed epoch
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')
plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()
# --- Save the Model (Optional) ---
model_save_path = "rock_paper_scissors_model.keras"
print(f"\nSaving model to {model_save_path}")
model.save(model_save_path)
print("Model Saved.")
# --- Example: Predict a Single New Image ---
# This example reuses an image from the validation set. To classify a file from disk,
# load it and resize it to (IMAGE_HEIGHT, IMAGE_WIDTH) before calling model.predict().
try:
    # Take the first batch of the validation set and use its first image
    image_batch, label_batch = next(iter(val_ds))
    first_image = image_batch[0]
    first_label_index = label_batch[0].numpy()  # Get the integer label index
    print(f"\nPredicting one image from validation set (Actual Label: {class_names[first_label_index]})")
    # Model expects a batch, so add a dimension
    img_array = tf.expand_dims(first_image, 0)  # Create a batch of one
    predictions = model.predict(img_array)
    score = predictions[0]  # Probabilities for the first (only) image in the batch
    predicted_index = np.argmax(score)
    predicted_class = class_names[predicted_index]
    confidence = 100 * np.max(score)
    print(f"Predicted Class: {predicted_class}")
    print(f"Confidence: {confidence:.2f}%")
    print(f"Raw Scores: {score}")
except StopIteration:
    print("\nCould not get an image from validation set to run prediction example.")
except Exception as e:
    print(f"\nError during prediction example: {e}")
Explanation:
- Import Libraries: Imports TensorFlow, Keras layers, NumPy, etc.
- Configuration: Sets up key parameters like the data directory path, image dimensions, batch size, epochs, and validation split percentage. Remember to change DATA_DIR!
- Load Data: Uses tf.keras.utils.image_dataset_from_directory. This is a very convenient function that reads images from the organized directories, automatically infers class names, resizes images, creates batches, and handles the training/validation split.
- Performance Configuration: cache() and prefetch() help speed up data loading during training. shuffle() randomizes the training data order.
- Data Augmentation: Creates a small sequential model that applies random flips, rotations, and zooms only during training. This makes the model more robust to variations in the input images.
- Build Model:
  - Creates a Sequential Keras model.
  - Starts with the data_augmentation layers.
  - Includes a Rescaling layer to normalize pixel values to the [0, 1] range.
  - Adds several Conv2D (convolutional) and MaxPooling2D layers. Convolutions detect features (edges, textures), and max-pooling reduces the image size while keeping important information.
  - Dropout randomly sets a fraction of input units to 0 during training, helping prevent overfitting.
  - Flatten converts the 2D output of the convolutional blocks into a 1D vector.
  - Dense layers are standard fully connected neural network layers.
  - The final Dense layer has num_classes units (3 for Rock, Paper, Scissors) and a softmax activation function. Softmax outputs a probability distribution across the classes (the probabilities will sum to 1).
- Compile Model: Configures the model for training.
  - optimizer='adam': An efficient gradient descent optimization algorithm.
  - loss='sparse_categorical_crossentropy': The appropriate loss function when your labels are integers (0, 1, 2) and you have multiple classes with softmax output.
  - metrics=['accuracy']: Tells Keras to calculate and report classification accuracy during training and evaluation.
- Model Summary: Prints a table showing the layers, output shapes, and number of parameters in your model.
- Train Model: model.fit() starts the training process using the training data (train_ds) and evaluates performance on the validation data (val_ds) after each epoch. The history object stores the accuracy and loss values for each epoch.
- Evaluate Model: model.evaluate() calculates the final loss and accuracy on the validation dataset after training is complete.
- Plot History (Optional): Uses Matplotlib to visualize how training/validation accuracy and loss changed over the epochs. This helps diagnose overfitting (where training accuracy keeps increasing, but validation accuracy plateaus or decreases).
- Save Model (Optional): Saves the trained model structure and weights to a file so you can load it later without retraining.
- Prediction Example: Takes one image from the validation set, adds a batch dimension, gets predictions using model.predict(), and interprets the results (finds the class with the highest probability using np.argmax). An image loaded from disk would first need to be resized to the training image size; rescaling happens inside the model.
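The overfitting pattern described under Plot History can also be read directly from the numbers. The sketch below finds the epoch with the lowest validation loss in a dictionary shaped like history.history; the values are made up for illustration and best_epoch is a hypothetical helper.

```python
def best_epoch(history_dict):
    """Return (epoch_number, val_loss) for the lowest validation loss, counting from 1."""
    val_losses = history_dict["val_loss"]
    index = min(range(len(val_losses)), key=lambda i: val_losses[i])
    return index + 1, val_losses[index]


# Made-up numbers shaped like history.history after model.fit() with validation data.
example_history = {
    "loss":     [1.05, 0.70, 0.45, 0.30, 0.20, 0.12],
    "val_loss": [1.00, 0.72, 0.55, 0.50, 0.56, 0.63],
}

epoch, value = best_epoch(example_history)
print(f"Lowest validation loss {value:.2f} at epoch {epoch}")
if epoch < len(example_history["val_loss"]):
    print("Validation loss rose afterwards while training loss kept falling: likely overfitting.")
```

If the best epoch falls well before the last one, training for fewer epochs or adding more augmentation or dropout are reasonable things to try.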
To Run This:
- Save the code as a Python file (e.g., train_rps.py).
- Make sure you have the required libraries installed (pip install ...).
- Download or create your dataset and place it in a directory named dataset (or update the DATA_DIR variable in the script).
- Run the script from your terminal: python train_rps.py.
This script provides a solid foundation for training your Rock-Paper-Scissors image classifier. Good data matters most. Beyond that, you may need to tune the model architecture (number of layers, filters, dropout rate) and the training parameters (epochs, learning rate) to get the best performance. Adam adapts the step size for each parameter, but its base learning rate is still a setting you can change.