py4u guide

Building Data Science Applications with Django in Python

Data science has revolutionized industries by enabling data-driven decision-making, but translating models into user-friendly applications remains a critical challenge. Enter Django, a high-level Python web framework that simplifies building robust, scalable web applications. By combining Django’s strengths (rapid development, built-in admin, security) with Python’s data science ecosystem (Pandas, Scikit-learn, TensorFlow), you can create end-to-end data science applications that deliver value to users.

This blog will guide you through the process of building a data science application with Django. We’ll cover project setup, integrating machine learning models, adding visualizations, deployment, best practices, and a hands-on case study. Whether you’re a data scientist looking to deploy models or a web developer exploring data science, this guide will equip you with the tools to build production-ready applications.

Table of Contents

  1. Prerequisites
  2. Setting Up Your Environment
  3. Creating a Django Project: Foundations
  4. Integrating Data Science Models into Django
  5. Building User Interfaces for Data Input/Output
  6. Adding Data Visualizations
  7. Advanced Features: Asynchronous Tasks and APIs
  8. Deployment: Taking Your App Live
  9. Best Practices
  10. Case Study: House Price Prediction App
  11. References

Prerequisites

Before diving in, ensure you have:

  • Basic knowledge of Python (syntax, functions, modules).
  • Familiarity with Django basics (models, views, templates, URLs).
  • Understanding of data science workflows (model training, evaluation).
  • Python 3.8+ installed.
  • A code editor (VS Code, PyCharm) and terminal.

Setting Up Your Environment

Step 1: Install Python and Virtual Environment

First, set up a virtual environment to isolate dependencies:

# Create a project folder  
mkdir django-data-science-app && cd django-data-science-app  

# Create and activate a virtual environment  
python -m venv venv  
source venv/bin/activate  # Linux/macOS  
venv\Scripts\activate     # Windows  

# Verify activation (terminal prompt shows `(venv)`)  
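If the prompt is ambiguous, activation can also be checked from Python itself. The following is a minimal sketch; the helper name in_virtualenv is illustrative and not part of any library.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Virtual environment active:", in_virtualenv())
```

Running this with the interpreter that `python` resolves to should print True once the environment is activated.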

Step 2: Install Dependencies

Install Django and key data science libraries:

pip install django pandas scikit-learn joblib matplotlib plotly  
  • django: Web framework.
  • pandas: Data manipulation.
  • scikit-learn: Machine learning models.
  • joblib: Serialize/deserialize models.
  • matplotlib/plotly: Data visualization.
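To confirm that the packages above landed in the active environment, a short check with the standard library’s importlib.metadata can help. This is a sketch only; the function name installed_versions and the REQUIRED list are assumptions made for illustration.

```python
from importlib import metadata

# Distribution names as they are passed to pip in this guide.
REQUIRED = ["django", "pandas", "scikit-learn", "joblib", "matplotlib", "plotly"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name:15} {version or 'NOT INSTALLED'}")
```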

Creating a Django Project: Foundations

Let’s start by creating a Django project and app. We’ll use a modular structure to separate web logic from data science code.

Step 1: Start a Django Project

django-admin startproject core .  # Creates a project named `core` in the current directory  

Step 2: Create a Data Science App

Django uses “apps” to organize code. We’ll create an app named predictor for our data science logic:

python manage.py startapp predictor  

Step 3: Configure Settings

Add predictor to INSTALLED_APPS in core/settings.py:

# core/settings.py  
INSTALLED_APPS = [  
    "django.contrib.admin",  
    "django.contrib.auth",  
    "django.contrib.contenttypes",  
    "django.contrib.sessions",  
    "django.contrib.messages",  
    "django.contrib.staticfiles",  
    "predictor",  # Add your app here  
]  

Step 4: Define URLs

Map URLs to views. First, update core/urls.py to include the predictor app’s URLs:

# core/urls.py  
from django.contrib import admin  
from django.urls import path, include  

urlpatterns = [  
    path("admin/", admin.site.urls),  
    path("", include("predictor.urls")),  # Route root URLs to `predictor`  
]  

Create a urls.py file in the predictor app:

# predictor/urls.py  
from django.urls import path  
from . import views  

urlpatterns = [  
    path("", views.home, name="home"),  # Home page  
    path("predict/", views.predict, name="predict"),  # Prediction endpoint  
]  

Integrating Data Science Models into Django

The core of your app will be a trained machine learning model. Here’s how to integrate it into Django.

Step 1: Train and Save a Model

First, train a simple model (e.g., a linear regression model for house price prediction) and save it using joblib.

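Create a models package in predictor to store model-related code. The startapp command already generated predictor/models.py, and a plain models/ directory without an __init__.py loses to that file at import time, which would break imports such as predictor.models.predictor_service later on. This guide defines no database models, so remove the generated file and turn the directory into a regular package:

rm predictor/models.py
mkdir -p predictor/models
touch predictor/models/__init__.py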

Add a script train_model.py to train and save the model:

# predictor/models/train_model.py  
import pandas as pd  
from sklearn.datasets import fetch_california_housing  
from sklearn.linear_model import LinearRegression  
from sklearn.model_selection import train_test_split  
import joblib  

# Load dataset (California Housing Prices)  
housing = fetch_california_housing()  
X = pd.DataFrame(housing.data, columns=housing.feature_names)  
y = housing.target  # Median house value (in $100k)  

# Train-test split  
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)  

# Train model  
model = LinearRegression()  
model.fit(X_train, y_train)  

# Save model to disk  
joblib.dump(model, "predictor/models/house_price_model.joblib")  
print("Model saved!")  

Run the script to train and save the model:

python predictor/models/train_model.py  
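The training script sets aside X_test and y_test but never uses them. A held-out evaluation could look roughly like the sketch below. To keep it runnable offline it uses synthetic data from make_regression as a stand-in for the housing dataset, and the evaluate helper is a hypothetical name.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split


def evaluate(model, X_test, y_test):
    """Return R^2 and mean absolute error on the held-out split."""
    predictions = model.predict(X_test)
    return {
        "r2": r2_score(y_test, predictions),
        "mae": mean_absolute_error(y_test, predictions),
    }


# Synthetic stand-in with eight features (assumption: swap in the real X, y).
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
metrics = evaluate(model, X_test, y_test)
print(f"R^2: {metrics['r2']:.3f}  MAE: {metrics['mae']:.3f}")
```

In train_model.py, the same evaluate call could be placed right before joblib.dump so that a poorly fitting model is noticed before it is saved.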

Step 2: Load the Model in Django

Reloading the model on every request is inefficient, so load it once per server process, on the first prediction, and keep it cached in a module-level variable. Create a predictor_service.py to handle model loading and predictions:

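# predictor/models/predictor_service.py
import os

import joblib
import pandas as pd
from django.conf import settings

# Path to the saved model
MODEL_PATH = os.path.join(settings.BASE_DIR, "predictor", "models", "house_price_model.joblib")

# Column order used when the model was trained (California Housing features)
FEATURE_NAMES = [
    "MedInc", "HouseAge", "AveRooms", "AveBedrms",
    "Population", "AveOccup", "Latitude", "Longitude",
]

# Module-level cache so each server process loads the model only once
_model = None

def load_model():
    """Load the model on first use and reuse it afterwards."""
    global _model
    if _model is None:
        _model = joblib.load(MODEL_PATH)
    return _model

def predict_price(features):
    """Predict the house price in dollars for one sample of eight features."""
    model = load_model()
    # The model was fitted on a DataFrame, so pass the same column names
    # to avoid scikit-learn's feature-name warning.
    sample = pd.DataFrame([features], columns=FEATURE_NAMES)
    prediction = model.predict(sample)
    return round(float(prediction[0]) * 100000, 2)  # Convert from $100k to $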
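The load-once behaviour can be demonstrated in isolation. The sketch below is an alternative to the global-variable cache, using functools.lru_cache, and it substitutes the standard library’s pickle for joblib so that it runs without extra dependencies. The load_count counter exists only to make the caching visible.

```python
import pickle
import tempfile
from functools import lru_cache
from pathlib import Path

load_count = 0  # counts real disk reads, for demonstration only


@lru_cache(maxsize=1)
def load_model(path: str):
    """Read the serialized object from disk on the first call; reuse it afterwards."""
    global load_count
    load_count += 1
    with open(path, "rb") as fh:
        return pickle.load(fh)


with tempfile.TemporaryDirectory() as tmp:
    model_path = str(Path(tmp) / "model.pkl")
    with open(model_path, "wb") as fh:
        pickle.dump({"coef": [1.0, 2.0]}, fh)  # stand-in for a trained model

    first = load_model(model_path)
    second = load_model(model_path)  # served from the cache, no second disk read

print("Disk reads:", load_count)
```

Each server process keeps its own cache, so a deployment with several workers reads the file once per worker.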

Building User Interfaces for Data Input/Output

Users need a way to input data (e.g., house features) and view predictions. We’ll use Django forms and templates for this.

Step 1: Create a Form for Input

Django forms simplify data validation. Create forms.py in the predictor app:

# predictor/forms.py  
from django import forms  

class HousePriceForm(forms.Form):  
    # Fields match the California Housing dataset features  
    MedInc = forms.FloatField(  
        label="Median Income (in $10k)",  
        min_value=0,  
        widget=forms.NumberInput(attrs={"class": "form-control"})  
    )  
    HouseAge = forms.FloatField(  
        label="Median House Age (years)",  
        min_value=0,  
        max_value=100,  
        widget=forms.NumberInput(attrs={"class": "form-control"})  
    )  
    AveRooms = forms.FloatField(  
        label="Average Rooms per Household",  
        min_value=0,  
        widget=forms.NumberInput(attrs={"class": "form-control"})  
    )  
    AveBedrms = forms.FloatField(  
        label="Average Bedrooms per Household",  
        min_value=0,  
        widget=forms.NumberInput(attrs={"class": "form-control"})  
    )  
    Population = forms.FloatField(  
        label="Block Group Population",  
        min_value=0,  
        widget=forms.NumberInput(attrs={"class": "form-control"})  
    )  
    AveOccup = forms.FloatField(  
        label="Average Occupants per Household",  
        min_value=0,  
        widget=forms.NumberInput(attrs={"class": "form-control"})  
    )  
    Latitude = forms.FloatField(  
        label="Latitude",  
        min_value=32,  
        max_value=42,  
        widget=forms.NumberInput(attrs={"class": "form-control"})  
    )  
    Longitude = forms.FloatField(  
        label="Longitude",  
        min_value=-125,  
        max_value=-114,  
        widget=forms.NumberInput(attrs={"class": "form-control"})  
    )  

Step 2: Create a View to Handle Predictions

Views process requests, interact with the model, and render templates. Update views.py:

# predictor/views.py  
from django.shortcuts import render  
from .forms import HousePriceForm  
from .models.predictor_service import predict_price  

def home(request):  
    return render(request, "predictor/home.html")  

def predict(request):  
    if request.method == "POST":  
        form = HousePriceForm(request.POST)  
        if form.is_valid():  
            # Extract cleaned data from the form  
            data = form.cleaned_data  
            features = [  
                data["MedInc"],  
                data["HouseAge"],  
                data["AveRooms"],  
                data["AveBedrms"],  
                data["Population"],  
                data["AveOccup"],  
                data["Latitude"],  
                data["Longitude"],  
            ]  
            price = predict_price(features)  # Get prediction  
            return render(request, "predictor/predict.html", {"form": form, "price": price})  
    else:  
        form = HousePriceForm()  # Empty form for GET request  
    return render(request, "predictor/predict.html", {"form": form})  
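The model expects features in the same order as the training columns, so the hand-written list in the view has to stay in sync with the dataset. One way to make that explicit is a small helper driven by a single ordered list. This is a sketch; FEATURE_ORDER and features_from_cleaned_data are illustrative names.

```python
# Column order of the California Housing dataset used for training.
FEATURE_ORDER = [
    "MedInc", "HouseAge", "AveRooms", "AveBedrms",
    "Population", "AveOccup", "Latitude", "Longitude",
]


def features_from_cleaned_data(cleaned_data):
    """Return feature values as floats, in training-column order."""
    missing = [name for name in FEATURE_ORDER if name not in cleaned_data]
    if missing:
        raise KeyError(f"Missing features: {missing}")
    return [float(cleaned_data[name]) for name in FEATURE_ORDER]


# Input order does not matter; output order always follows FEATURE_ORDER.
example = {
    "Longitude": -122.23, "MedInc": 8.3, "Latitude": 37.88, "HouseAge": 41,
    "AveOccup": 2.5, "AveRooms": 6.9, "Population": 322, "AveBedrms": 1.0,
}
print(features_from_cleaned_data(example))
```

In the view, features = features_from_cleaned_data(form.cleaned_data) would replace the manual list.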

Step 3: Create Templates for the UI

Templates define the HTML structure. Create a predictor/templates/predictor directory and add two files:

home.html (Landing Page):

<!-- predictor/templates/predictor/home.html -->  
<!DOCTYPE html>  
<html>  
<head>  
    <title>House Price Predictor</title>  
    <link href="https://cdn.jsdelivr.net/npm/bootstrap/dist/css/bootstrap.min.css" rel="stylesheet">
</head>  
<body>  
    <div class="container mt-5">  
        <h1>Welcome to House Price Predictor</h1>  
        <p class="lead">Enter house features on the next page to get a price estimate.</p>
        <a href="{% url 'predict' %}" class="btn btn-primary">Go to Predictor</a>  
    </div>  
</body>  
</html>  

predict.html (Prediction Form/Results):

<!-- predictor/templates/predictor/predict.html -->  
<!DOCTYPE html>  
<html>  
<head>  
    <title>House Price Predictor</title>  
    <link href="https://cdn.jsdelivr.net/npm/bootstrap/dist/css/bootstrap.min.css" rel="stylesheet">
</head>  
<body>  
    <div class="container mt-5">  
        <h2>House Price Prediction</h2>  
        <form method="post" class="mt-3">  
            {% csrf_token %}  <!-- Security token -->  
            {{ form.as_p }}  <!-- Render form fields -->  
            <button type="submit" class="btn btn-success">Predict Price</button>  
        </form>  

        {% if price %}  <!-- Display prediction if available -->  
        <div class="alert alert-info mt-4">  
            <h3>Predicted House Price: ${{ price }}</h3>  
        </div>  
        {% endif %}  
    </div>  
</body>  
</html>  

Adding Data Visualizations

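Visualizations make results more interpretable. Let’s add a bar chart with Matplotlib that compares the predicted price to a reference value. The example uses a hard-coded $500k placeholder as the regional average; replace it with a figure computed from your own data.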

Step 1: Generate a Plot in the View

Update the predict view to generate a plot when a prediction is made:

# predictor/views.py
import os

import matplotlib
matplotlib.use("Agg")  # Select the non-interactive backend before importing pyplot
import matplotlib.pyplot as plt
from django.conf import settings
from django.templatetags.static import static

def predict(request):
    if request.method == "POST":
        form = HousePriceForm(request.POST)
        if form.is_valid():
            # ... (previous code to get features and price)

            # Generate visualization
            fig, ax = plt.subplots(figsize=(8, 5))
            # 500000 is a hard-coded placeholder, not a value derived from the data
            labels = ["Predicted Price", "Regional Average ($500k)"]
            values = [price, 500000]
            ax.bar(labels, values, color=["blue", "orange"])
            ax.set_ylabel("Price ($)")
            ax.set_title("Predicted vs. Regional Average Price")

            # Save plot to static files. Every request overwrites the same file,
            # which is fine for a single-user demo but not for concurrent users.
            static_dir = os.path.join(settings.BASE_DIR, "predictor/static/predictor/plots/")
            os.makedirs(static_dir, exist_ok=True)
            plot_path = os.path.join(static_dir, "price_comparison.png")
            fig.savefig(plot_path)
            plt.close(fig)  # Release the figure's memory

            # Pass plot URL to template
            plot_url = static("predictor/plots/price_comparison.png")
            return render(request, "predictor/predict.html", {
                "form": form,
                "price": price,
                "plot_url": plot_url
            })
    # ... (rest of the view)

Step 2: Update Settings for Static Files

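During development (runserver with DEBUG = True), Django automatically serves files from each installed app’s static/ directory, so the plot saved under predictor/static/ is found without listing that directory in STATICFILES_DIRS. Set STATIC_URL, and set STATIC_ROOT as the target for collectstatic in production. Inside settings.py the base directory is the BASE_DIR variable itself, not settings.BASE_DIR:

# core/settings.py
import os  # if not already imported at the top of the file

STATIC_URL = "/static/"
STATIC_ROOT = os.path.join(BASE_DIR, "staticfiles")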

Step 3: Display the Plot in the Template

Update predict.html to include the plot:

<!-- Add this below the prediction alert -->  
{% if plot_url %}  
<div class="mt-4">  
    <h4>Price Comparison</h4>  
    <img src="{{ plot_url }}" alt="Price Comparison Plot" class="img-fluid">  
</div>  
{% endif %}  
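Writing every plot to the same file means concurrent users can overwrite each other’s image, and browsers may show a cached copy. A common alternative is to render the figure into memory and pass it to the template as a base64 data URI, which works with the same img tag. The sketch below assumes a hypothetical helper name, price_comparison_data_uri, and a caller-supplied reference price.

```python
import base64
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, must be set before pyplot import
import matplotlib.pyplot as plt


def price_comparison_data_uri(predicted, reference):
    """Render the bar chart in memory and return it as a PNG data URI."""
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.bar(
        ["Predicted Price", "Reference Price"],
        [predicted, reference],
        color=["blue", "orange"],
    )
    ax.set_ylabel("Price ($)")
    ax.set_title("Predicted vs. Reference Price")

    buffer = io.BytesIO()
    fig.savefig(buffer, format="png", bbox_inches="tight")
    plt.close(fig)  # free the figure so memory does not grow per request

    encoded = base64.b64encode(buffer.getvalue()).decode("ascii")
    return f"data:image/png;base64,{encoded}"


uri = price_comparison_data_uri(420000.0, 500000.0)
print(uri[:40] + "...")
```

In the view, plot_url = price_comparison_data_uri(price, 500000) could replace the file-saving code; no static file configuration is involved in that variant.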

Advanced Features: Asynchronous Tasks and APIs

For compute-heavy models (e.g., deep learning), use Celery for asynchronous task processing to avoid blocking the web server. For programmatic access, add a REST API with Django REST Framework (DRF).

Example: Async Prediction with Celery

Install Celery and the Redis client library (a Redis server must also be running to act as the broker):

pip install celery redis  

Define a Celery task to handle predictions:

# predictor/tasks.py  
from celery import shared_task  
from .models.predictor_service import predict_price  

@shared_task  
def async_predict_price(features):  
    return predict_price(features)  

Update the view to use the async task (requires Celery setup, beyond this guide’s scope).
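The flow an asynchronous view follows is: submit the work, hand back a job ID right away, and let the client poll for the result. The sketch below illustrates that flow with the standard library’s concurrent.futures as a stand-in; it is not Celery, and slow_predict, submit_prediction, and get_status are hypothetical names. The state labels are borrowed from Celery’s task states.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)
jobs = {}  # job_id -> Future; in-memory and per-process, for illustration only


def slow_predict(features):
    """Stand-in for an expensive model call."""
    return sum(features)


def submit_prediction(features):
    """Queue the work and return an ID immediately, without waiting."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = executor.submit(slow_predict, features)
    return job_id


def get_status(job_id):
    """Report the job state, including the result once it is ready."""
    future = jobs.get(job_id)
    if future is None:
        return {"state": "UNKNOWN"}
    if not future.done():
        return {"state": "PENDING"}
    if future.exception() is not None:
        return {"state": "FAILURE", "error": str(future.exception())}
    return {"state": "SUCCESS", "result": future.result()}


job_id = submit_prediction([1.0, 2.0, 3.0])
jobs[job_id].result(timeout=5)  # wait here only so the demo prints a final state
print(get_status(job_id))
```

With a configured Celery app, the analogous calls would be async_predict_price.delay(features) to submit and celery.result.AsyncResult(task_id) to look up the state, with results stored in a backend such as Redis rather than in process memory.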

Example: REST API with DRF

Install DRF:

pip install djangorestframework  

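Create a serializer and a function-based API view:

# predictor/serializers.py
from rest_framework import serializers

class HousePriceSerializer(serializers.Serializer):
    # Declared in the same order as the model's training features
    MedInc = serializers.FloatField()
    HouseAge = serializers.FloatField()
    AveRooms = serializers.FloatField()
    AveBedrms = serializers.FloatField()
    Population = serializers.FloatField()
    AveOccup = serializers.FloatField()
    Latitude = serializers.FloatField()
    Longitude = serializers.FloatField()

# predictor/views.py
from rest_framework.decorators import api_view
from rest_framework.response import Response

from .models.predictor_service import predict_price
from .serializers import HousePriceSerializer

@api_view(["POST"])
def api_predict(request):
    serializer = HousePriceSerializer(data=request.data)
    if serializer.is_valid():
        # validated_data follows the serializer's field declaration order
        features = list(serializer.validated_data.values())
        price = predict_price(features)
        return Response({"predicted_price": price})
    return Response(serializer.errors, status=400)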
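A client would call this endpoint by POSTing the eight features as JSON. The sketch below builds such a request with the standard library. The URL path /api/predict/ is an assumption, since the guide does not define a route for api_predict, and the feature values are illustrative.

```python
import json
from urllib import request as urlrequest

API_URL = "http://localhost:8000/api/predict/"  # hypothetical route


def build_predict_request(features: dict, url: str = API_URL):
    """Create a JSON POST request for the prediction endpoint."""
    body = json.dumps(features).encode("utf-8")
    return urlrequest.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


payload = {
    "MedInc": 8.3, "HouseAge": 41.0, "AveRooms": 6.9, "AveBedrms": 1.0,
    "Population": 322.0, "AveOccup": 2.5, "Latitude": 37.88, "Longitude": -122.23,
}
req = build_predict_request(payload)
print(req.get_method(), req.full_url)

# To send it against a running server:
# with urlrequest.urlopen(req) as resp:
#     print(json.load(resp))
```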

Deployment: Taking Your App Live

To share your app, deploy it to a cloud platform like Heroku or AWS. Here’s a quick Heroku deployment guide:

Step 1: Prepare Deployment Files

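  • Procfile: Specifies the web server. Gunicorn is not part of the earlier pip install command, so install it (pip install gunicorn) before freezing requirements.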
    web: gunicorn core.wsgi --log-file -  
  • requirements.txt: Lists dependencies.
    pip freeze > requirements.txt  
  • runtime.txt: Specifies Python version.
    python-3.9.7  

Step 2: Deploy to Heroku

heroku create my-ds-app  
git add . && git commit -m "Initial deploy"  
git push heroku main  
heroku run python manage.py migrate  
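A deployed app should not run with development settings. A common approach is to read SECRET_KEY, DEBUG, and ALLOWED_HOSTS from environment variables in core/settings.py. The sketch below assumes hypothetical variable names prefixed with DJANGO_ and two small helpers, env_bool and env_list.

```python
import os


def env_bool(name, default=False):
    """Interpret an environment variable as a boolean flag."""
    value = os.environ.get(name, str(default))
    return value.strip().lower() in {"1", "true", "yes", "on"}


def env_list(name, default=""):
    """Split a comma-separated environment variable into a clean list."""
    raw = os.environ.get(name, default)
    return [item.strip() for item in raw.split(",") if item.strip()]


# In core/settings.py (the DJANGO_* variable names are assumptions):
SECRET_KEY = os.environ.get("DJANGO_SECRET_KEY", "dev-only-insecure-key")
DEBUG = env_bool("DJANGO_DEBUG", default=False)
ALLOWED_HOSTS = env_list("DJANGO_ALLOWED_HOSTS", default="localhost,127.0.0.1")
```

On Heroku, such variables can be set with heroku config:set, for example heroku config:set DJANGO_DEBUG=False.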

Best Practices

  1. Separation of Concerns: Keep data science code in a models/ or ml/ subdirectory, not in views.
  2. Model Versioning: Use tools like DVC or MLflow to track model versions.
  3. Testing: Write unit tests for models (e.g., test_predict_price()) and Django views.
  4. Security: Sanitize input, use HTTPS, and encrypt sensitive data.
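  5. Performance: Cache predictions for repeated inputs. Django’s cache_page only caches responses to GET and HEAD requests, so for the POST-based prediction form use the low-level cache API (django.core.cache) with a key derived from the input features, backed by Redis or another cache backend.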
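For the testing point, a unit test can check the prediction logic without loading the real model. The sketch below uses a variant of predict_price that accepts the model as a parameter, which is an assumption made here for testability and differs from the service function shown earlier. FakeModel is a hypothetical stub.

```python
import unittest


class FakeModel:
    """Stub that mimics the predict() interface of a scikit-learn model."""

    def predict(self, rows):
        return [2.5 for _ in rows]  # 2.5 in units of $100k


def predict_price(features, model):
    """Variant of the service function with the model passed in explicitly."""
    prediction = model.predict([list(features)])
    return round(float(prediction[0]) * 100000, 2)


class PredictPriceTests(unittest.TestCase):
    def test_converts_units_to_dollars(self):
        self.assertEqual(predict_price([0.0] * 8, FakeModel()), 250000.0)

    def test_returns_float(self):
        self.assertIsInstance(predict_price([0.0] * 8, FakeModel()), float)


if __name__ == "__main__":
    unittest.main()
```

Inside the Django project, the same tests could live in predictor/tests.py and run with python manage.py test.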

Case Study: House Price Prediction App

We’ve built a complete app that:

  1. Trains a linear regression model on housing data.
  2. Lets users input house features via a form.
  3. Returns a predicted price and a comparison plot.

To run it locally:

python manage.py runserver  

Visit http://localhost:8000 to use the app!

References


By combining Django and Python’s data science tools, you can build powerful, user-centric applications that bridge the gap between models and end-users. Start small, iterate, and scale—your data science app is just a few lines of code away!