Loomis Construction ControlNet (Step-by-Step Sketcher)
Loomis-ControlNet is a custom fine-tuned ControlNet adapter for Stable Diffusion v1.5. It acts as a "virtual art tutor": it takes any portrait photograph and deconstructs it into 7 distinct drawing stages based on the Andrew Loomis method, from the initial cranial construction lines to the final graphite rendering.
- Model Weights: HudTariq/loomis-model-output-v4-high-quality-7steps
- Base Model: Stable Diffusion v1.5 (runwayml/stable-diffusion-v1-5)
- Training Resolution: 512x512
- License: CC BY-NC 4.0
Capabilities & "The 7 Steps"
This model was trained to understand the logical progression of a graphite portrait sketch. By changing the step keyword in the prompt (e.g., "step 1", "step 4"), you can generate:
- Step 1: Basic Construction (Cranial circle, jawline axis, raw geometry).
- Step 2: Contours & Wireframe (Profile outline, chin blocking).
- Step 3: Feature Placement (Eye sockets, nose wedge, ear placement).
- Step 4: Planes & Shading (Cheekbone shadows, temple values, planar masses).
- Step 5: Form Definition (Mid-tones, hair blocking, ear detailing).
- Step 6: Texture & Contrast (Hair strands, skin texture, stubble).
- Step 7: Final Render (Hyper-realistic graphite polish, deep shadows).
How to Use (Python Code)
For the best results, specifically to preserve the identity of your input subject, it is highly recommended to use a Multi-ControlNet pipeline.
Using this model alone creates excellent Loomis-style sketches but may result in a generic face. Pairing it with a Canny or Depth ControlNet "locks" the facial features of your subject.
Recommended Pipeline: The "Identity Lock" Method (Kaggle, Jupyter, etc.)
This script uses the Canny ControlNet to force the AI to trace the exact features of your photo, while the Loomis ControlNet handles the artistic style and construction steps.
```python
import torch
import cv2
import numpy as np
import ipywidgets as widgets
import io
import gc
import os
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, UniPCMultistepScheduler
from transformers import BlipProcessor, BlipForConditionalGeneration
from IPython.display import display, clear_output
import matplotlib.pyplot as plt

# ==========================================
# 1. LOAD MODELS (Singleton Check)
# ==========================================
def load_models():
    global pipe, blip_processor, blip_model

    # Only load the pipeline if it is not already in this session
    if 'pipe' not in globals():
        print("Loading ControlNets... (wait ~1 min)")
        cn_loomis = ControlNetModel.from_pretrained(
            "HudTariq/loomis-model-output-v4-high-quality-7steps",
            subfolder="checkpoint-3000/controlnet",
            torch_dtype=torch.float16
        )
        cn_canny = ControlNetModel.from_pretrained(
            "lllyasviel/sd-controlnet-canny",
            torch_dtype=torch.float16
        )
        pipe = StableDiffusionControlNetPipeline.from_pretrained(
            "runwayml/stable-diffusion-v1-5",
            controlnet=[cn_loomis, cn_canny],
            torch_dtype=torch.float16,
            safety_checker=None
        )
        pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
        # CPU offload manages device placement itself; do not also call .to("cuda")
        pipe.enable_model_cpu_offload()
        print("Pipeline loaded!")

    # Only load the captioner if it is not already loaded
    if 'blip_model' not in globals():
        print("Loading captioner...")
        blip_processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
        blip_model = BlipForConditionalGeneration.from_pretrained(
            "Salesforce/blip-image-captioning-base",
            torch_dtype=torch.float16
        ).to("cuda")
        print("Captioner loaded!")

load_models()

# ==========================================
# 2. ROBUST FACE CROPPER (OpenCV Version)
# ==========================================
def smart_crop(image):
    """Detects the face with OpenCV and crops to a 512x512 portrait."""
    # Download the face-cascade XML if missing (standard OpenCV model)
    cascade_path = "haarcascade_frontalface_default.xml"
    if not os.path.exists(cascade_path):
        os.system(f"wget -q https://raw.githubusercontent.com/opencv/opencv/master/data/haarcascades/{cascade_path}")

    # Convert PIL -> OpenCV
    img_np = np.array(image)
    gray = cv2.cvtColor(img_np, cv2.COLOR_RGB2GRAY)

    # Detect faces
    face_cascade = cv2.CascadeClassifier(cascade_path)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(30, 30))

    if len(faces) == 0:
        print("No face detected (or face too small). Using center crop.")
        width, height = image.size
        short_dim = min(width, height)
        # Fallback: simple square center crop
        return image.crop((
            (width - short_dim) // 2, (height - short_dim) // 2,
            (width + short_dim) // 2, (height + short_dim) // 2
        )).resize((512, 512))

    # Pick the largest face found
    x, y, w, h = max(faces, key=lambda b: b[2] * b[3])

    # === PADDING MATH ===
    # Loomis portraits need the head to take up about 50-60% of the
    # vertical space, so expand the crop box to 2.2x the face size.
    crop_size = int(max(w, h) * 2.2)
    center_x = x + w // 2
    center_y = y + h // 2

    img_h, img_w = img_np.shape[:2]
    x1 = max(0, center_x - crop_size // 2)
    y1 = max(0, center_y - crop_size // 2)
    x2 = min(img_w, center_x + crop_size // 2)
    y2 = min(img_h, center_y + crop_size // 2)

    # Crop and resize
    cropped = image.crop((x1, y1, x2, y2))
    return cropped.resize((512, 512), Image.LANCZOS)

# ==========================================
# 3. HELPERS
# ==========================================
def get_canny_image(image, low_threshold=100, high_threshold=200):
    image = np.array(image)
    image = cv2.Canny(image, low_threshold, high_threshold)
    image = image[:, :, None]
    image = np.concatenate([image, image, image], axis=2)
    return Image.fromarray(image)

def auto_detect_subject(image):
    inputs = blip_processor(image, "a close up photo of a", return_tensors="pt").to("cuda", torch.float16)
    out = blip_model.generate(**inputs, max_new_tokens=20)
    caption = blip_processor.decode(out[0], skip_special_tokens=True)
    return caption.replace("a close up photo of a", "").strip()

# ==========================================
# 4. EXECUTION LOGIC
# ==========================================
STEP_PROMPTS = {
    1: "step 1, basic cranial circle, jawline stroke, ear axis line, raw construction, minimal lines",
    2: "step 2, profile contour, nose silhouette, lip profile, chin outline, wireframe",
    3: "step 3, ear shape definition, eye socket placement, nostril marking, feature blocking",
    4: "step 4, side plane shading, cheekbone shadow, temple value, planar masses",
    5: "step 5, hair mass blocking, mid-tone shading, ear detailing, form definition",
    6: "step 6, hair strand texture, stubble detail, skin texture, eye shading, contrast",
    7: "step 7, final graphite render, deep shadows, hyper-realistic detailed portrait, masterpiece"
}
STRENGTH_SCHEDULE = {1: 0.30, 2: 0.40, 3: 0.55, 4: 0.65, 5: 0.80, 6: 0.90, 7: 1.00}

uploader = widgets.FileUpload(accept='image/*', multiple=False, description='Upload')
subject_display = widgets.Text(placeholder='Detected subject...', description='Subject:', disabled=True)
btn_gen = widgets.Button(description='Smart Crop & Generate', button_style='primary', icon='crop',
                         layout=widgets.Layout(width='100%'))
out = widgets.Output()

def run_generation(b):
    with out:
        clear_output()
        if not uploader.value:
            print("Upload a photo!")
            return
        try:
            # ipywidgets 8.x returns a tuple of dicts; 7.x returns a dict
            val = uploader.value[0] if isinstance(uploader.value, tuple) else list(uploader.value.values())[0]
            content = val['content']
        except (KeyError, IndexError, TypeError):
            print("Upload error. Try again.")
            return

        print("Detecting face (OpenCV) & cropping...")
        raw_img = Image.open(io.BytesIO(content)).convert("RGB")
        input_image = smart_crop(raw_img)
        display(input_image.resize((150, 150)))  # Show crop preview

        print("Captioning...")
        subject = auto_detect_subject(input_image)
        subject_display.value = subject
        print(f"Subject: {subject}")

        canny_image = get_canny_image(input_image)
        images = [input_image]
        titles = ["Original Crop"]

        print("Sketching...")
        for i in range(1, 8):
            prompt = f"a Loomis construction sketch of a {subject}, {STEP_PROMPTS[i]}"
            strength = STRENGTH_SCHEDULE[i]
            generator = torch.Generator(device="cuda").manual_seed(42)
            with torch.no_grad():
                res = pipe(
                    prompt,
                    image=[input_image, canny_image],
                    num_inference_steps=25,
                    guidance_scale=7.5,
                    controlnet_conditioning_scale=[1.0, strength],
                    generator=generator
                ).images[0]
            images.append(res)
            titles.append(f"Step {i}")
            torch.cuda.empty_cache()

        fig, axes = plt.subplots(1, 8, figsize=(24, 4))
        for ax, img, title in zip(axes, images, titles):
            ax.imshow(img)
            ax.set_title(title, fontsize=10)
            ax.axis('off')
        plt.tight_layout()
        plt.show()
        gc.collect()

btn_gen.on_click(run_generation)
display(widgets.VBox([widgets.Label("Loomis Portrait Generator (Auto-Crop)"),
                      uploader, subject_display, btn_gen, out]))
```
Usage Tips
1. Dynamic Strength Scheduling
To get the most authentic "tutorial" look, adjust the Canny ControlNet strength based on the step you are generating:
- Steps 1–2 (Construction): Use low Canny strength (0.3–0.5). This allows the model to draw abstract circles and guidelines without being forced to draw detailed eyelashes or wrinkles immediately.
- Steps 3–5 (Form): Use medium strength (0.6–0.8).
- Steps 6–7 (Final Render): Use high strength (1.0). This locks in the likeness for the finished portrait.
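The schedule above can be expressed as a small helper that picks a Canny conditioning scale for a given step. The function name and exact cut-off values here are illustrative (they follow the tip ranges, not values baked into the model):

```python
def canny_strength(step: int) -> float:
    """Suggest a Canny ControlNet scale for a Loomis step (1-7)."""
    if not 1 <= step <= 7:
        raise ValueError("Loomis steps run from 1 to 7")
    if step <= 2:   # construction: let the model draw loose guidelines
        return 0.4
    if step <= 5:   # form: start pulling toward the subject's features
        return 0.7
    return 1.0      # final render: lock in the likeness

# The scale is passed per-ControlNet, e.g.:
# pipe(prompt, image=[input_image, canny_image],
#      controlnet_conditioning_scale=[1.0, canny_strength(step)])
```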
2. Prompt Engineering
The model responds to specific keywords associated with the training data.
- Subject: Always specify gender/age (e.g., "young woman", "old man", "bearded man") to help the model align the anatomy.
- Trigger Words:
  - `basic cranial circle` (Step 1)
  - `wireframe`, `chin outline` (Step 2)
  - `planar masses`, `cheekbone shadow` (Step 4)
  - `graphite render`, `masterpiece` (Step 7)
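Putting the subject descriptor and trigger words together, a step prompt can be assembled like this. The `TRIGGERS` mapping and `build_prompt` helper are illustrative, and the template mirrors the one used in the pipeline script; the exact wording is a suggestion, not a requirement:

```python
# Illustrative subset of per-step trigger words from the training data
TRIGGERS = {
    1: "basic cranial circle, raw construction",
    2: "wireframe, chin outline",
    4: "planar masses, cheekbone shadow",
    7: "graphite render, masterpiece",
}

def build_prompt(subject: str, step: int) -> str:
    """Combine the subject descriptor with the step's trigger words."""
    keywords = TRIGGERS.get(step, f"step {step}")
    return f"a Loomis construction sketch of a {subject}, step {step}, {keywords}"

print(build_prompt("young woman", 2))
# -> a Loomis construction sketch of a young woman, step 2, wireframe, chin outline
```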
Training Details
- Dataset: 1,141 curated pairs of images.
- Input: Synthesized construction steps (from geometric primitives to partial shading).
- Target: High-quality graphite portraits.
- Hardware: Trained on NVIDIA Tesla T4 (Kaggle).
- Steps: 3,000 training steps.
- Effective Batch Size: 8 (per-device batch size 2 × gradient accumulation 4).
- Learning Rate: 1e-5 (Constant).
- Optimizer: 8-bit AdamW.
Limitations
- Cropping: The model was trained on tightly cropped headshots. Images with large amounts of background or full-body shots may confuse the cranial construction logic. Face-cropping is recommended before inference.
- Style: The output is strictly monochromatic graphite/pencil style.
- Lighting: Extreme lighting conditions (e.g., silhouettes) may cause the "planar shading" steps to fail.
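Since the model expects tightly cropped headshots, cropping before inference matters. If a face detector is unavailable, even a plain centered square crop (PIL only) beats feeding the full frame; this sketch assumes the face sits roughly in the middle of the photo:

```python
from PIL import Image

def center_square_crop(image: Image.Image, size: int = 512) -> Image.Image:
    """Crop the largest centered square and resize to the model's 512x512 input."""
    width, height = image.size
    short = min(width, height)
    left = (width - short) // 2
    top = (height - short) // 2
    return image.crop((left, top, left + short, top + short)).resize(
        (size, size), Image.LANCZOS
    )

img = Image.new("RGB", (1024, 768))
print(center_square_crop(img).size)  # -> (512, 512)
```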
License
This model is released under the CC BY-NC 4.0 license.
- You are free to: Share, copy, and adapt the material.
- NonCommercial: You may not use the material for commercial purposes (e.g., selling the model, using it in a paid app, or selling services based on it).
- Base Model Restrictions: As this is a derivative of Stable Diffusion v1.5, you must also adhere to the use restrictions of the original OpenRAIL-M license (e.g., no generating illegal or harmful content).