2nd · IBM · ANRF · IIIT Hyderabad

Flood Inundation Mapping from SAR

ANRF AISE Hack 2026 · Apr 2026 · Team MEGALODON

Separating floodwater from permanent water on ~79 labeled SAR patches, with optical imagery cloud-occluded and the classes severely imbalanced.

2nd

Result

₹1.25 lakh

Prize

2000+

Field

2026

Year

Outcome

2nd of 2000+ participants nationwide and a ₹1.25 lakh prize, won with a heterogeneous 5-model SAR ensemble that beat every single model, including the ConvNeXt-Large, under severe small-data constraints.

The problem

The task was flood-pixel binary segmentation from satellite imagery over West Bengal — mark every pixel as flood or not-flood. Three things made it hard:

Optical was useless when it mattered. Monsoon flooding comes with cloud cover, so the optical bands were occluded exactly when you needed them. That forced a SAR-primary approach (Synthetic Aperture Radar sees through clouds).
Almost no labels. Roughly 79 labeled 512×512 patches total. This is small-data territory where most deep-learning instincts are wrong.
Flood vs. permanent water. SAR backscatter for a flooded field and a permanent lake looks similar. Disambiguating the two — without a model just memorizing rivers — was the real battle.

What we shipped

A heterogeneous 5-model ensemble, combined by a 3-of-5 majority vote over each model's binary mask. The diversity was deliberate — uncorrelated errors are what make an ensemble actually help:

Siamese U-Net with an EfficientNet-B4 backbone (FP32, PyTorch Lightning) as the strong baseline.
Knowledge distillation + transductive pseudo-labels on the prediction tiles — a B4 teacher supervising a student over the unlabeled test patches.
A Siamese EfficientNet-B7 variant for backbone diversity.
A ConvNeXt-Large 3-class head, thresholded down to a flood channel, to bring a different inductive bias.
A 21-channel baseline U-Net with dynamic Otsu thresholding.
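The "dynamic Otsu thresholding" in the last item can be read as a threshold recomputed for each tile instead of a fixed cutoff. The sketch below shows that idea in NumPy. It is a minimal illustration, not the team's code: the function name otsu_threshold, the bin count, the fallback value, and the choice to threshold model probabilities per tile are all assumptions.

```python
import numpy as np


def otsu_threshold(values, bins=256, fallback=0.5):
    """Pick the cutoff that maximizes between-class variance (Otsu's method).

    `values` is any array of scores for one tile (hypothetical input).
    `fallback` is returned when the tile has no usable split, for example
    when every pixel has the same value.
    """
    values = np.asarray(values, dtype=np.float64).ravel()
    hist, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2.0

    # Cumulative pixel counts and intensity sums for the "low" class,
    # which contains bins 0..k for each candidate split k.
    weight_lo = np.cumsum(hist)
    weight_hi = weight_lo[-1] - weight_lo
    sum_lo = np.cumsum(hist * centers)
    sum_hi = sum_lo[-1] - sum_lo

    # A split is only meaningful if both classes are non-empty.
    valid = (weight_lo > 0) & (weight_hi > 0)
    if not valid.any():
        return float(fallback)

    mean_lo = sum_lo[valid] / weight_lo[valid]
    mean_hi = sum_hi[valid] / weight_hi[valid]
    between = weight_lo[valid] * weight_hi[valid] * (mean_lo - mean_hi) ** 2

    # Return the upper edge of the best bin, so `values >= threshold`
    # reproduces the split that was scored.
    return float(edges[1:][valid][np.argmax(between)])
```

A per-tile mask would then be `tile_probs >= otsu_threshold(tile_probs)`, where tile_probs is a hypothetical array of per-pixel scores for one tile.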

Dual-stream architectures kept the SAR signal separate from topographic priors (elevation-related bands and a surface-water mask), so the model could reason about "is this water" and "could this plausibly flood" as distinct questions.

Results

Artifact	Approx. public leaderboard score
Best single model (KD, transductive)	~0.207
4-model vote	~0.221
Final 5-model, 3-of-5 vote	~0.227

The ensemble beat every single model, including the ConvNeXt (~0.227 against ~0.207 for the best single model on the public leaderboard). That was the whole thesis for the small-data regime: robustness from disagreement rather than one heroic network. It placed 2nd among 2000+ participants nationally and won a ₹1.25 lakh prize.
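The 3-of-5 vote itself is simple to express. The following is a minimal NumPy sketch of pixel-wise majority voting over binary masks. The function name majority_vote and the tiny 2×2 example masks are illustrative assumptions, not the competition data or the actual merge notebook.

```python
import numpy as np


def majority_vote(masks, min_votes=3):
    """Fuse binary masks pixel by pixel.

    A pixel is marked flood (1) when at least `min_votes` of the
    input masks mark it flood. All masks must share the same shape.
    """
    stack = np.stack([np.asarray(m).astype(bool) for m in masks])  # (n_models, H, W)
    votes = stack.sum(axis=0)                                      # per-pixel vote count
    return (votes >= min_votes).astype(np.uint8)


# Five toy 2x2 masks standing in for the five models' outputs.
example_masks = [
    [[1, 1], [0, 0]],
    [[1, 1], [0, 1]],
    [[1, 0], [0, 0]],
    [[0, 1], [0, 1]],
    [[0, 0], [1, 0]],
]
fused = majority_vote(example_masks, min_votes=3)
# Vote counts are [[3, 3], [1, 2]], so fused is [[1, 1], [0, 0]].
```

Voting on hard masks rather than averaging probabilities means each model keeps its own threshold, which suits an ensemble whose members are calibrated differently.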

What I learned the hard way

FP16 mixed precision hurt. In the ad-hoc small-data training loops, half precision destabilized training; moving to PyTorch Lightning with FP32 coincided with much stronger validation scores.
Class imbalance is a loss-function problem. Weighted BCE, edge-aware losses, and threshold sweeps on validation mattered more than any single architecture choice.
Reproducibility is a deliverable. The hosts got a frozen inference path — five model submissions merged by one notebook — not a pile of experiment artifacts.
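Two of the imbalance tools named above, weighted BCE and a validation threshold sweep, can be sketched in a few lines of NumPy. This is an illustration under assumptions: the function names, the pos_weight value, the candidate grid, and the use of IoU as the sweep metric are all hypothetical, since the write-up does not state the competition metric or the weights used.

```python
import numpy as np


def weighted_bce(probs, targets, pos_weight, eps=1e-7):
    """Binary cross-entropy with the positive (flood) term scaled by pos_weight."""
    p = np.clip(np.asarray(probs, dtype=np.float64), eps, 1.0 - eps)
    y = np.asarray(targets, dtype=np.float64)
    loss = -(pos_weight * y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    return float(loss.mean())


def iou(pred, target):
    """Intersection over union of two binary masks (assumed metric)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(np.logical_and(pred, target).sum() / union)


def sweep_threshold(probs, targets, candidates=None):
    """Return (best_threshold, best_iou) over a grid of candidate cutoffs."""
    if candidates is None:
        candidates = np.linspace(0.05, 0.95, 19)
    probs = np.asarray(probs, dtype=np.float64)
    scores = [iou(probs >= t, targets) for t in candidates]
    best = int(np.argmax(scores))
    return float(candidates[best]), scores[best]
```

With rare flood pixels, pos_weight above 1 makes a missed flood pixel cost more than a false alarm, and the sweep then picks the cutoff on held-out data instead of defaulting to 0.5.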

What I'd do differently

Invest earlier in a single clean validation split. A lot of time went into reconciling metrics across ad-hoc loops; a disciplined split from day one would have made every later decision faster and more trustworthy.
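One way to pin such a split down is to derive it deterministically from the patch IDs, so every training loop sees the same validation set. The sketch below uses only the standard library. The function name fixed_split, the 20% validation fraction, the seed, and the patch ID format are assumptions made for illustration.

```python
import random


def fixed_split(patch_ids, val_fraction=0.2, seed=0):
    """Split patch IDs into (train, val) in a reproducible way.

    IDs are sorted first so the result does not depend on the order
    in which files happened to be listed.
    """
    ids = sorted(patch_ids)
    random.Random(seed).shuffle(ids)  # local RNG, leaves global random state alone
    n_val = max(1, round(len(ids) * val_fraction))
    return ids[n_val:], ids[:n_val]


# Hypothetical IDs for 79 labeled patches.
all_ids = [f"patch_{i:03d}" for i in range(79)]
train_ids, val_ids = fixed_split(all_ids)
```

Writing the resulting val_ids to a file once and loading that file everywhere would make the split independent of library versions as well. If neighbouring patches come from the same scene, splitting by scene rather than by patch would be a safer variant.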
