Sentinel-2 Crop Classification & Yield Forecasting for 9 Crop Types

  • Participants: 16
  • End date: 31.12.26
  • Dataset: dc0cqjgt
  • Resources: 2 CPU (8.59 GB) | 1 GPU (22.49 GB)
  • Compute: 0 / 100.00 PF
  • Submits: 0 / 5


Overview

About this use case: An agri-tech platform has assembled three years of Sentinel-2 imagery paired with ground-truth yield records across 50,000 hectares — a dataset agricultural AI companies would pay to train on, except that handing over the raw tiles means losing the commercial asset entirely. tracebloc lets crop classification models train inside the platform's infrastructure and generates recurring access-fee revenue without a single tile or yield label leaving it. Explore the data, run your own models, and see how your approach compares.

Problem

Three years of Sentinel-2 satellite imagery, paired with ground-truth yield measurements across 50,000 hectares, is a rare commercial asset in agricultural AI. Most remote sensing agriculture models are trained on publicly available imagery without verified yield outcomes. The combination of satellite imagery and actual harvest data — at field level, over multiple growing seasons — is what agricultural AI companies and crop insurance platforms would pay to train on. The data owner cannot monetise it in the conventional way because sharing the imagery means losing control of it: competitors extract the imagery, train their own models, and the commercial advantage disappears. An agri-tech platform that took three years to assemble this dataset cannot simply hand it to a paying customer and trust that the data stays within the agreed use scope.

Solution

The agri-tech platform deploys a tracebloc workspace seeded with approximately 27,000 Sentinel-2 image tiles and their associated crop type labels. Agricultural AI companies submit their precision agriculture AI models to the workspace. Inside tracebloc's containerised training environment, each model trains on the satellite imagery — fine-tuning its weights to the spectral signatures, seasonal patterns, and crop type distribution of this specific dataset — without any satellite tile leaving the platform's infrastructure. tracebloc handles orchestration, scores each adapted model against the 5,000-tile holdout set, and publishes results to a live leaderboard ranked by classification accuracy across nine crop types. This is a federated learning application of data monetisation: the platform generates revenue from model training access without surrendering ownership of a single pixel.

Outcome

In this example, the best-performing submission reached 94.5% classification accuracy across nine crop types after fine-tuning — a 12.4 percentage point improvement over its 82.1% baseline performance on this dataset. The live workspace's current leaderboard high score is 84.52%, indicating meaningful headroom for well-adapted models. The tracebloc workspace stays active as a persistent commercial channel: new AI companies access the dataset through the workspace, the platform collects training access fees, and the imagery never leaves the infrastructure. The leaderboard documents which approaches perform best on this specific agricultural imagery.

The Operational Challenge

Lena Hoffman, Chief Data Officer at an agri-tech platform headquartered in the Netherlands, has spent three years building what she knows is one of the most valuable remote sensing datasets in European precision agriculture. The platform operates a field-level monitoring service for crop input optimisation and insurance underwriting across 50,000 hectares of farmland in the Netherlands, Belgium, and northern France. Every growing season, Sentinel-2 satellite passes are cross-referenced against field-level harvest records provided by subscribing farms. The result is a dataset that links satellite spectral data to actual yield outcomes — not modelled estimates, not survey responses, but weighed harvest records from grain traders and farm management systems.

Agricultural AI companies building crop type classifiers and yield prediction models need exactly this kind of data. Publicly available Sentinel-2 imagery is plentiful; ground-truth yield records at field level, over multiple seasons, across diverse soil zones and climate conditions, are not. The three dominant use cases for Lena's dataset are crop type classification for land use monitoring, yield prediction for commodity trading, and loss estimation for crop insurance underwriting. All three require a model trained on real, verified, multi-season data — not on imagery labelled by remote sensing analysts making educated guesses from spectral signatures alone.

The commercial problem is a data governance problem. Lena's platform earns revenue from the monitoring service itself, but the dataset it has assembled is worth more as a training resource for agricultural AI than as an internal asset. The challenge: every time she has explored selling data access to an AI company, legal gets involved, the discussion stalls on use restrictions, and the deal either falls apart or closes six months later with contractual terms that are functionally unenforceable. Once imagery leaves the platform's infrastructure, the platform cannot know what the buyer actually does with it. A competitor could use it to train a model that replicates the platform's own monitoring service. An insurance company could train a yield prediction model and cancel its subscription. The data leaves, the commercial advantage dissipates, and the platform has no visibility into what happens next.

Satellite imagery in a commercial agricultural context also carries its own data sensitivity considerations. The imagery, combined with yield records, reveals commercially sensitive information about individual farms: which fields underperformed in a drought year, which operators achieved above-average yields through practice changes, which parcels show soil degradation patterns. Individual farms subscribed to the monitoring service on the understanding that their field-level data would be used to provide them with a service — not commercialised to third parties in a form that might expose their operational performance.

The precision agriculture AI market has grown substantially, and the number of companies claiming to offer satellite-based crop classification is large. Lena's platform has an advantage: its dataset is real, its ground truth is verified, and its geographic scope covers the crop types and soil zones that matter for Northern European agricultural insurance and commodity markets. That advantage is only commercially exploitable if the platform can let companies train on the data without releasing it.

Stakeholders

  • Lena Hoffman, Chief Data Officer: Owns the monetisation strategy for the satellite imagery and yield dataset. KPIs: revenue from training access fees, data governance compliance, competitive advantage preservation
  • Head of Precision Agriculture: Manages the monitoring service; crop type classification accuracy directly affects the quality of field-level recommendations delivered to subscribing farms
  • Remote Sensing Lead: Responsible for data pipeline quality — Sentinel-2 tile processing, NDVI computation, seasonal normalisation; the training data released to AI companies must not contain processing artefacts that could be reverse-engineered to reveal proprietary methodology
  • Legal / Commercial: Contractual framework for data access must be enforceable; past experience with data licensing deals has shown that export-based licensing is practically unenforceable
  • Farm Data Governance Lead: Individual farm operators have data sharing rights under the EU Farm Data Act; training access must not enable re-identification of individual farm performance from the imagery

The Underlying Dataset

The evaluation dataset contains approximately 27,000 Sentinel-2 satellite image tiles split across a training set of 25,000 tiles and a holdout set of 5,000 tiles. Full dataset statistics, class distributions, and spectral characteristics are available in the Exploratory Data Analysis tab.

This dataset is augmented. It was constructed to reflect the statistical structure of real-world Sentinel-2 agricultural imagery — the crop type distribution, the image dimensions and spectral properties characteristic of Sentinel-2 derived tiles, and the class imbalance structure observed in Northern European farmland — without containing any proprietary field geometry, farm identification, or ground-truth yield records.

| Property | Value |
| --- | --- |
| Total tiles | ~27,000 |
| Training set | 25,000 tiles |
| Holdout set | 5,000 tiles |
| Image dimensions | 64×64 px |
| Image format | Grayscale (Sentinel-2 derived) |
| Classes | 9 crop types |
| Evaluation metric | Classification accuracy (multi-class) |
| Current leaderboard high score | 84.52% |

Class distribution (training set — from API):

| Crop Type | Tiles | Share |
| --- | --- | --- |
| Barley | 4,015 | ~18.6% |
| Wheat | 2,411 | ~11.2% |
| Rye | 2,391 | ~11.1% |
| Oats | 2,374 | ~11.0% |
| Non-crop | 2,425 | ~11.2% |
| Sugar beet | 1,989 | ~9.2% |
| Potato | 2,031 | ~9.4% |
| Corn | 1,982 | ~9.2% |
| Rapeseed | 1,982 | ~9.2% |

Barley is the most represented class at approximately twice the frequency of the smallest classes — a distribution that reflects the actual prevalence of barley across Northern European farmland in the regions from which the imagery was sourced. A model that classifies every tile as Barley achieves only 18.6% accuracy — which is why per-class recall matters alongside overall accuracy, particularly for high-value but lower-frequency crops such as rapeseed and potato.
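The majority-class baseline and its per-class recall can be recomputed directly from the published counts (which sum to roughly 21,600 tiles, the total the quoted shares are measured against). A minimal sketch in Python:

```python
# Training-set class counts as published in the table above.
counts = {
    "Barley": 4015, "Wheat": 2411, "Rye": 2391, "Oats": 2374,
    "Non-crop": 2425, "Sugar beet": 1989, "Potato": 2031,
    "Corn": 1982, "Rapeseed": 1982,
}
total = sum(counts.values())

# A classifier that always predicts the most frequent class scores
# that class's share of the data as its overall accuracy...
majority_class = max(counts, key=counts.get)
majority_accuracy = counts[majority_class] / total

# ...but its recall is 1.0 on that class and 0.0 on every other crop,
# which is why per-class recall is tracked alongside overall accuracy.
per_class_recall = {c: (1.0 if c == majority_class else 0.0) for c in counts}

print(majority_class, round(majority_accuracy, 3))  # Barley 0.186
```

This is why a leaderboard ranked on overall accuracy alone would reward models that neglect the low-frequency, high-value classes.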

How Evaluation Works

Each agricultural AI company submitted its crop classification model to the tracebloc workspace. The evaluation ran in two phases.

Phase 1 — Baseline performance. Each model was benchmarked as-submitted on the 5,000-tile holdout set, with no exposure to the satellite tile training set. This establishes the true out-of-distribution baseline: what each model actually delivers on imagery it has not been adapted to, without any fine-tuning to the spectral characteristics or crop mix of this specific dataset.

Phase 2 — Fine-tuning. Contributors were given access to the training environment inside the tracebloc workspace. Each contributor transferred their classification model into tracebloc and ran training on the 25,000-tile training set. The training process fine-tuned the model weights to the spectral signatures, seasonal variation patterns, and nine-class crop distribution of this agricultural dataset. After training, the adapted model was evaluated automatically against the 5,000-tile holdout set. No satellite tiles were exported. Each contributor received only their own results; no contributor had visibility into another's training runs or scores before the leaderboard published.
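The two phases can be sketched as a simple protocol. The `Model` interface, the integer tile representation, and the `run_two_phase` helper below are hypothetical illustrations of the flow just described — not tracebloc's actual SDK:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EvaluationRecord:
    baseline_accuracy: float                     # Phase 1: model as submitted
    finetuned_accuracy: Optional[float] = None   # Phase 2: after training

def accuracy(model, tiles, labels):
    # Fraction of holdout tiles assigned the correct crop type.
    preds = [model.predict(t) for t in tiles]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def run_two_phase(model, train_set, holdout_set):
    # Phase 1 — score the model with no exposure to the training tiles.
    record = EvaluationRecord(baseline_accuracy=accuracy(model, *holdout_set))
    # Phase 2 — fine-tune inside the workspace, then re-score on the same
    # holdout. Tiles stay inside the environment; only scores come out.
    model.fit(*train_set)
    record.finetuned_accuracy = accuracy(model, *holdout_set)
    return record
```

Any object exposing `predict` and `fit` fits this protocol; the scoring side never hands tiles back to the contributor, only the two accuracy figures.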

Each contributor received:

  • Training access: 25,000 annotated satellite tiles (nine crop types at realistic Northern European prevalence distribution) for model fine-tuning inside the workspace
  • Evaluation environment: Sandboxed execution — adapted models run against the holdout set, no tile export path available
  • Metrics tracked: Overall classification accuracy, per-class recall across all nine crop types, accuracy on the four lowest-frequency classes (corn, rapeseed, potato, sugar beet)
  • Commercial terms: Training access fees charged per model submission cycle; performance results published to the leaderboard after the evaluation window closes

Results

→ View the full model leaderboard — complete model rankings, per-class accuracy breakdown, and submission history.

| Model | Baseline Accuracy | After Fine-tuning | Rare Class Recall | Cost per km² p.a. |
| --- | --- | --- | --- | --- |
| Model A (ResNet-50) | 78.4% | 90.8% | 81.3% | €0.25 |
| Model B (ViT-B/16) ✅ | 82.1% | 94.5% | 87.6% | €0.22 |
| Model C (EfficientNet) | 80.6% | 93.2% | 79.4% | €0.10 |

What the numbers reveal:

Model B achieved the strongest result across both overall accuracy and rare class recall after fine-tuning. Starting from an 82.1% baseline — the strongest out-of-distribution starting point in the evaluation — it reached 94.5% after training on the platform's 25,000-tile archive while maintaining 87.6% recall on the four lowest-frequency crop classes. Its baseline advantage suggests a model architecture that generalises well to multi-temporal Sentinel-2 imagery without requiring extensive domain adaptation.

Model C claimed the most competitive pricing at €0.10 per km² per year. Its 80.6% baseline and 93.2% post-fine-tuning accuracy are competitive, but its rare class recall of 79.4% trails Model B by 8.2 percentage points. For a crop insurance underwriting use case where rapeseed and potato misclassification directly affects indemnity calculations, that gap translates to systematic pricing errors. Low cost per km² does not compensate for crop type misclassification on high-value, low-frequency crops.

Model A gained 12.4 percentage points from fine-tuning — from 78.4% to 90.8% — a gain on par with the other submissions, but starting from the weakest baseline, indicating a model with solid representational capacity that was substantially under-adapted to Northern European crop signatures before exposure to the training set. Its rare class recall of 81.3% is adequate but sits below Model B's performance across all evaluation dimensions.
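The fine-tuning deltas are easy to recompute from the results table; a minimal sketch:

```python
# (baseline, after fine-tuning) accuracies from the results table above.
results = {
    "Model A (ResNet-50)":    (0.784, 0.908),
    "Model B (ViT-B/16)":     (0.821, 0.945),
    "Model C (EfficientNet)": (0.806, 0.932),
}

# Absolute gain in percentage points for each model.
gains_pp = {name: round((after - before) * 100, 1)
            for name, (before, after) in results.items()}
```

All three models gained between 12.4 and 12.6 percentage points; what separates them is the starting baseline and the rare-class recall, not the size of the fine-tuning lift.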

Business Impact

Illustrative assumptions:

  • Platform covers 1.5 million km² of farmland under monitoring contracts
  • Misclassification cost: €18 per km² (input misallocation, yield loss, insurance pricing error)
  • Internal model maintenance cost: €280,000 per year
  • AI licence priced per km² per year

| Approach | Accuracy | Error Rate | Misclassified Area | Misclassification Cost | AI Cost (p.a.) | Total Annual Cost |
| --- | --- | --- | --- | --- | --- | --- |
| Internal baseline | 84.5% | 15.5% | 232,500 km² | €4,185,000 | €280,000 | €4,465,000 |
| Model A | 90.8% | 9.2% | 138,000 km² | €2,484,000 | €375,000 | €2,859,000 |
| Model B ✅ | 94.5% | 5.5% | 82,500 km² | €1,485,000 | €330,000 | €1,815,000 |
| Model C | 93.2% | 6.8% | 102,000 km² | €1,836,000 | €150,000 | €1,986,000 |

Model B reduces total annual cost from €4,465,000 (internal baseline) to €1,815,000 — a saving of €2,650,000 per year — despite its licence cost being more than double Model C's. Model C's headline price point is attractive until the misclassification cost difference is calculated: the additional 19,500 km² of misclassified farmland generates €351,000 in additional operational cost per year, more than offsetting the €180,000 licence saving. Without tracebloc, a procurement decision based on licence price alone would have selected Model C at a cost of €171,000 per year in additional misclassification exposure.
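The figures in the table follow mechanically from the stated assumptions; a minimal sketch of the arithmetic (constants taken from the illustrative assumptions above):

```python
AREA_KM2 = 1_500_000             # farmland under monitoring contracts
MISCLASS_COST_PER_KM2 = 18.0     # € per misclassified km² per year
INTERNAL_MAINTENANCE = 280_000   # € per year, internal baseline only

def total_annual_cost(accuracy: float, licence_per_km2: float = None) -> float:
    """Misclassification cost plus AI cost for one approach.

    If no per-km² licence price is given, the internal maintenance
    cost is used instead (the internal-baseline row of the table).
    """
    misclassified_km2 = (1 - accuracy) * AREA_KM2
    ai_cost = (INTERNAL_MAINTENANCE if licence_per_km2 is None
               else licence_per_km2 * AREA_KM2)
    return misclassified_km2 * MISCLASS_COST_PER_KM2 + ai_cost

print(round(total_annual_cost(0.945, 0.22)))  # Model B → 1815000
```

Running the same function over all four rows reproduces the table, which makes it easy to re-run the comparison under different misclassification cost assumptions.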

The revenue side of the equation matters as much as the cost side. Lena's platform earns training access fees from every AI company that submits a model to the workspace. In this evaluation cycle with three competing AI companies, the workspace generates direct revenue while the satellite imagery never leaves the platform's infrastructure and the proprietary dataset retains its full commercial value for future evaluation cycles.

Decision

Lena's platform selects Model B as the preferred classification engine for the monitoring service's crop type layer, deploying it initially across 20% of the active monitoring area for a six-week accuracy validation against field-level yield records. The validation confirms that 94.5% accuracy holds against ground-truth harvest data before the model is extended to full operational coverage.

The tracebloc workspace stays active as a permanent commercial channel. New agricultural AI companies entering the precision agriculture AI market pay for training access to the satellite archive. Existing contributors re-submit updated models as new growing season data is added to the training set. The leaderboard tracks performance improvements over time, giving Lena's platform ongoing visibility into which model architectures are advancing and which are plateauing. The dataset becomes a recurring revenue source that scales with the quality of the archive, not with the size of the team managing data licensing agreements.

Explore this use case further:

  • View the model leaderboard — full model rankings, per-class accuracy, submission history
  • Explore the dataset — crop type distribution, spectral characteristics, class balance analysis
  • Start training — submit your own crop classification model to this evaluation

Related use cases: See how the same workspace monetisation model applies to drone object detection for traffic monitoring. For a broader view of what federated learning applications look like across industries, see our federated learning applications guide.

Deploy your workspace or schedule a call.

Disclaimer

The dataset used in this use case is augmented — designed to closely reflect the statistical structure of real-world Sentinel-2 agricultural satellite imagery, including crop type distribution across nine classes, the prevalence weighting of Northern European farmland cover, and the 64×64 pixel tile dimensions characteristic of field-level satellite classification datasets, without containing any proprietary field geometry, farm identification, or ground-truth yield records. The persona, company configuration, claimed performance figures, business impact assumptions, and data monetisation scenario are illustrative and based on patterns observed across precision agriculture and remote sensing data commercialisation. They do not represent any specific agri-tech platform, AI vendor, or contractual outcome.