Model card: crop stage prediction model

What it is

Overview

Our advanced models integrate field-specific data with long-term daily weather data and soil characteristics to predict crop growth stages across various crops and regions. These predictions can be generated both before the growing season (using historical weather data) and during the season (using current year's weather data).

While our model provides accurate predictions, it is designed to complement, and not replace, field scouting practices.

Inputs and Outputs

Required field inputs

field location (latitude and longitude, point)
crop
maturity index and/or variety name (depends on crop)
planting date
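
As an illustration only, the required inputs above could be checked on the client side before a request is sent. The class name, field names, and validation rules in this sketch are hypothetical and are not taken from the actual API schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class RequiredFieldInputs:
    """Hypothetical container for the required field inputs (names are assumptions)."""

    latitude: float
    longitude: float
    crop: str
    planting_date: date
    maturity_index: Optional[float] = None
    variety_name: Optional[str] = None

    def __post_init__(self) -> None:
        # The field location is a single latitude/longitude point.
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError("latitude must be within [-90, 90]")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError("longitude must be within [-180, 180]")
        if not self.crop.strip():
            raise ValueError("crop is required")
        # Which of the two is needed depends on the crop; this sketch only
        # checks that at least one of them is present.
        if self.maturity_index is None and not self.variety_name:
            raise ValueError("provide a maturity index and/or a variety name")
```

For example, `RequiredFieldInputs(48.85, 2.35, "wheat", date(2024, 10, 1), variety_name="VARIETY A")` passes these checks, while omitting both the maturity index and the variety name raises a `ValueError`.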

Optional inputs

Users can modify the optional inputs for more custom model tuning; otherwise, default values are used.

Field inputs (default values for each crop are used when not provided):
- planting parameters: depth, density, row spacing, water capacity at planting
- water management: irrigated vs. rainfed
Model configuration:
- lookback period used for computing pre-season crop stage timing estimates (a.k.a. historical time period)
Response configuration:
- filtering on prediction features (see Outputs)
- prediction time period

Outputs

Historical Growth Stage Simulations: this feature provides detailed year-by-year growth stage predictions using data from multiple years.
Aggregated Growth Stage Simulations: this feature presents consolidated growth stage predictions summarized from multi-year data. Actual weather data and short-term weather forecast data are integrated to provide in-season short-term predictions.

All feature types are available through the same endpoint. Users can select the required computation option with the ‘query.filter’ object in the API payload.

How it works

Algorithm principles

The growth stage prediction model provides the most probable start date of each growth stage.

To compute crop stage predictions, the system runs a crop development model on weather scenarios from several past years for the requested location. For each of these simulations, the model applies the cropping parameters (planting date, variety, etc.) provided by the user when requesting the model execution. It then computes the mean and median of the predicted crop stage start dates across all past-year simulations.

The model predicts when each crop stage is likely to begin. Here's how it works:

For predictions up to the current date plus 5 days:
1. The model uses actual recorded daily weather data and the short-term weather forecast for the selected field location.
2. It simulates how your crop would grow applying:
  - The planting date provided
  - The crop variety selected
  - Other custom details entered about the field and crop
3. The model then calculates a single start date for each predicted growth stage.
4. Because only one weather scenario is simulated, the returned mean and median values are equal for any stage that starts between planting and the current date plus 5 days.
For predictions beyond the current date plus 5 days:
1. The model uses daily weather data from several past years for your field's location.
2. For each past year, it simulates how your crop would grow applying:
  - The planting date provided
  - The crop variety selected
  - Other custom details entered about the field and crop
3. This creates a collection of predicted timings of crop stage start: one date for each growth stage and for each past year simulated
4. The model then calculates the mean (average) and median (middle value) of these predictions.
5. The mean and median become the forecast values, showing the most likely dates for each crop stage to begin.
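
The aggregation in steps 3 to 5 can be sketched as follows. This is a minimal illustration, not the production implementation: the function name, the representation of each simulation as a number of days after planting, and the rounding to whole days are all assumptions.

```python
from datetime import date, timedelta
from statistics import mean, median
from typing import Dict, Tuple


def aggregate_stage_start(
    planting_date: date, days_to_stage_by_year: Dict[int, int]
) -> Tuple[date, date]:
    """Return (mean_date, median_date) for one growth stage.

    days_to_stage_by_year maps each simulated past year to the number of days
    between planting and the start of the stage in that year's simulation
    (a hypothetical representation of the simulation output).
    """
    offsets = list(days_to_stage_by_year.values())
    if not offsets:
        raise ValueError("at least one simulated year is required")
    # Rounding to whole days is an assumption made for this sketch.
    mean_date = planting_date + timedelta(days=round(mean(offsets)))
    median_date = planting_date + timedelta(days=round(median(offsets)))
    return mean_date, median_date


# Made-up simulation results: one unusually late year (2023).
simulated = {2019: 60, 2020: 62, 2021: 64, 2022: 61, 2023: 83}
mean_date, median_date = aggregate_stage_start(date(2024, 4, 1), simulated)
```

With these made-up values, the single late year moves the mean to June 6 while the median stays at June 2, which is the kind of gap discussed in the interpretation section.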

Training and continuous improvement

Annual model calibration: Model calibration is a recurring yearly task that incorporates new field observation data to (re-)calibrate and cross-validate the model for existing and newly launched commercial varieties.

Performance monitoring: scientists continuously monitor the model inference data and predictions to detect and promptly mitigate major data quality or model drift issues.

How to use

How to interpret insights

Interpretation of crop stage start date predictions beyond the current date plus 5 days

Predictions rely on historical averages.
The mean value is the mathematical average of all simulated start dates for a specific crop stage from past years.
- The mean date reflects how the crop would have developed in an average weather scenario derived from the various weather conditions of past years.
The median value is the middle value in the set of simulated start dates for a specific crop stage.
- The median is less affected than the mean by extreme weather events in past years, so it provides a robust "typical" value that is not skewed by unusual seasons.

Aspect	Mean	Median	Interpretation
Definition and calculation	The mean growth stage start date is the mathematical average of all simulated start dates from past years.	Middle value in the dataset	Larger difference between mean and median suggests more skewed data over past years.
Sensitivity to outliers	More sensitive to past seasons with extreme or unusual weather conditions.	Less sensitive to extreme weather events in past years, so it provides a robust "typical" value that is not skewed by unusual seasons.	If the mean differs significantly from the median, the field location has experienced significant weather variability over past seasons.
Representation of "typical" year	Can be skewed by extreme years.	Often better represents a typical year	Median may be more reliable if data is skewed.
When they're similar	Indicates symmetrical distribution of predictions.	Suggests consistent crop stage timing across years	More confidence in the prediction
Interpretation with weather	More influenced by extreme weather years	Better represents "normal" weather years	Compare both to assess the weather variability of the location
When mean is earlier	Suggests some very early years (i.e. 'hot' years) are pulling the average down	May better represent the most likely timing	The location is often affected by abnormal weather conditions shortening the crop cycle.
When median is earlier	Indicates more years with earlier start dates, but some late years (i.e. 'cold' years) are skewing the mean	May better represent the most common timing	The location is often affected by abnormal weather conditions delaying the crop cycle.
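
The comparison described in the table can be expressed as a small helper. This is a sketch under stated assumptions: the function name is hypothetical, and the 2-day tolerance for calling the two dates "similar" is an arbitrary illustrative value, not a documented threshold.

```python
from datetime import date


def describe_mean_median_gap(
    mean_date: date, median_date: date, tolerance_days: int = 2
) -> str:
    """Summarize what the gap between mean and median start dates suggests.

    tolerance_days is a hypothetical cutoff for treating the dates as similar.
    """
    gap_days = (mean_date - median_date).days
    if abs(gap_days) <= tolerance_days:
        return "similar: consistent stage timing across years"
    if gap_days < 0:
        return "mean earlier: some very early years pull the mean down"
    return "median earlier: some late years pull the mean up"
```

For instance, a mean of June 6 and a median of June 2 falls in the "median earlier" case, which the table associates with late ("cold") years delaying the crop cycle.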

Interpretation of crop stage start date predictions up to the current date plus 5 days

These predictions are driven by actual recorded weather data and the short-term weather forecast, so the mean and median values are equal.

Limitations and caveats:

Limited In-Season Adjustments: The model does not incorporate real-time field observations during the growing season. Its accuracy is expected to decrease as the crop cycle progresses, especially if early-stage predictions are significantly off track.
Weather Dependency: Predictions are highly dependent on weather data quality and forecast accuracy.
Variety Specificity: Ensure the correct variety is selected. Predictions are calibrated for specific varieties and relative maturities, and they are significantly impacted by these variables.

Use-case example

Request:

On December 1, 2024, a user requests a forecast for crop stage X of Wheat VARIETY A, planted on October 1, 2024, using a 10-year lookback period.

Process:

The system retrieves weather data for the field location from 2013 to 2023.
For each year, the crop model simulates crop development applying:
- VARIETY A
- October 1 planting date
- That year's weather data
This produces 10 predicted start dates for crop stage X (one per year).
For predictions after December 6 (current date + 5 days), the forecast is returned as the mean and median values of these 10 dates.
For predictions between October 1 (planting date) and December 6 (current date + 5 days), actual recorded weather data and the short-term weather forecast are used. The returned mean and median values are equal.
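
The date split in this example can be sketched as below. The function name and return labels are hypothetical, and treating the last day of the 5-day horizon as inclusive is an assumption based on the "between October 1 and December 6" wording.

```python
from datetime import date, timedelta

FORECAST_HORIZON_DAYS = 5  # from the text: current date plus 5 days


def weather_basis(stage_start: date, planting_date: date, today: date) -> str:
    """Tell which weather basis applies to a predicted stage start date."""
    if stage_start < planting_date:
        raise ValueError("a stage cannot start before planting")
    horizon_end = today + timedelta(days=FORECAST_HORIZON_DAYS)
    if stage_start <= horizon_end:
        # Single simulation on recorded weather plus short-term forecast,
        # so the returned mean and median are equal.
        return "observed_and_forecast"
    # Mean and median over the past-year simulations.
    return "historical_scenarios"


planting = date(2024, 10, 1)
today = date(2024, 12, 1)
basis_early = weather_basis(date(2024, 11, 15), planting, today)
basis_late = weather_basis(date(2025, 3, 10), planting, today)
```

Here a stage predicted for November 15 falls in the observed-and-forecast window, while one predicted for March 10, 2025 relies on the historical scenarios.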

Intended use

Growth stage predictions are primarily used to alert growers in a timely manner about key management practices to be carried out at specific crop stages (e.g. pesticide applications and field harvest prioritization).

The model is also intended to be consumed as an enabling component of other predictive models, such as pest and disease risk or grain dry down, or to characterize the environmental conditions during the different growth stages of the crop.

Limitations

Insights accuracy

The accuracy of growth stage predictions is very sensitive to weather data. Hence, poor weather data quality is expected to impact model reliability.
Significant frost conditions can result in the model not providing all growth stage predictions. This happens when minimum daily temperatures fall below a crop-specific physiological threshold.
Abiotic stresses that could affect the crop’s growth and development may not be considered by the model, which could result in poor accuracy.
Note that model predictions are intended to facilitate and optimize farming management but cannot, under any circumstances, replace physical field scouting and expert agronomist assessments.
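
The frost condition mentioned above amounts to a threshold check on daily minimum temperatures. The sketch below only illustrates that idea: the function name is hypothetical, the threshold must be supplied by the caller because crop-specific values are not given here, and the real model may apply additional logic.

```python
from typing import Sequence


def has_significant_frost(
    daily_min_temps_c: Sequence[float], threshold_c: float
) -> bool:
    """Return True if any daily minimum temperature is below the threshold.

    threshold_c stands in for the crop-specific physiological threshold,
    whose actual values are not documented in this card.
    """
    return any(temp < threshold_c for temp in daily_min_temps_c)
```

A caller could use such a check to anticipate that some growth stages may be missing from the response for a given season.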

System availability

The model uses ERA5T weather data for simulation and prediction. ERA5T data is updated every day in the inference data store.

Due to the execution of the inference data pipeline, the model is not available from 5:30 AM to 6:30 AM UTC.
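
A client can avoid calling the model during this window with a check like the one below. This is a sketch: the function name is hypothetical, and treating the window as starting at 5:30 inclusive and ending at 6:30 exclusive is an assumption.

```python
from datetime import datetime, time, timezone

# Unavailability window stated above, in UTC.
WINDOW_START = time(5, 30)
WINDOW_END = time(6, 30)


def is_model_available(moment: datetime) -> bool:
    """Return False while the inference data pipeline is expected to run."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    utc_time = moment.astimezone(timezone.utc).time()
    return not (WINDOW_START <= utc_time < WINDOW_END)
```

For example, `is_model_available(datetime.now(timezone.utc))` can be evaluated before sending a request, with a retry scheduled after 6:30 AM UTC when it returns `False`.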

System information

Latency metrics

Direct request to model

Number of requests	Avg. response time per request (ms)	Min. response time per request (ms)	Max. response time per request (ms)	Errors
1	5500	5000	6000	0
10002	410	141	2359	0

Testing time: 45 minutes
Explanation of test results: Each valid request takes about 5 to 6 seconds to respond. The system is configured with a minimum of 2 pods and scales up to 20 pods working simultaneously at peak load. During load testing, the 10K requests are distributed among the available pods (between 2 and 20), so the average response time in the table is the average over all 10K requests. The minimum response time corresponds to a response served from the cache.