AgInsights Model Cards
Model card: crop stage prediction model
What it is
Overview
Our advanced models integrate field-specific data with long-term daily weather data and soil characteristics to predict crop growth stages across various crops and regions. These predictions can be generated both before the growing season (using historical weather data) and during the season (using the current year's weather data).
While our model provides accurate predictions, it is designed to complement, and not replace, field scouting practices.
Inputs and Outputs
Required field inputs
- field location (latitude and longitude, point)
- crop
- maturity index and/or variety name (depends on crop)
- planting date
Optional inputs
Users can modify the optional inputs for more customized model tuning; otherwise, default values are used.
- Field inputs (crop-specific default values are used when not provided):
  - planting parameters: depth, density, row spacing, water capacity at planting
  - water management: irrigated vs. rainfed
- Model configuration
  - Lookback period used for computing pre-season crop stage timing estimates (a.k.a. historical time period)
- Response configuration
  - filtering on prediction features (see outputs)
  - prediction time period
Outputs
- Historical Growth Stage Simulations: this feature provides detailed year-by-year growth stage predictions using data from multiple years.
- Aggregated Growth Stage Simulations: this feature presents consolidated growth stage predictions summarized from multi-year data. It also integrates actual weather and forecast data to provide in-season short-term predictions.
All feature types are available through the same endpoint. Users can specify the required computation option by using the ‘query.filter’ object in the API payload.
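As an illustration of how the inputs above could be assembled before a call, here is a minimal Python sketch. Only the `query.filter` object is named in this card; every other key (`field`, `model_configuration`, `lookback_years`, and the field attribute names) and the content of the filter itself are hypothetical placeholders, not the actual API schema.

```python
from typing import Optional

# Required inputs listed in this card; the key names are hypothetical.
REQUIRED_FIELDS = ("latitude", "longitude", "crop", "planting_date")


def build_request(field: dict,
                  query_filter: Optional[dict] = None,
                  lookback_years: Optional[int] = None) -> dict:
    """Assemble an illustrative request body and check the required inputs."""
    missing = [name for name in REQUIRED_FIELDS if field.get(name) is None]
    # The card asks for a maturity index and/or a variety name, depending on the crop.
    if field.get("maturity_index") is None and field.get("variety") is None:
        missing.append("maturity_index or variety")
    if missing:
        raise ValueError("missing required inputs: " + ", ".join(missing))

    payload = {"field": dict(field), "query": {}}
    if query_filter is not None:
        # Passed through unchanged: the real filter schema is not described here.
        payload["query"]["filter"] = dict(query_filter)
    if lookback_years is not None:
        payload["model_configuration"] = {"lookback_years": lookback_years}
    return payload
```

Optional inputs that are left out are simply omitted from the body, which matches the card's statement that default values are used when nothing is provided.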
How it works
Algorithm principles
The growth stage prediction model provides the most probable start date of each growth stage.
To compute crop stage predictions, the system runs a crop development model on weather scenarios from several past years for the requested location. For each of these simulations, the model applies the cropping parameters (planting date, variety, etc.) provided by the user in the request. It then computes the mean and median of the predicted crop stage start dates across all past-year simulations.
The model predicts when each crop stage is likely to begin. Here's how it works:
For predictions up to the current date plus 5 days:
- The model uses actual recorded daily weather data and the short-term weather forecast for the selected field location.
- It simulates how your crop would grow, applying:
  - The planting date provided
  - The crop variety selected
  - Other custom details entered about the field and crop
- The model then calculates a unique start date for each predicted growth stage.
- Because each of these stages has a single predicted date, the returned mean and median values are equal for stages that start between planting and the current date plus 5 days.
For predictions beyond the current date plus 5 days:
- The model uses daily weather data from several past years for your field's location.
- For each past year, it simulates how your crop would grow, applying:
  - The planting date provided
  - The crop variety selected
  - Other custom details entered about the field and crop
- This creates a collection of predicted crop stage start dates: one date for each growth stage and for each past year simulated.
- The model then calculates the mean (average) and median (middle value) of these predictions.
- The mean and median become the forecasted values, showing the most likely dates for each crop stage to begin.
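The aggregation step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the production implementation: each past-year simulation is represented as a number of days from planting to the start of a stage, the function name is hypothetical, and rounding to whole days is an assumption.

```python
from datetime import date, timedelta
from statistics import mean, median


def aggregate_stage_start(planting_date: date, days_to_stage: list) -> tuple:
    """Return (mean_date, median_date) for one growth stage.

    days_to_stage holds one value per simulated past year: the number of
    days between planting and the simulated start of the stage.
    """
    mean_days = round(mean(days_to_stage))      # rounding is an assumption
    median_days = round(median(days_to_stage))
    return (planting_date + timedelta(days=mean_days),
            planting_date + timedelta(days=median_days))


# Five made-up simulations, one of them unusually late.
mean_date, median_date = aggregate_stage_start(
    date(2024, 10, 1), [180, 185, 190, 200, 245])
print(mean_date, median_date)  # 2025-04-19 2025-04-09
```

In this made-up sample the single late year moves the mean ten days past the median, which is the kind of gap discussed in the interpretation guidance.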
Training and continuous improvement
Annual model calibration: each year, new field observation data is incorporated to recalibrate and cross-validate the model for existing and newly launched commercial varieties.
Performance monitoring: scientists continuously monitor the model inference data and predictions to detect and promptly mitigate major data quality or model drift issues.
How to use
How to interpret insights
Interpretation of crop stage start date predictions beyond the current date plus 5 days
- Predictions rely on historical averages
- The mean value is the mathematical average of all simulated start dates for a specific crop stage from past years.
- The mean date takes into account how the crop would have developed in an average weather scenario derived from various weather conditions over past years.
- The median value represents the middle value in the set of simulated start dates for a specific crop stage
- The median is less affected than the mean by extreme weather events in past years, so it provides a robust "typical" value that isn't skewed by unusual seasons.
Aspect | Mean | Median | Interpretation |
---|---|---|---|
Definition and calculation | The mean growth stage start date is the mathematical average of all simulated start dates from past years. | Middle value in the dataset | Larger difference between mean and median suggests more skewed data over past years. |
Sensitivity to outliers | More sensitive to past extreme or unusual seasons in terms of weather conditions. | Less sensitive to extreme weather events in past years, so it provides a robust "typical" value that isn't skewed by unusual seasons. | If the mean differs significantly from the median, the field location has experienced significant weather variability over the past seasons. |
Representation of "typical" year | Can be skewed by extreme years. | Often better represents a typical year | Median may be more reliable if data is skewed. |
When they're similar | Indicates symmetrical distribution of predictions. | Suggests consistent crop stage timing across years | More confidence in the prediction |
Interpretation with weather | More influenced by extreme weather years | Better represents "normal" weather years | Compare both to assess the weather variability of the location |
When mean is earlier | Suggests some very early years (i.e. 'hot' years) are pulling the average down | May better represent the most likely timing | The location is often affected by abnormal weather conditions shortening the crop cycle. |
When median is earlier | Indicates more years with earlier start dates, but some late years (i.e. 'cold' years) are skewing the mean | May better represent the most common timing | The location is often affected by abnormal weather conditions delaying the crop cycle. |
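The comparison logic in the table can be expressed as a small helper. This is a sketch only: the function name and the 2-day tolerance used to decide that the two dates are "similar" are hypothetical and not part of the model.

```python
from datetime import date


def compare_mean_median(mean_date: date, median_date: date,
                        tolerance_days: int = 2) -> str:
    """Map the gap between mean and median start dates to the table's reading."""
    gap = (mean_date - median_date).days
    if abs(gap) <= tolerance_days:  # tolerance is an illustrative assumption
        return "similar: consistent timing across years, more confidence"
    if gap < 0:
        return "mean earlier: some very early ('hot') years pull the average down"
    return "median earlier: some late ('cold') years skew the mean"


print(compare_mean_median(date(2025, 4, 19), date(2025, 4, 9)))
# median earlier: some late ('cold') years skew the mean
```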
Interpretation of crop stage start date predictions up to the current date plus 5 days
- Predictions are driven by actual recorded weather data and the short-term weather forecast.
Limitations and caveats:
- Limited In-Season Adjustments: The model does not incorporate real-time field observations during the growing season. Its accuracy is expected to decrease as the crop cycle progresses, especially if early-stage predictions are significantly off track.
- Weather Dependency: Predictions are highly dependent on weather data quality and forecast accuracy.
- Variety Specificity: Ensure the correct variety is selected. Predictions are calibrated for specific varieties and relative maturities, and both have a significant impact on the results.
Use-case example
Request:
On December 1, 2024, a user requests a forecast for crop stage X of Wheat VARIETY A, planted on October 1, 2024, using a 10-year lookback period.
Process:
- The system retrieves weather data for the field location from 2013 to 2023.
- For each year, the crop model simulates crop development, applying:
  - VARIETY A
  - October 1 planting date
  - That year's weather data
- This produces 10 predicted start dates for crop stage X (one per year).
- For predictions after December 1, the forecast is returned as the mean and median values of these 10 dates.
- For predictions between October 1 (planting date) and December 6 (current date + 5 days), actual recorded weather data and the short-term weather forecast are used. The returned mean and median values are equal.
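The date arithmetic in this example can be checked with a short sketch. The 5-day horizon comes from this card; the function name, the returned labels, and the choice to treat the cutoff day itself as inside the near-term window are assumptions made for illustration.

```python
from datetime import date, timedelta

FORECAST_HORIZON_DAYS = 5  # "current date + 5 days" in this card


def prediction_basis(stage_start: date, request_date: date) -> str:
    """Tell which weather source backs a predicted stage start date."""
    cutoff = request_date + timedelta(days=FORECAST_HORIZON_DAYS)
    if stage_start <= cutoff:  # inclusive boundary is an assumption
        return "observed weather and short-term forecast"
    return "past-year simulations (mean and median)"


request_date = date(2024, 12, 1)
print(request_date + timedelta(days=FORECAST_HORIZON_DAYS))  # 2024-12-06
print(prediction_basis(date(2024, 11, 20), request_date))
print(prediction_basis(date(2025, 4, 19), request_date))
```

A stage that starts on November 20 falls in the window covered by observed and forecast weather, while a stage expected in April is reported from the past-year simulations.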
Intended use
Growth stage predictions are primarily used to alert growers in a timely manner about key management practices to carry out at specific crop stages (e.g. pesticide applications and field harvest priority).
The model is also intended to serve as an enabling component of other predictive models, such as pest and disease risk or grain dry down, and to characterize the environmental conditions during the different growth stages of the crops.
Limitations
Insights accuracy
- The accuracy of growth stage predictions is very sensitive to weather data. Hence, poor weather data quality is expected to impact model reliability.
- Significant frost conditions can prevent the model from providing all growth stage predictions. This happens when minimum daily temperatures fall below a crop-specific physiological threshold.
- Abiotic stresses that could affect the crop’s growth and development may not be considered by the model, which could reduce its accuracy.
- Note that model predictions are intended to facilitate and optimize farming management but cannot, under any circumstances, replace physical field scouting and expert agronomist assessments.
System availability
The model uses ERA5T weather data for simulation and prediction. ERA5T data is updated every day in the inference data store.
Due to the execution of the inference data pipeline, the model is not available from 5:30 AM to 6:30 AM UTC.
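Client code may want to avoid calling the model during this daily window. A minimal sketch, assuming the window is half-open (05:30 inclusive, 06:30 exclusive) and that callers pass timezone-aware datetimes; the function name is hypothetical.

```python
from datetime import datetime, time, timedelta, timezone

MAINTENANCE_START = time(5, 30)  # UTC, from this card
MAINTENANCE_END = time(6, 30)    # UTC, from this card


def is_model_available(moment: datetime) -> bool:
    """Return False while the inference data pipeline is running.

    moment must be timezone-aware; it is converted to UTC before the check.
    """
    utc_time = moment.astimezone(timezone.utc).time()
    return not (MAINTENANCE_START <= utc_time < MAINTENANCE_END)


# 07:00 at UTC+1 is 06:00 UTC, which falls inside the window.
local = datetime(2024, 12, 1, 7, 0, tzinfo=timezone(timedelta(hours=1)))
print(is_model_available(local))  # False
```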
System information
Latency metrics
Direct request to model
Number of requests | AVG Resp. time / request (in ms) | Min resp. time / request (in ms) | Max resp. time / request (in ms) | Error |
---|---|---|---|---|
1 | 5500 | 5000 | 6000 | 0 |
10002 | 410 | 141 | 2359 | 0 |
- Testing time: 45 minutes
- Explanation of test results: Each valid request takes about 5-6 seconds to respond. The system scales between a minimum of 2 pods and 20 pods working simultaneously at peak load. When 10K requests are sent for load testing, they are divided among the available pods (between 2 and 20), so the average response time in the table is the average over the 10K requests. The minimum time corresponds to a response served from the cache.
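For a rough sense of scale, the figures in the table can be combined as follows. This assumes the 45-minute testing time refers to the 10002-request run, which the card does not state explicitly.

```python
requests_sent = 10002          # from the latency table
test_duration_s = 45 * 60      # assumption: the 45 minutes cover this run
avg_ms, single_request_ms = 410, 5500

throughput_rps = requests_sent / test_duration_s
ratio = single_request_ms / avg_ms

print(f"{throughput_rps:.2f} requests/s on average")            # 3.70 requests/s on average
print(f"single request is {ratio:.1f}x the load-test average")  # 13.4x
```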