Description
Welcome to the "gCO2e of AI" code competition, a pioneering challenge designed to advance the field of sustainable AI development. This competition is not just about building the most performant AI models; it’s about creating models that excel in both performance and energy efficiency.
In this competition, we’ve implemented an innovative scoring system that evaluates both the accuracy of your models and their computational efficiency, measured in FLOPS (Floating Point Operations) used during inference. The final score is weighted with 80% emphasis on accuracy and 20% on efficiency:
score = 0.8 × {accuracy (PCK)} + 0.2 × (1 − ({flops_utilized} − {min_flops}) / ({max_flops} − {min_flops}))
This scoring system encourages the development of high-performance AI models while penalizing those that are overly resource-intensive. Learn more about the scoring in the Evaluation section.
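To make the weighting concrete: under this formula, a model with a PCK of 0.90 whose FLOPS falls exactly halfway between the minimum and maximum observed would score 0.8 × 0.90 + 0.2 × (1 − 0.5) = 0.82.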
To facilitate transparency and emphasize the environmental impact of AI models, we have introduced two key metrics on our platform:
FLOPS (Floating Point Operations): This metric measures the total computational power used by your model during inference, providing an indication of how resource-intensive your model is.
gCO2e (grams of CO2 equivalent): This metric quantifies the carbon emissions associated with the energy consumed by your model, offering a way to assess the environmental impact of your AI solution.
By focusing on these two metrics, we aim to set a new standard for sustainable AI practices in the industry.
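As an illustration, here is a minimal sketch of how you might estimate both metrics locally during development. It assumes the fvcore and codecarbon packages and a PyTorch model; the competition's official measurement tooling is not specified here, so treat this only as a sanity check.

import torch
from fvcore.nn import FlopCountAnalysis  # FLOP counter for PyTorch modules
from codecarbon import EmissionsTracker  # estimates energy use and CO2e

# Tiny stand-in model: 16 keypoints x 2 coordinates = 32 outputs
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, kernel_size=3, padding=1),
    torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1),
    torch.nn.Flatten(),
    torch.nn.Linear(16, 32),
).eval()
dummy_input = torch.randn(1, 3, 256, 256)  # one inference-sized input

# FLOPS: total floating point operations for a single forward pass.
# Note: fvcore may skip ops it cannot count and its conventions (e.g. for
# fused multiply-adds) can differ from the leaderboard's counter.
flops = FlopCountAnalysis(model, dummy_input).total()
print(f"FLOPs per inference: {flops:,}")

# gCO2e: track emissions over an inference run; stop() returns kg CO2e
tracker = EmissionsTracker(log_level="error")
tracker.start()
with torch.no_grad():
    model(dummy_input)
emissions_kg = tracker.stop()
print(f"Estimated emissions: {emissions_kg * 1000:.6f} gCO2e")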
Whether you're a data scientist looking to showcase your skills or a business leader interested in sustainable innovation, this competition offers a unique platform to demonstrate your expertise and commitment to the future of AI.
The Challenge
The core challenge of this competition is a computer vision task focused on keypoint detection. Your objective is to develop a model that accurately detects 16 key points of human posture.

The applications of these models are vast, spanning various industries. For instance, they can be used in high-risk environments to monitor worker fatigue, in robotics, and in numerous other scenarios where human interaction is a key component of business processes.
The provided dataset includes over 11,000 high-resolution images of individuals engaged in various work-related and non-work-related activities. Each image is annotated with 16 keypoints, including the right and left ankle, right and left knee, right and left hip, pelvis, thorax, head top, upper neck, right and left wrist, right and left elbow, and right and left shoulder, capturing critical postures of the human body.
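For reference, the standard MPII index convention for these 16 joints is commonly given as the mapping below; confirm it against the competition's annotation files before relying on it.

# Standard MPII joint ordering (assumed; verify against the provided annotations)
MPII_KEYPOINTS = {
    0: "right ankle",     1: "right knee",     2: "right hip",
    3: "left hip",        4: "left knee",      5: "left ankle",
    6: "pelvis",          7: "thorax",         8: "upper neck",
    9: "head top",        10: "right wrist",   11: "right elbow",
    12: "right shoulder", 13: "left shoulder", 14: "left elbow",
    15: "left wrist",
}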
Data
For this competition, we utilize the MPII Human Pose Dataset, an open-source and widely recognized benchmark for human pose estimation tasks. This dataset is designed to evaluate articulated human pose estimation and includes a comprehensive collection of images annotated with body joints.
- Dataset Overview: The full MPII Human Pose Dataset contains roughly 25,000 images featuring over 40,000 individuals engaged in a wide variety of everyday activities, covering 410 distinct human activities, with each image annotated with detailed body part positions. This competition uses a subset of just over 11,000 of these images, split into training and test sets as described below. The extensive range of activities provides a robust foundation for training models that generalize well across different scenarios.
- Training Dataset: For the purposes of this competition, a subset of the MPII dataset is utilized. The training dataset includes 6,666 images, carefully selected to provide a balanced representation of the various activities and poses.
- Test Dataset: The test dataset consists of 4,430 images. This set is used to evaluate the performance of submitted models, and its annotations are withheld to prevent overfitting and ensure a fair evaluation process.
- Link to Dataset: You can access the MPII Human Pose Dataset here. This link provides access to the full dataset, including instructions on how to download and utilize it for your model development.
This dataset, with its rich annotations and diverse activities, is well-suited for training and evaluating AI models in the keypoint detection task central to this competition. The use of this established dataset ensures that models developed during the competition are trained on high-quality, industry-standard data.
Data Details
The dataset is crucial for the Pose4Safety competition, providing extensive visual data derived from various workplace environments. It features over 11,000 high-resolution images capturing individuals in diverse occupational activities, some displaying signs of fatigue. Each person in the images is meticulously annotated with keypoints that outline essential body joints.
This dataset was meticulously compiled and annotated to provide a robust framework for developing algorithms that detect early signs of fatigue through keypoint analysis. By training models on this dataset, participants can significantly contribute to preventing workplace accidents and ensuring employee safety.
The annotations in the dataset include precise locations and labels for keypoints such as elbows, wrists, knees, and ankles, essential for monitoring and analyzing human motion and posture. These detailed keypoints facilitate the detection of subtle signs of fatigue, such as slumped shoulders and slowed movements, which are critical to ensuring worker safety.
Credits
The images and annotations are drawn from the MPII Human Pose Dataset (Andriluka, Pishchulin, Gehler, and Schiele, "2D Human Pose Estimation: New Benchmark and State of the Art Analysis", CVPR 2014).
Evaluation
The scoring formula for this competition evaluates your AI model on two key factors: accuracy (specifically, the Percentage of Correct Keypoints, or PCK) and computational efficiency (the FLOPS utilized during inference). Below is a detailed breakdown of the formula and each variable:
score = 0.8 × {accuracy (PCK)} + 0.2 × (1 − ({flops_utilized} − {min_flops}) / ({max_flops} − {min_flops}))
1. Accuracy (PCK):
- Variable: {accuracy (PCK)}
- Definition: The PCK (Percentage of Correct Keypoints) measures how accurately the model detects keypoints compared to the ground truth. A keypoint is considered correct if the distance between the predicted and ground truth keypoints is less than a defined threshold.
- PCK Calculation (a runnable sketch follows this breakdown):
- Threshold: Set at 0.2.
- Distance Calculation: Compute the Euclidean distance between the predicted and ground truth keypoints using: distances = torch.linalg.norm(predicted_keypoints - ground_truth_keypoints, dim=2)
- Correct Keypoints: A keypoint is considered correct if its distance is less than the threshold: correct_keypoints = (distances < threshold).float()
- PCK Value: The PCK value is the mean over all keypoints: pck = correct_keypoints.mean().item()
- Weight in the Score: 80% of the total score is based on this accuracy measure, making it the most significant factor in the final score.
2. FLOPS Utilized:
- Variable: {flops_utilized}
- Definition: FLOPS (Floating Point Operations) measure the number of floating-point calculations the model performs during inference. A model that uses fewer FLOPS is considered more computationally efficient.
- Role in the Formula: The formula penalizes models that require more FLOPS by subtracting the minimum FLOPS observed ({min_flops}) from the FLOPS utilized by your model and dividing this by the range between the maximum and minimum FLOPS observed ({max_flops}−{min_flops}). This ensures that models using more FLOPS than necessary are penalized, lowering their overall score.
3. Minimum and Maximum FLOPS:
- Variables: {min_flops} and {max_flops}
- Definition: These values represent the lowest and highest FLOPS utilized among all submitted models in the competition. They are used to normalize the FLOPS utilized by each model.
- Role in the Formula: The normalization ensures that the efficiency penalty is scaled relative to the range of FLOPS used by all participants. Models closer to the minimum FLOPS will receive a smaller penalty, positively contributing to their final score.
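As referenced above, here is a minimal runnable sketch of the PCK computation. It assumes predicted and ground truth keypoints are tensors of shape (batch, 16, 2) in a normalized coordinate space; the exact coordinate convention is defined by the competition's evaluation code.

import torch

def compute_pck(predicted_keypoints: torch.Tensor,
                ground_truth_keypoints: torch.Tensor,
                threshold: float = 0.2) -> float:
    """PCK as described above: the fraction of keypoints that land within
    `threshold` of the ground truth. Tensors have shape (batch, joints, 2)."""
    # Euclidean distance per keypoint, shape (batch, joints)
    distances = torch.linalg.norm(predicted_keypoints - ground_truth_keypoints, dim=2)
    # 1.0 where the prediction is close enough, else 0.0
    correct_keypoints = (distances < threshold).float()
    # Mean over the batch and all 16 joints
    return correct_keypoints.mean().item()

# Hypothetical usage with random tensors standing in for real predictions
pred = torch.rand(8, 16, 2)
gt = torch.rand(8, 16, 2)
print(f"PCK@0.2: {compute_pck(pred, gt):.3f}")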
How the Formula Works:
- Accuracy Contribution: The score starts by multiplying the accuracy (PCK) by 0.8, reflecting the model's effectiveness in correctly detecting keypoints.
- Efficiency Contribution: The efficiency term is calculated next. It starts at a value of 1, representing perfect efficiency; the more FLOPS your model uses relative to other submissions, the more this value is reduced, which decreases the final score.
- Final Score: The final score is a weighted sum of the accuracy and efficiency components. A model with high PCK and low FLOPS usage will score highly, promoting both high accuracy and computational efficiency.
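Putting the two components together, here is a sketch of the full score under the formula above, using the compute_pck helper defined earlier. The min_flops and max_flops values would come from the leaderboard, so the numbers below are placeholders for illustration.

def final_score(pck: float, flops_utilized: float,
                min_flops: float, max_flops: float) -> float:
    """Weighted sum of accuracy (80%) and normalized efficiency (20%)."""
    # Normalize this model's FLOPS into [0, 1] across all submissions
    flops_penalty = (flops_utilized - min_flops) / (max_flops - min_flops)
    # Efficiency term starts at 1 (perfect) and shrinks as FLOPS grow
    return 0.8 * pck + 0.2 * (1.0 - flops_penalty)

# Placeholder leaderboard values for illustration
print(final_score(pck=0.90, flops_utilized=5e9, min_flops=1e9, max_flops=9e9))
# -> 0.8 * 0.90 + 0.2 * (1 - 0.5) = 0.82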
Summary:
- The PCK (Percentage of Correct Keypoints) is a critical measure of the model's accuracy in detecting keypoints. It is calculated by determining how many keypoints are correctly predicted within a set distance from the ground truth.
- The FLOPS utilized directly impacts the score by penalizing models that are more computationally demanding, encouraging the development of efficient models.
This scoring formula incentivizes the creation of models that balance high accuracy with low computational cost, fostering innovation in sustainable AI development.