
Recall

A classification metric measuring the proportion of actual positive cases correctly identified by the model, indicating the model's ability to find all relevant instances.



Recall (also known as Sensitivity or True Positive Rate) is a fundamental classification metric that measures the proportion of actual positive cases that the model correctly identified. It answers the question: "Of all the actual positive cases, how many did the model successfully find?" Recall is crucial for applications where missing positive cases (false negatives) is costly or dangerous.

Mathematical Definition

Basic Formula
Recall = True Positives / (True Positives + False Negatives)

Alternative Expression
Recall = True Positives / All Actual Positives

Range
Recall values range from 0 to 1, where:

  • 1.0 = Perfect recall (no false negatives)
  • 0.0 = No true positives found
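The formula above translates directly to code; a minimal sketch using raw confusion-matrix counts:

```python
def recall(tp: int, fn: int) -> float:
    """Recall = TP / (TP + FN): the share of actual positives the model found."""
    if tp + fn == 0:
        return 0.0  # no actual positives; conventions vary (some report 1.0 or NaN)
    return tp / (tp + fn)

# Example: 80 positives found, 20 missed
print(recall(80, 20))  # 0.8
```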

Conceptual Understanding

Focus on Actual Positives
Recall specifically evaluates coverage of positive cases:

  • Ignores true negatives and false positives
  • Measures completeness of positive identification
  • Higher recall means fewer missed positive cases
  • Important when positive class detection is critical

Coverage vs Quality Trade-off
Recall often trades off with precision:

  • Lower thresholds increase recall, decrease precision
  • More liberal predictions improve recall
  • Recall-precision balance requires careful consideration
  • F1-score harmonizes both metrics
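The trade-off can be seen in a small sketch with hypothetical scores and labels: lowering the decision threshold raises recall while precision falls.

```python
# Hypothetical model scores (descending) and true labels; 5 actual positives.
labels = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]
scores = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25, 0.15, 0.05]

def recall_precision(threshold):
    """Recall and precision when cases scoring >= threshold are called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    rec = tp / (tp + fn) if tp + fn else 0.0
    prec = tp / (tp + fp) if tp + fp else 0.0
    return rec, prec

for t in (0.7, 0.4, 0.1):
    r, p = recall_precision(t)
    print(f"threshold={t}: recall={r:.2f}, precision={p:.2f}")
# threshold=0.7: recall=0.60, precision=1.00
# threshold=0.4: recall=0.80, precision=0.67
# threshold=0.1: recall=1.00, precision=0.56
```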

Applications by Domain

Medical Screening
High recall critical for:

  • Cancer screening programs
  • Disease outbreak detection
  • Emergency condition identification
  • Preventive health screening

Security and Safety
Critical incident detection:

  • Intrusion detection systems
  • Fraud detection in financial systems
  • Safety hazard identification
  • Threat assessment systems

Information Retrieval
Comprehensive search results:

  • Academic literature search
  • Legal document discovery
  • Patent prior art searches
  • Regulatory compliance auditing

Multi-Class Recall

Macro-Averaged Recall
Average recall across all classes:

  • Calculate recall for each class separately
  • Take arithmetic mean of class recalls
  • Treats all classes equally
  • Good for understanding per-class performance

Micro-Averaged Recall
Global recall calculation:

  • Pool all true positives and false negatives
  • Calculate single recall value
  • Weighted by class frequency
  • Emphasizes performance on frequent classes

Weighted Recall
Class-frequency weighted average:

  • Weight each class recall by its frequency
  • Accounts for class imbalance naturally
  • Balances macro and micro approaches
  • Standard in many ML libraries
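The three averaging schemes can be compared on a small hypothetical multi-class example. Note that for single-label classification, micro-averaged recall reduces to overall accuracy and coincides with weighted recall.

```python
from collections import Counter

# Hypothetical 3-class ground truth and predictions.
y_true = ["a", "a", "a", "a", "b", "b", "c", "c", "c", "c"]
y_pred = ["a", "a", "a", "b", "b", "c", "c", "c", "a", "c"]

classes = sorted(set(y_true))
support = Counter(y_true)  # actual count of each class

# Per-class recall: correct predictions of class c / actual instances of c.
per_class = {
    c: sum(t == p == c for t, p in zip(y_true, y_pred)) / support[c]
    for c in classes
}

macro = sum(per_class.values()) / len(classes)            # unweighted mean
micro = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
weighted = sum(per_class[c] * support[c] for c in classes) / len(y_true)

print(per_class)                     # {'a': 0.75, 'b': 0.5, 'c': 0.75}
print(macro, micro, weighted)        # macro differs; micro == weighted here
```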

Recall-Precision Relationship

Recall-Precision Trade-off
Fundamental relationship in classification:

  • Higher recall often means lower precision
  • Lowering the decision threshold raises recall but admits more false positives
  • Precision typically declines as recall approaches 1.0
  • Optimal balance depends on application requirements

F1-Score Integration
Harmonic mean of precision and recall:

  • F1 = 2 × (Precision × Recall) / (Precision + Recall)
  • Balances both metrics equally
  • Single metric for model comparison
  • Useful when both metrics are important
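The F1 formula above is a one-liner in code; the harmonic mean punishes imbalance, so a model with high recall but low precision scores poorly:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1_score(0.9, 0.6))   # 0.72
print(f1_score(1.0, 0.1))   # ~0.18 -- perfect precision cannot rescue poor recall
```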

Common Scenarios

High Recall Requirements
When false negatives are costly:

  • Medical diagnosis (missing diseases dangerous)
  • Security screening (missing threats catastrophic)
  • Quality control (missing defects costly)
  • Legal discovery (missing evidence problematic)

Recall vs Efficiency Trade-offs
Balancing coverage and resources:

  • High recall may require reviewing many cases
  • Applications where thoroughness is paramount
  • Screening and filtering applications
  • Comprehensive monitoring systems

Improving Recall

Model Adjustments

  • Decrease decision threshold for positive predictions
  • Use ensemble methods for broader coverage
  • Implement cost-sensitive learning
  • Apply class balancing techniques
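The first adjustment, lowering the decision threshold, can be sketched with hypothetical predicted probabilities for the positive class:

```python
def predict_with_threshold(positive_probs, threshold=0.3):
    """Label a case positive whenever its predicted probability meets the
    threshold; lowering it below the usual 0.5 trades precision for recall."""
    return [1 if p >= threshold else 0 for p in positive_probs]

probs = [0.9, 0.45, 0.35, 0.2]          # hypothetical model outputs
print(predict_with_threshold(probs, 0.5))  # [1, 0, 0, 0]
print(predict_with_threshold(probs, 0.3))  # [1, 1, 1, 0] -- more positives flagged
```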

Data Strategies

  • Increase training data for positive class
  • Apply data augmentation techniques
  • Use synthetic data generation
  • Improve minority class representation

Feature Engineering

  • Add features sensitive to positive cases
  • Engineer domain-specific indicators
  • Remove features that mask positive signals
  • Use feature selection for relevant attributes

Limitations and Considerations

Ignores Specificity
Recall doesn't account for:

  • False positive rates
  • Precision of positive predictions
  • Overall classification accuracy
  • True negative identification

Can Be Artificially High
Recall alone is easy to game:

  • Predicting everything as positive yields a recall of 1.0
  • Very low decision thresholds inflate recall
  • Both strategies sacrifice precision entirely
  • Recall must therefore be balanced with other metrics
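A tiny worked example of the degenerate case: a classifier that predicts every case positive achieves perfect recall while its precision collapses.

```python
# Degenerate classifier: predict every case as positive.
labels = [1, 0, 0, 1, 0, 0, 0, 1]   # 3 actual positives out of 8
preds = [1] * len(labels)

tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))

recall = tp / (tp + fn)        # 1.0 -- no positives missed
precision = tp / (tp + fp)     # 0.375 -- most predictions are wrong
print(recall, precision)
```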

Threshold Dependency
For probabilistic classifiers:

  • Recall varies with decision threshold
  • Single recall value may not represent full capability
  • Consider recall-precision curves
  • Application-specific threshold optimization needed

Evaluation Best Practices

Contextual Interpretation

  • Consider domain-specific implications
  • Compare against relevant baselines
  • Evaluate practical significance
  • Understand cost of false negatives

Statistical Rigor

  • Report confidence intervals
  • Use cross-validation for robust estimates
  • Test significance of improvements
  • Consider multiple evaluation runs

Comprehensive Reporting

  • Always report alongside precision
  • Include F1-score or F-beta scores
  • Provide confusion matrix analysis
  • Add domain-specific metrics

Special Cases

Imbalanced Datasets

  • Recall particularly important for minority classes
  • May need stratified sampling strategies
  • Consider balanced evaluation approaches
  • Monitor recall for each class separately

Time Series and Sequential Data

  • Temporal aspects of recall
  • Early detection capabilities
  • Latency considerations
  • Streaming evaluation approaches

Understanding recall is essential for building comprehensive machine learning systems, especially in applications where failing to identify positive cases has serious consequences and complete coverage is more important than prediction precision.
