Contributing Guide
Thank you for your interest in contributing to the Leichte Sprache API! This guide will help you get started with development and understand our contribution process.
Quick Start for Contributors
- Fork and clone the repository
- Set up your development environment
- Make changes following our guidelines
- Test your changes thoroughly
- Submit a pull request
Development Setup
Prerequisites
- Python 3.11+ (Python 3.12 recommended)
- Git for version control
- uv (recommended) or pip for package management
- 8GB+ RAM for running ML models
Environment Setup
# 1. Fork the repository on GitHub
# 2. Clone your fork
git clone https://github.com/YOUR_USERNAME/leichte-sprache.git
cd leichte-sprache
# 3. Set up development environment
uv venv
source .venv/bin/activate
uv sync --extra dev
# 4. Install pre-commit hooks
pre-commit install
# 5. Download required models
uv run python -m spacy download de_core_news_lg
# 6. Run tests to verify setup
python test-suite/test_runner.py
Development Dependencies
Additional tools for development:
# Code formatting and linting
ruff check .
black --check .
# Type checking
mypy .
# Documentation
mkdocs serve
# Testing
pytest
Project Structure
Understanding the codebase layout:
leichte-sprache/
├── api_main.py              # FastAPI application
├── analysis_service.py      # Core analysis service
├── config.py                # Application configuration
│
├── regeln/                  # Rule modules (18 rules)
│   ├── fremdwoerter/        # Foreign words rule
│   ├── satzlaenge/          # Sentence length rule
│   └── ...                  # Other rules
│
├── tools/                   # Utilities and CLI tools
│   ├── agent_optimizer.py   # LLM generation engine
│   └── ml/                  # ML training tools
│
├── test-suite/              # Test framework
│   ├── test_runner.py       # Test runner
│   └── [rule_name]/         # Test cases per rule
│
├── docs/                    # Documentation (MkDocs)
├── prompts/                 # LLM prompts for generation
└── scripts/                 # Deployment and utility scripts
Types of Contributions
We welcome various types of contributions:
Bug Reports
Found a bug? Help us fix it:
- Check existing issues to avoid duplicates
- Use the bug report template
- Provide minimal reproduction steps
- Include system information
Feature Requests
Have an idea for improvement?
- Check if it's already requested
- Explain the use case clearly
- Consider backward compatibility
- Discuss implementation approach
Documentation
Documentation improvements are always welcome:
- Fix typos and grammatical errors
- Improve clarity and examples
- Add missing documentation
- Translate content (German/English)
Code Contributions
Adding New Rules
Adding a rule is the most common type of contribution. See the Rule Development section below for details.
Core Improvements
- API enhancements
- Performance optimizations
- Testing improvements
- CI/CD pipeline updates
Development Guidelines
Code Style
We use automated formatting and linting.
Python Style Guide
- PEP 8 compliance (formatting checked with Black, linting with Ruff)
- Type hints for all public functions
- Docstrings for classes and functions
- Clear variable names in German or English
Example Function
import spacy
from typing import List


def pruefe_regel(doc: spacy.tokens.Doc) -> List[str]:
    """
    Check the text for violations of the sentence-length rule.

    Args:
        doc: Document processed by spaCy

    Returns:
        List of violations as string descriptions

    Raises:
        ValueError: If the document is invalid
    """
    violations = []
    for sent in doc.sents:
        # Count words only; punctuation tokens are ignored
        word_count = len([token for token in sent if not token.is_punct])
        if word_count > 15:
            violations.append(f"Satz zu lang: {word_count} Wörter (max. 15)")
    return violations
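The sentence-length logic above can be exercised without loading a spaCy model by passing in objects that expose only the attributes the rule reads. The following is a minimal sketch under that assumption: `FakeToken`, `FakeDoc`, and `make_sentence` are hypothetical stand-ins invented for illustration, not part of the project or of spaCy, and the rule is repeated here without its spaCy type hint so the block runs on its own.

```python
from dataclasses import dataclass
from typing import List

MAX_WORDS = 15  # limit used in the example function


@dataclass
class FakeToken:
    """Hypothetical stand-in for a spaCy token (only the fields the rule reads)."""
    text: str
    is_punct: bool = False


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a spaCy Doc that exposes `sents`."""
    sents: List[List[FakeToken]]


def pruefe_regel(doc) -> List[str]:
    """Same logic as the example function, minus the spaCy type hint."""
    violations = []
    for sent in doc.sents:
        word_count = len([token for token in sent if not token.is_punct])
        if word_count > MAX_WORDS:
            violations.append(f"Satz zu lang: {word_count} Wörter (max. {MAX_WORDS})")
    return violations


def make_sentence(n_words: int) -> List[FakeToken]:
    """Build a sentence of n_words word tokens followed by a full stop."""
    return [FakeToken("Wort") for _ in range(n_words)] + [FakeToken(".", is_punct=True)]


short_doc = FakeDoc(sents=[make_sentence(15)])
long_doc = FakeDoc(sents=[make_sentence(15), make_sentence(16)])

print(pruefe_regel(short_doc))  # []
print(pruefe_regel(long_doc))   # ['Satz zu lang: 16 Wörter (max. 15)']
```

Because the trailing full stop is marked as punctuation, a sentence with exactly 15 words passes and one with 16 is flagged, which makes the boundary of the rule easy to pin down in a test.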
Testing Requirements
All contributions must include tests:
Rule Tests
Each rule must have comprehensive test cases:
# Test specific rule
python test-suite/test_runner.py satzlaenge
# Add test case in test-suite/satzlaenge/
# Format: test_name.txt
# expected: flag|pass
# description: Test description
Test text goes here.
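To make the test-case format concrete, here is a sketch of how a file with `# expected:` and `# description:` header lines followed by the test text could be read. This is an illustration only: `RuleTestCase` and `parse_test_case` are hypothetical names, and the actual parsing in test_runner.py may differ.

```python
from dataclasses import dataclass


@dataclass
class RuleTestCase:
    """Hypothetical container for one parsed test-case file."""
    expected: str      # "flag" or "pass"
    description: str
    text: str


def parse_test_case(content: str) -> RuleTestCase:
    """Split leading '# key: value' header lines from the test text."""
    header = {}
    body_lines = []
    for line in content.splitlines():
        stripped = line.strip()
        # Header lines are only recognised before the first line of test text.
        if stripped.startswith("#") and ":" in stripped and not body_lines:
            key, _, value = stripped.lstrip("#").partition(":")
            header[key.strip()] = value.strip()
        else:
            body_lines.append(line)
    expected = header.get("expected", "")
    if expected not in ("flag", "pass"):
        raise ValueError(f"expected must be 'flag' or 'pass', got {expected!r}")
    text = "\n".join(body_lines).strip()
    return RuleTestCase(expected, header.get("description", ""), text)


sample = (
    "# expected: flag\n"
    "# description: Sentence with more than 15 words\n"
    "Test text goes here.\n"
)
case = parse_test_case(sample)
print(case.expected)     # flag
print(case.description)  # Sentence with more than 15 words
print(case.text)         # Test text goes here.
```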
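API Tests
For API changes:
# tests/test_api.py
from fastapi.testclient import TestClient
from api_main import app  # assumption: the FastAPI instance in api_main.py is named `app`

client = TestClient(app)

def test_analyse_endpoint():
    response = client.post("/analyse", json={"text": "Test text."})
    assert response.status_code == 200
    assert "annotated_text" in response.json()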
Performance Tests
For performance-critical changes:
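One simple way to compare timings before and after a change is to run the code several times and look at the median. The sketch below uses only the standard library. `time_calls` is a hypothetical helper, and `count_words` is a placeholder for whatever code is actually under test; the project may use a different benchmarking setup.

```python
import time
from statistics import median
from typing import Callable, List


def time_calls(func: Callable[[], object], repeats: int = 5) -> List[float]:
    """Run func `repeats` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations


def count_words(text: str) -> int:
    """Placeholder for the code under test (hypothetical)."""
    return len(text.split())


sample_text = "Das ist ein kurzer Satz. " * 1000
durations = time_calls(lambda: count_words(sample_text))
print(f"median: {median(durations) * 1000:.3f} ms over {len(durations)} runs")
```

The median is less sensitive than the mean to a single slow run caused by other processes on the machine.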
Git Workflow
We use GitHub Flow:
# 1. Create feature branch
git checkout -b feature/add-punctuation-rule
# 2. Make changes and commit
git add .
git commit -m "feat: add punctuation rule for Leichte Sprache compliance"
# 3. Push to your fork
git push origin feature/add-punctuation-rule
# 4. Open pull request on GitHub
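Commit Message Format
We follow Conventional Commits: each message starts with a type, optionally followed by a scope in parentheses, then a colon, a space, and a short description.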
Types:
- feat: New features
- fix: Bug fixes
- docs: Documentation changes
- style: Code style changes
- refactor: Code refactoring
- test: Test additions/changes
- chore: Maintenance tasks
Examples:
feat(rules): add punctuation validation rule
fix(api): handle empty text input correctly
docs: update installation guide for uv
test: add test cases for compound word detection
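The commit format can be checked mechanically. The following sketch validates a commit header against the types listed above using a regular expression. `is_valid_header` is a hypothetical helper written for illustration, not a tool that ships with the project. The optional `!` after the type or scope is the Conventional Commits marker for a breaking change.

```python
import re

COMMIT_TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore")

# <type>, optional (<scope>), optional "!", then ": " and a non-empty description.
HEADER_PATTERN = re.compile(
    r"^(?P<type>" + "|".join(COMMIT_TYPES) + r")"
    r"(\((?P<scope>[^()\s]+)\))?"
    r"!?"
    r": (?P<description>\S.*)$"
)


def is_valid_header(header: str) -> bool:
    """Return True if the first line of a commit message matches the format."""
    return HEADER_PATTERN.match(header) is not None


print(is_valid_header("feat(rules): add punctuation validation rule"))  # True
print(is_valid_header("docs: update installation guide for uv"))        # True
print(is_valid_header("Added a new rule"))                              # False
print(is_valid_header("feat:missing space after colon"))                # False
```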
Pull Request Process
Before Submitting
- Sync with the main branch
- Run the full test suite (python test-suite/test_runner.py)
- Update documentation if needed
- Add a changelog entry if the change is significant
PR Template
Use this template for your pull request:
## Description
Brief description of changes
## Type of Change
- [ ] Bug fix
- [ ] New feature
- [ ] Breaking change
- [ ] Documentation update
## Testing
- [ ] Tests pass locally
- [ ] New tests added
- [ ] Manual testing completed
## Checklist
- [ ] Code follows style guidelines
- [ ] Self-review completed
- [ ] Documentation updated
- [ ] No breaking changes (or documented)
Review Process
- Automated checks must pass (CI/CD)
- Code review by maintainers
- Testing in development environment
- Merge when approved
Rule Development
Rule Structure
Each rule follows this structure:
regeln/rule_name/
├── __init__.py          # Exports pruefe_regel
├── regel.py             # Main rule implementation
├── config.py            # Configuration settings
├── README.md            # Rule documentation
└── model/               # ML models (if needed)
    ├── model.pkl
    └── tokenizer.pkl
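Before opening a pull request for a new rule, it can help to confirm that the rule directory contains the files listed in the structure above. The sketch below does this with `pathlib`. The helper name `missing_rule_files` is hypothetical, and treating `model/` as optional follows the "if needed" note in the layout. The example builds a throwaway directory so it runs anywhere.

```python
import tempfile
from pathlib import Path
from typing import List

# Files every rule directory is expected to contain; model/ is optional.
REQUIRED_FILES = ("__init__.py", "regel.py", "config.py", "README.md")


def missing_rule_files(rule_dir: Path) -> List[str]:
    """Return the required file names that are absent from rule_dir."""
    return [name for name in REQUIRED_FILES if not (rule_dir / name).is_file()]


# Demonstration on a temporary, deliberately incomplete rule directory.
with tempfile.TemporaryDirectory() as tmp:
    rule_dir = Path(tmp) / "regeln" / "example_rule"
    rule_dir.mkdir(parents=True)
    for name in ("__init__.py", "regel.py"):
        (rule_dir / name).touch()
    missing = missing_rule_files(rule_dir)

print(missing)  # ['config.py', 'README.md']
```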
Rule Implementation
# regeln/example_rule/regel.py
import spacy
from typing import List


def pruefe_regel(doc: spacy.tokens.Doc) -> List[str]:
    """
    Implement your rule logic here.

    Args:
        doc: spaCy document object

    Returns:
        List of violation messages
    """
    violations = []
    # Your rule logic here
    for token in doc:
        if your_condition(token):
            violations.append(f"Violation: {token.text}")
    return violations


def your_condition(token) -> bool:
    """Helper function for rule logic."""
    return token.pos_ == "NOUN" and len(token.text) > 20
Rule Configuration
# regeln/example_rule/config.py
REGEL_NAME = "example_rule"
REGEL_BESCHREIBUNG = "Checks for example violations"
REGEL_KATEGORIE = "lexical" # syntax, lexical, stylistic, technical
REGEL_AKTIVIERT = True
REGEL_PRIORITAET = 5 # 1-10, higher = more important
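To illustrate how the activation flag and priority might be consumed, the sketch below filters out disabled rules and orders the rest from highest to lowest priority. This is an assumption about usage, not a description of the project's actual loader: `RuleConfig` and `active_rules_by_priority` are hypothetical, and the priority values given to `satzlaenge` and `fremdwoerter` are made up for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RuleConfig:
    """Hypothetical container mirroring the constants in a rule's config.py."""
    name: str
    kategorie: str
    aktiviert: bool
    prioritaet: int  # 1-10, higher = more important

    def __post_init__(self) -> None:
        if not 1 <= self.prioritaet <= 10:
            raise ValueError(f"prioritaet must be between 1 and 10, got {self.prioritaet}")


def active_rules_by_priority(configs: List[RuleConfig]) -> List[str]:
    """Return the names of enabled rules, most important first."""
    active = [c for c in configs if c.aktiviert]
    return [c.name for c in sorted(active, key=lambda c: c.prioritaet, reverse=True)]


configs = [
    RuleConfig("example_rule", "lexical", True, 5),
    RuleConfig("satzlaenge", "syntax", True, 8),      # illustrative priority
    RuleConfig("fremdwoerter", "lexical", False, 7),  # illustrative: disabled
]
print(active_rules_by_priority(configs))  # ['satzlaenge', 'example_rule']
```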
Rule Documentation
# Example Rule
## Purpose
Brief description of what this rule checks.
## Leichte Sprache Guidelines
Reference to official guidelines this rule implements.
## Examples
### Violations
- Input: "Problematic text"
- Output: "Violation message"
### Correct Usage
- Input: "Correct text"
- Output: No violations
## Configuration
List of configuration options.
## Implementation Notes
Technical details for developers.
ML Model Contributions
Training New Models
- Prepare training data in the required format
- Use the ML training CLI (see tools/ml/)
- Evaluate model performance on the test set
- Submit model with training logs
Model Requirements
- Performance: Minimum 80% accuracy on test set
- Size: Models should be < 100MB when possible
- Format: Use standardized formats (pickle, ONNX, HuggingFace)
- Documentation: Include training procedure and metrics
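The accuracy requirement can be checked with a few lines of code once predictions for the test set are available. The sketch below computes accuracy as the share of correct predictions and compares it with the 80% minimum. The `flag`/`pass` labels and the sample data are illustrative assumptions; a real evaluation would use the model's own label set and test data.

```python
from typing import Sequence

MIN_ACCURACY = 0.80  # minimum from the model requirements


def accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of predictions that match the reference labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Illustrative data: 8 of 10 predictions are correct.
labels      = ["flag", "pass", "flag", "pass", "flag", "pass", "pass", "flag", "pass", "pass"]
predictions = ["flag", "pass", "flag", "flag", "flag", "pass", "pass", "flag", "pass", "flag"]

score = accuracy(predictions, labels)
print(f"accuracy: {score:.0%}, meets minimum: {score >= MIN_ACCURACY}")
# accuracy: 80%, meets minimum: True
```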
Documentation Contributions
MkDocs Setup
# Install documentation dependencies
pip install mkdocs-material mkdocs-awesome-pages-plugin
# Serve locally
mkdocs serve
# Build static site
mkdocs build
Documentation Guidelines
- Clear structure: Use proper headings and sections
- Code examples: Include working examples
- Screenshots: Add when helpful (store in docs/assets/)
- Links: Use relative links for internal docs
- Language: English for technical docs, German for user-facing content
Writing Style
- Concise and clear explanations
- Active voice when possible
- Step-by-step instructions for procedures
- Examples for complex concepts
Community Guidelines
Code of Conduct
- Be respectful and inclusive
- Help newcomers learn and contribute
- Give constructive feedback in reviews
- Acknowledge contributions from others
Communication
- GitHub Issues: Bug reports and feature requests
- GitHub Discussions: Questions and general discussion
- Pull Request Comments: Code-specific discussion
Release Process
Maintainers handle releases, but contributors should know the process:
- Version bumping follows semantic versioning
- Changelog is updated with all changes
- Testing in staging environment
- Docker images are built and published
- Documentation is deployed
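As background for the version-bumping step, semantic versioning pairs naturally with Conventional Commits: a fix raises the patch number, a new feature raises the minor number, and a breaking change raises the major number. The sketch below shows that arithmetic. `bump_version` is a hypothetical helper and the version `1.4.2` is an arbitrary example. How maintainers actually bump versions is not specified in this guide.

```python
def bump_version(version: str, change: str) -> str:
    """Return the next MAJOR.MINOR.PATCH version for the given kind of change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "breaking":
        return f"{major + 1}.0.0"          # incompatible change: reset minor and patch
    if change == "feat":
        return f"{major}.{minor + 1}.0"    # new feature: reset patch
    if change == "fix":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change kind: {change!r}")


print(bump_version("1.4.2", "fix"))       # 1.4.3
print(bump_version("1.4.2", "feat"))      # 1.5.0
print(bump_version("1.4.2", "breaking"))  # 2.0.0
```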
Getting Help
Stuck? Here's how to get help:
- Check documentation first
- Search existing issues for similar problems
- Ask in GitHub Discussions for questions
- File an issue for bugs or feature requests
Common Questions
Q: How do I add a new rule?
A: See the Rule Development section in this guide for step-by-step instructions.
Q: My tests are failing, what should I check?
A: Ensure you have all models downloaded and your environment is properly set up.
Q: How do I test API changes?
A: Use the test suite and manual testing with curl or Swagger UI.
Q: Where do I add my rule's test cases?
A: Create a directory test-suite/your_rule_name/ with .txt files for each test case.
Recognition
Contributors are recognized in:
- CONTRIBUTORS.md file
- Release notes for significant contributions
- Documentation credits where appropriate
Thank you for contributing to making German text more accessible!