Matching Algorithm Overview
The FindU matching algorithm is a machine learning system that learns from user behavior to provide increasingly personalized college recommendations.
How It Works
Core Concept
The algorithm uses a multi-factor scoring system that considers:
- Academic fit (test scores, GPA, selectivity)
- Financial fit (cost, aid, family income)
- Cultural fit (size, location, campus culture)
- Outcome fit (graduation rates, career prospects)
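One way to picture a multi-factor score is as a weighted average of the four fits. The sketch below assumes each fit is already normalized to the range 0 to 1; the `FitScores` type, the `combined_score` function, and the equal default weights are illustrative assumptions, not FindU's actual values or API.

```python
from dataclasses import dataclass


@dataclass
class FitScores:
    """Per-factor fit scores, each assumed to be normalized to [0, 1]."""
    academic: float
    financial: float
    cultural: float
    outcome: float


# Hypothetical equal weights; the real system's weights are not documented here.
DEFAULT_WEIGHTS = {
    "academic": 0.25,
    "financial": 0.25,
    "cultural": 0.25,
    "outcome": 0.25,
}


def combined_score(fits: FitScores, weights: dict = DEFAULT_WEIGHTS) -> float:
    """Return the weighted average of the four fits as a percentage (0-100)."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(weights[name] * getattr(fits, name) for name in weights)
    return 100.0 * weighted / total_weight
```

Dividing by the total weight keeps the result on a 0-100 scale even when the weights do not sum to exactly 1.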
Learning System
Algorithm Phases
Phase 1: Discovery (0-10 swipes)
- Goal: Learn basic preferences
- Strategy: High diversity in recommendations
- Exploration Rate: 30%
- UI Feedback: “Learning your preferences…”
Phase 2: Refinement (10-30 swipes)
- Goal: Narrow down preferences
- Strategy: Test preference boundaries
- Exploration Rate: 20%
- UI Feedback: “Refining matches…”
Phase 3: Stable (30+ swipes)
- Goal: Deliver best matches
- Strategy: Exploit learned preferences
- Exploration Rate: 10%
- UI Feedback: “Personalized for you”
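The phase table above can be sketched as a lookup on the swipe count. The listed ranges share their endpoints (10 and 30), so this sketch assumes a boundary count belongs to the later phase. Treating the exploration rate as a per-recommendation probability is also an assumption, and `phase_for` and `should_explore` are hypothetical names.

```python
import random

# (phase name, minimum swipe count, exploration rate, UI message), from the phase list.
PHASES = [
    ("discovery", 0, 0.30, "Learning your preferences…"),
    ("refinement", 10, 0.20, "Refining matches…"),
    ("stable", 30, 0.10, "Personalized for you"),
]


def phase_for(swipe_count: int) -> tuple:
    """Return (name, exploration_rate, ui_message) for a given swipe count."""
    if swipe_count < 0:
        raise ValueError("swipe_count must be non-negative")
    current = PHASES[0]
    for phase in PHASES:
        # Assumption: a count exactly on a boundary falls into the later phase.
        if swipe_count >= phase[1]:
            current = phase
    name, _, rate, message = current
    return name, rate, message


def should_explore(swipe_count: int, rng: random.Random) -> bool:
    """Flip a biased coin: True means show an exploratory recommendation."""
    _, rate, _ = phase_for(swipe_count)
    return rng.random() < rate
```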
Scoring System
Base Score Calculation
Grade Mapping
- A+: 90-100% - Perfect matches
- A: 85-89% - Excellent matches
- A-: 80-84% - Very good matches
- B+: 75-79% - Good matches
- B: 70-74% - Solid matches
- B-: 65-69% - Decent matches
- C+: 60-64% - Acceptable matches
- C: 55-59% - Marginal matches
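The grade bands translate directly into a threshold lookup. This sketch assumes fractional scores fall into the band whose lower bound they meet (so 89.5 maps to A) and returns `None` below 55, since the table lists no grade for that range. `grade_for` is a hypothetical helper name.

```python
from typing import Optional

# (minimum score, grade, description), copied from the grade mapping.
GRADE_BANDS = [
    (90, "A+", "Perfect matches"),
    (85, "A", "Excellent matches"),
    (80, "A-", "Very good matches"),
    (75, "B+", "Good matches"),
    (70, "B", "Solid matches"),
    (65, "B-", "Decent matches"),
    (60, "C+", "Acceptable matches"),
    (55, "C", "Marginal matches"),
]


def grade_for(score: float) -> Optional[tuple]:
    """Map a 0-100 match score to (grade, description), or None below 55."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for minimum, grade, description in GRADE_BANDS:
        if score >= minimum:
            return grade, description
    return None  # The mapping does not define a grade under 55%.
```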
Key Innovations
1. Probe School Strategy
Strategically inserts “probe” schools to test preference boundaries.
2. Thompson Sampling
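As a rough illustration of Thompson sampling in this setting, the sketch below treats each school category as a bandit arm with a Beta posterior over its like-rate. The `ThompsonSampler` class, the category names, and the choice of categories as arms are assumptions for illustration; the document does not say what FindU uses as arms.

```python
import random
from typing import Optional


class ThompsonSampler:
    """Beta-Bernoulli Thompson sampling over a fixed set of arms."""

    def __init__(self, arms: list, seed: Optional[int] = None) -> None:
        self._rng = random.Random(seed)
        # Beta(1, 1) is a uniform prior: no opinion before any swipes.
        self.likes = {arm: 1.0 for arm in arms}
        self.passes = {arm: 1.0 for arm in arms}

    def choose(self) -> str:
        """Sample a plausible like-rate for each arm and pick the highest."""
        samples = {
            arm: self._rng.betavariate(self.likes[arm], self.passes[arm])
            for arm in self.likes
        }
        return max(samples, key=samples.get)

    def update(self, arm: str, liked: bool) -> None:
        """Record one swipe outcome for the arm that was shown."""
        if liked:
            self.likes[arm] += 1.0
        else:
            self.passes[arm] += 1.0


# Hypothetical category names used only for this example.
sampler = ThompsonSampler(["small_liberal_arts", "large_public"], seed=0)
sampler.update("small_liberal_arts", liked=True)
next_category = sampler.choose()
```

Arms with few swipes have wide posteriors and still get sampled occasionally, which is how the method balances exploration against exploitation without a fixed exploration rate.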
Uses a multi-armed bandit approach to balance exploration and exploitation.
3. Real-time Weight Learning
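One simple way to adapt feature weights from swipes is an online nudge followed by renormalization. The update rule, the learning rate, the weight floor, and the `update_weights` name below are all assumptions chosen for illustration, not the rule FindU actually applies.

```python
def update_weights(
    weights: dict,
    feature_scores: dict,
    liked: bool,
    learning_rate: float = 0.05,
) -> dict:
    """Return new normalized weights after one swipe.

    feature_scores holds the swiped school's per-feature fit in [0, 1].
    """
    direction = 1.0 if liked else -1.0
    updated = {}
    for name, weight in weights.items():
        # Center at 0.5 so strong features gain weight on a like
        # and lose weight on a pass; missing features are neutral.
        signal = feature_scores.get(name, 0.5) - 0.5
        # Floor at 0.01 (an assumed value) so no feature is switched off entirely.
        updated[name] = max(weight + direction * learning_rate * signal, 0.01)
    total = sum(updated.values())
    return {name: value / total for name, value in updated.items()}
```

For example, liking a school with a high academic fit and a low financial fit shifts weight from the financial factor toward the academic one.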
Adapts feature weights based on swipe patterns.
Feature Engineering
Academic Features
- SAT/ACT Percentile Match: How student compares to school’s admitted students
- Selectivity Alignment: Match between student strength and school selectivity
- Academic Rigor Preference: High/Medium/Low preference handling
Financial Features
- Affordability Score: Net price vs. family budget
- Value Score: Outcomes relative to cost
- Aid Availability: Merit and need-based aid chances
Cultural Features
- Size Preference: Small/Medium/Large with fuzzy boundaries
- Location Match: Distance and urban/rural preferences
- Culture Vectors: Greek life, research, diversity, etc.
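To illustrate what "fuzzy boundaries" for size preference could mean, the sketch below uses linear ramps between size classes so that a school near a cutoff gets partial credit instead of a hard yes or no. The enrollment cut points and the `size_match` function are hypothetical.

```python
# Hypothetical enrollment cut points; the real thresholds are not documented here.
SMALL_FULL = 2_000    # at or below this, fully "small"
MEDIUM_FULL = 5_000   # from here, fully "medium"
MEDIUM_END = 15_000   # up to here, still fully "medium"
LARGE_FULL = 20_000   # at or above this, fully "large"


def _ramp(value: float, start: float, end: float) -> float:
    """Linear ramp from 0 at start to 1 at end, clamped to [0, 1]."""
    return min(max((value - start) / (end - start), 0.0), 1.0)


def size_match(enrollment: int, preference: str) -> float:
    """Return a 0-1 match between a school's enrollment and a size preference."""
    toward_medium = _ramp(enrollment, SMALL_FULL, MEDIUM_FULL)
    toward_large = _ramp(enrollment, MEDIUM_END, LARGE_FULL)
    if preference == "small":
        return 1.0 - toward_medium
    if preference == "medium":
        return min(toward_medium, 1.0 - toward_large)
    if preference == "large":
        return toward_large
    raise ValueError(f"unknown size preference: {preference!r}")
```

Under these assumed cut points, a school with 3,500 students is a half match for both "small" and "medium".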
Outcome Features
- Graduation Rate: 4-year and 6-year rates
- Employment Outcomes: Post-graduation employment
- Earnings Potential: Median earnings by major
Performance Optimizations
Caching Strategy
Batch Processing
- Load schools in chunks of 100
- Vectorized operations with NumPy
- Parallel score calculation
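A minimal sketch of chunked, vectorized scoring with NumPy follows. It assumes schools are rows of a feature matrix and that the score is a dot product with a weight vector; `score_in_chunks` is a hypothetical name, and the parallel step is not shown.

```python
import numpy as np

CHUNK_SIZE = 100  # chunk size from the list above


def score_in_chunks(features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Score every school (one row each), processing 100 rows at a time."""
    n_schools = features.shape[0]
    scores = np.empty(n_schools, dtype=float)
    for start in range(0, n_schools, CHUNK_SIZE):
        chunk = features[start:start + CHUNK_SIZE]
        # One matrix-vector product scores the whole chunk
        # without a Python-level loop over individual schools.
        scores[start:start + CHUNK_SIZE] = chunk @ weights
    return scores
```

Chunking bounds how many school rows are held in the working set at once, while the matrix-vector product keeps the per-chunk work inside NumPy.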
Mobile Optimization
- Compressed responses (field shortening)
- Aggressive caching (recommendations cached for 5 minutes)
- Pagination support
- Offline-first architecture
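Field shortening can be sketched as a reversible key-renaming step applied before a response is serialized. The field names and short codes below are hypothetical; the actual mobile payload format is not documented here.

```python
# Hypothetical long-to-short field mapping, for illustration only.
SHORT_NAMES = {
    "school_id": "id",
    "school_name": "n",
    "match_score": "s",
    "grade": "g",
}
LONG_NAMES = {short: long for long, short in SHORT_NAMES.items()}


def compress(record: dict) -> dict:
    """Rename known fields to their short forms; unknown fields pass through."""
    return {SHORT_NAMES.get(key, key): value for key, value in record.items()}


def expand(record: dict) -> dict:
    """Reverse compress() on the client side."""
    return {LONG_NAMES.get(key, key): value for key, value in record.items()}
```

The savings come from repeating shorter keys across every school in a response, so the mapping only needs to cover fields that appear in each record.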
Integration Points
API Endpoints
Core Endpoints
- POST /recommendations - Get personalized matches (requires student_id)
- POST /swipe - Record swipe interaction
- POST /continuous-feed - Get continuous recommendations with swipe data
- GET /health - Health check endpoint
V2 Minimal Onboarding
- POST /v2/onboarding/minimal - Create minimal profile
- GET /v2/onboarding/category-schools - Get 6 category schools
- POST /v2/onboarding/category-swipe - Record category preference
- POST /v2/recommendations - Get recommendations with minimal profile
Mobile Endpoints (/m/*)
- GET /m/recommendations/{student_id} - Compressed recommendations
- POST /m/batch/schools - Get multiple schools
- POST /m/sync/pull - Pull changes for offline sync
- POST /m/sync/push - Push offline changes
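As a hedged example of calling the API, the sketch below builds a request for the POST /recommendations endpoint listed under Core Endpoints. Only the path and the student_id requirement come from this document; the base URL, the JSON body shape, and the `build_recommendations_request` helper are assumptions.

```python
import json
import urllib.request

# Hypothetical base URL; point this at a running FindU API instance.
BASE_URL = "http://localhost:8000"


def build_recommendations_request(student_id: str) -> urllib.request.Request:
    """Build (but do not send) a POST /recommendations request."""
    # Assumption: student_id is sent as a field in a JSON body.
    body = json.dumps({"student_id": student_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/recommendations",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_recommendations_request("student-123")

# To send it against a running server:
# with urllib.request.urlopen(request, timeout=10) as response:
#     matches = json.load(response)
```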
Data Requirements
From student profile:
- Academic data (GPA, test scores)
- Preferences (size, location, major)
- Demographics (for financial aid)
- Interaction history
From school data:
- Admission statistics
- Cost and aid data
- Outcome metrics
- Campus characteristics
Monitoring & Analytics
Key Metrics
- Match Quality: Average match scores
- Engagement: Swipe-through rates
- Learning Efficiency: Convergence speed
- Diversity: Recommendation variety
A/B Testing Framework
Recent Updates
Continuous Feed Fix (2024)
- Problem: Feed was running out of schools after ~50 recommendations
- Solution: Implemented school recycling for non-swiped schools
- Result: Continuous feed that no longer runs out of schools
Key Improvements
- Session-based tracking: Schools shown but not swiped can reappear
- Smart recycling: 5-minute cooldown before schools can be shown again
- Feed health metrics: Track queue size and recycling effectiveness
- Test scripts: test_feed_fix.py and test_feed_curl.sh for validation
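The recycling behavior described above can be sketched as a small eligibility tracker: swiped schools never return, while schools that were shown but not swiped become eligible again after the 5-minute cooldown. The `FeedRecycler` class and its method names are hypothetical, and the injectable clock is there only to make the logic testable.

```python
import time
from typing import Callable

COOLDOWN_SECONDS = 5 * 60  # 5-minute cooldown from the list above


class FeedRecycler:
    """Track which schools may be shown again within a session."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._shown_at = {}    # school_id -> time it was last shown
        self._swiped = set()   # school_ids the user has already swiped

    def mark_shown(self, school_id: str) -> None:
        self._shown_at[school_id] = self._clock()

    def mark_swiped(self, school_id: str) -> None:
        self._swiped.add(school_id)

    def eligible(self, school_id: str) -> bool:
        """True if the school was never swiped and is past its cooldown."""
        if school_id in self._swiped:
            return False
        shown = self._shown_at.get(school_id)
        return shown is None or self._clock() - shown >= COOLDOWN_SECONDS

    def next_batch(self, candidates: list, size: int) -> list:
        """Return up to `size` eligible schools, preserving candidate order."""
        return [s for s in candidates if self.eligible(s)][:size]
```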
Testing & Development
Running Locally
Testing the Feed
Future Enhancements
In Progress
- Cold start optimization: Better first 3-5 recommendations
- Enhanced explanations: More detailed match breakdowns
- Category learning: Improved onboarding flow
Research Areas
- Deep learning for preference modeling
- Graph neural networks for school relationships
- Predictive admissions modeling
- Natural language processing for essays