Hello everyone,
I am running an undersampling algorithm from imblearn, and downstream I want to train several GBMs (CatBoost, XGBoost, LightGBM, etc.) with successive halving to optimize the hyperparameters.
Based on the imblearn docs, I need to set up the undersampler within a pipeline, but that's inefficient: in theory, I could undersample each of the 3 training folds once, store the left-out folds with the original distribution, and then train the various models with SH on those.
So my question is: how can I pass pre-computed data folds (undersampled for training, original for the left-out part) to successive halving (or any hyperparameter search, for that matter)?
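To make the setup concrete, here is a minimal sketch of what I have in mind. It relies on the fact that scikit-learn searches accept any iterable of `(train_indices, test_indices)` pairs as `cv`: undersample each training fold, stack the resampled training rows together with the original validation rows into one array, and hand the search explicit index pairs. The `undersample` helper is a hand-rolled stand-in for imblearn's `RandomUnderSampler`, just to keep the sketch self-contained; whether this interacts cleanly with `resource='n_samples'` subsampling inside successive halving is exactly the open question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingGridSearchCV, StratifiedKFold

rng = np.random.default_rng(0)

def undersample(X_tr, y_tr):
    # Random undersampling: keep n_min rows per class, where n_min is
    # the minority-class count (stand-in for RandomUnderSampler).
    classes, counts = np.unique(y_tr, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.where(y_tr == c)[0], size=n_min, replace=False)
        for c in classes
    ])
    return X_tr[keep], y_tr[keep]

X, y = make_classification(n_samples=600, weights=[0.9, 0.1], random_state=0)

cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
X_parts, y_parts, folds = [], [], []
offset = 0
for train_idx, test_idx in cv.split(X, y):
    X_res, y_res = undersample(X[train_idx], y[train_idx])
    # Training rows: the undersampled copy; validation rows: the originals.
    X_parts += [X_res, X[test_idx]]
    y_parts += [y_res, y[test_idx]]
    n_tr, n_te = len(y_res), len(test_idx)
    folds.append((np.arange(offset, offset + n_tr),
                  np.arange(offset + n_tr, offset + n_tr + n_te)))
    offset += n_tr + n_te

X_all, y_all = np.vstack(X_parts), np.concatenate(y_parts)

search = HalvingGridSearchCV(
    GradientBoostingClassifier(random_state=0),
    {"max_depth": [2, 3]},
    cv=folds,  # explicit (train, test) index pairs over the stacked data
    factor=2,
    random_state=0,
)
search.fit(X_all, y_all)
```

Note that each validation fold keeps the original class distribution, so the scores reflect the real imbalance, while every training fold is balanced.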
I posted a [question on Stack Overflow](https://stackoverflow.com/questions/79748461/how-to-pass-pre… about this issue. Sharing it here in case someone knows.
Thanks a lot!
Best
Sole
Soledad Galli
https://www.trainindata.com/
Hello Scikit-learn team,
My name is Ayan Dey, and I am a 2nd-year B.Tech student specializing in
Artificial Intelligence & Machine Learning.
I have experience working with Python, LLM APIs, and AI-powered
applications, and I am currently building and deploying real-world AI
tools.
I am interested in contributing to Scikit-learn as part of my preparation
for GSoC 2026, and I would like to get familiar with the codebase,
development workflow, and ongoing priorities.
I have already gone through the documentation and contribution guidelines
and am exploring the open issues on GitHub, especially those marked as
"good first issue."
I'd appreciate any advice on:
- Which areas or modules could use beginner-friendly contributions right
now?
- How contributors usually coordinate on feature discussions or bug fixes
here?
- Any resources you recommend for better understanding Scikit-learn's
internals?
I look forward to collaborating with and learning from this amazing
community.
Best regards,
Ayan Dey
GitHub: [https://github.com/35250]
LinkedIn: [www.linkedin.com/in/ayandey212105242]