Learn LightGBM | Framework Deep Dive
Advanced Tree-Based Models with Python
Section 2. Chapter 2

LightGBM is a gradient boosting framework that stands out for its unique approach to tree construction and feature handling. Two of its core innovations—histogram binning and leaf-wise tree growth—are central to its reputation for high speed and efficiency, especially on large datasets.

Histogram binning

  • Discretizes continuous feature values into a fixed number of bins before training;
  • Groups feature values into these bins, reducing the number of split candidates during tree construction;
  • Speeds up computation and reduces memory usage, since raw feature data can be stored more compactly as bin indices.
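The idea behind binning can be illustrated with plain NumPy. The sketch below is a toy illustration of the concept, not LightGBM's internal implementation: it discretizes one continuous feature into quantile bins and stores compact bin indices in place of the raw floats.

```python
import numpy as np

# Toy illustration of histogram binning (not LightGBM's internals):
# discretize one continuous feature into a fixed number of bins.
rng = np.random.default_rng(42)
feature = rng.normal(size=100_000)   # raw float64 values
n_bins = 255                         # LightGBM's default max_bin is 255

# Build quantile-based bin edges and map each value to a bin index
edges = np.quantile(feature, np.linspace(0, 1, n_bins + 1))
bin_idx = np.clip(np.searchsorted(edges, feature, side="right") - 1, 0, n_bins - 1)
bin_idx = bin_idx.astype(np.uint8)   # 1 byte per value instead of 8

# Split candidates drop from ~100k unique values to at most n_bins
print("unique raw values:", np.unique(feature).size)
print("unique bins:", np.unique(bin_idx).size)
print("memory (raw vs binned):", feature.nbytes, "vs", bin_idx.nbytes)
```

With bin indices, the trainer only has to evaluate at most `n_bins - 1` split thresholds per feature, and the binned column occupies one eighth of the memory of the raw float64 column.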

Leaf-wise tree growth

  • Always splits the leaf with the maximum loss reduction, regardless of its depth;
  • Differs from traditional level-wise algorithms that grow all leaves at the same depth in parallel;
  • Also known as "best-first" or "leaf-wise" growth;
  • Can produce deeper, more complex trees that capture intricate patterns in the data;
  • Boosts accuracy, but may increase the risk of overfitting on smaller datasets—LightGBM provides parameters to control tree complexity.
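The "best-first" selection rule can be sketched with a priority queue. This is a conceptual toy, not LightGBM's code: each candidate leaf carries a hypothetical loss reduction (gain) for its best split, and leaf-wise growth always expands the highest-gain leaf next, regardless of depth.

```python
import heapq

# Toy sketch of leaf-wise (best-first) leaf selection -- not LightGBM's code.
# Each tuple is (negative gain, leaf id, depth); heapq is a min-heap,
# so negating the gain pops the highest-gain leaf first.
candidates = [(-9.0, "A", 1), (-2.5, "B", 1), (-7.0, "C", 2), (-1.0, "D", 3)]
heapq.heapify(candidates)

order = []
while candidates:
    neg_gain, leaf, depth = heapq.heappop(candidates)
    order.append(leaf)

# Leaf C (depth 2, gain 7.0) is expanded before leaf B (depth 1, gain 2.5):
# depth never enters the decision, only the gain does.
print("leaf-wise expansion order:", order)
```

A level-wise grower would instead expand every depth-1 leaf (A and B) before touching the deeper leaves C and D; leaf-wise growth ignores depth entirely, which is why it can produce deep, unbalanced trees.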

Together, histogram binning and leaf-wise growth allow LightGBM to train much faster and with a lower memory footprint than many other gradient boosting frameworks, particularly when handling large, high-dimensional datasets.

import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from lightgbm import LGBMClassifier

# Generate a synthetic dataset
X, y = make_classification(
    n_samples=20000,
    n_features=50,
    n_informative=30,
    n_redundant=10,
    n_classes=2,
    random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Initialize LightGBM classifier
lgbm = LGBMClassifier(
    n_estimators=100,
    max_depth=8,
    learning_rate=0.1,
    subsample=0.8,
    colsample_bytree=0.8,
    random_state=42
)

# Time the training process
start_time = time.time()
lgbm.fit(X_train, y_train)
end_time = time.time()
fit_time = end_time - start_time

print("LightGBM fit time (seconds):", fit_time)
Note

Compared to XGBoost, LightGBM's histogram-based binning and leaf-wise tree growth typically result in faster training times and lower memory consumption when using similar hyperparameters. While XGBoost uses a level-wise tree growth strategy and can be slower on large, high-dimensional datasets, LightGBM's optimizations allow it to process data more efficiently. However, the actual speed and memory advantage may depend on dataset characteristics and parameter settings.

Task


You are given a synthetic binary classification dataset. Your task is to:

  1. Load and split the data.
  2. Initialize a LightGBM classifier with parameters:
    • n_estimators=150.
    • learning_rate=0.05.
    • max_depth=6.
    • subsample=0.8.
    • colsample_bytree=0.8.
  3. Train the model and obtain predictions on the test set.
  4. Compute accuracy and store it in accuracy_value.
  5. Print the shapes of the datasets and the final accuracy.

Solution
