Releases: Lightning-AI/pytorch-lightning
Standard weekly patch release
[1.2.6] - 2021-03-30
Changed
- Changed the behavior of `on_epoch_start` to run at the beginning of validation & test epochs (#6498)
Removed
- Removed legacy code to include `step` dictionary returns in `callback_metrics`. Use `self.log_dict` instead; see the sketch below. (#6682)
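A minimal sketch of the replacement logging API, assuming a toy regression `LightningModule` (the model, data and metric names here are illustrative, not part of the release):

```python
import torch
import pytorch_lightning as pl


class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.mse_loss(self.layer(x), y)
        # Instead of returning a `step` dictionary, log values explicitly;
        # they then show up in `trainer.callback_metrics`.
        self.log_dict({"train_loss": loss}, on_step=True, on_epoch=True)
        return loss

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.01)
```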
Fixed
- Fixed `DummyLogger.log_hyperparams` raising a `TypeError` when running with `fast_dev_run=True` (#6398)
- Fixed error on TPUs when there was no `ModelCheckpoint` (#6654)
- Fixed `trainer.test` freezing on TPUs (#6654)
- Fixed a bug where gradients were disabled after calling `Trainer.predict` (#6657)
- Fixed a bug where no TPUs were detected in a TPU pod environment (#6719)
Contributors
@awaelchli, @carmocca, @ethanwharris, @kaushikb11, @rohitgr7, @tchaton
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Weekly patch release - torchmetrics compatibility
[1.2.5] - 2021-03-23
Changed
- Added Autocast in validation, test and predict modes for Native AMP (#6565)
- Updated gradient clipping for the TPU accelerator (#6576)
- Refactored `setup` to be typing-friendly (#6590)
Fixed
- Fixed a bug where `all_gather` would not work correctly with `tpu_cores=8` (#6587)
- Fixed comparing required versions (#6434)
- Fixed duplicate logs appearing in console when using the python logging module (#6275)
Contributors
@awaelchli, @Borda, @ethanwharris, @justusschock, @kaushikb11
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Standard weekly patch release
[1.2.4] - 2021-03-16
Changed
- Changed the default of `find_unused_parameters` back to `True` in DDP and DDP Spawn (#6438); see the sketch below for overriding it explicitly
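For models that are known to use all of their parameters in every forward pass, the previous behavior can be restored explicitly through the DDP plugin; a sketch assuming the 1.2.x plugin API and a 2-GPU setup:

```python
import pytorch_lightning as pl
from pytorch_lightning.plugins import DDPPlugin

# `find_unused_parameters=True` (the restored default) adds overhead, so opt
# back out of it when the model has no conditionally-unused parameters.
trainer = pl.Trainer(
    gpus=2,
    accelerator="ddp",
    plugins=[DDPPlugin(find_unused_parameters=False)],
)
```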
Fixed
- Expose DeepSpeed loss parameters to allow users to fix loss instability (#6115)
- Fixed DP reduction with collection (#6324)
- Fixed an issue where the tuner would not tune the learning rate if also tuning the batch size (#4688)
- Fixed broadcast to use PyTorch `broadcast_object_list` and add `reduce_decision` (#6410)
- Fixed logger creating directory structure too early in DDP (#6380)
- Fixed DeepSpeed additional memory use on rank 0 when default device not set early enough (#6460)
- Fixed `DummyLogger.log_hyperparams` raising a `TypeError` when running with `fast_dev_run=True` (#6398)
- Fixed an issue with `Tuner.scale_batch_size` not finding the batch size attribute in the datamodule (#5968)
- Fixed an exception in the layer summary when the model contains torch.jit scripted submodules (#6511)
- Fixed the train loop config being run during `Trainer.predict` (#6541)
Contributors
@awaelchli, @kaushikb11, @Palzer, @SeanNaren, @tchaton
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Standard weekly patch release
[1.2.3] - 2021-03-09
Fixed
- Fixed `ModelPruning(make_pruning_permanent=True)` pruning buffers getting removed when saved during training (#6073)
- Fixed `_stable_1d_sort` to work when `n >= N` (#6177)
- Fixed `AttributeError` when `logger=None` on TPU (#6221)
- Fixed PyTorch Profiler with `emit_nvtx` (#6260)
- Fixed `trainer.test` from `best_path` hanging after calling `trainer.fit` (#6272)
- Fixed `SingleTPU` calling `all_gather` (#6296)
- Ensured we check DeepSpeed/Sharded in multi-node DDP (#6297)
- Checked that `LightningOptimizer` doesn't delete optimizer hooks (#6305)
- Resolved a memory leak in evaluation (#6326)
- Ensured that gradient clipping is only called if the value is greater than 0 (#6330)
- Fixed `Trainer` not resetting `lightning_optimizers` when calling `Trainer.fit()` multiple times (#6372)
Contributors
@awaelchli, @carmocca, @chizuchizu, @frankier, @SeanNaren, @tchaton
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Standard weekly patch release
[1.2.2] - 2021-03-02
Added
- Added `checkpoint` parameter to the callback's `on_save_checkpoint` hook (#6072); see the sketch below
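A sketch of a callback making use of the new parameter; the hook signature follows the 1.2.x `Callback` API, while the stored key and tracked value are illustrative:

```python
import pytorch_lightning as pl


class TrackBestScore(pl.Callback):
    def __init__(self):
        self.best_score = float("inf")

    def on_save_checkpoint(self, trainer, pl_module, checkpoint):
        # `checkpoint` is the checkpoint dictionary about to be written, so the
        # callback can inspect it or stash extra state alongside it.
        checkpoint["best_score_seen"] = self.best_score
```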
Changed
- Changed the order of `backward`, `step`, `zero_grad` to `zero_grad`, `backward`, `step` (#6147)
- Changed the default for DeepSpeed CPU offload to `False`, due to prohibitively slow speeds at smaller scale (#6262)
Fixed
- Fixed epoch-level schedulers not being called when `val_check_interval < 1.0` (#6075)
- Fixed multiple early stopping callbacks (#6197)
- Fixed incorrect usage of `detach()`, `cpu()`, `to()` (#6216)
- Fixed LBFGS optimizer support which didn't converge in automatic optimization (#6147)
- Prevented `WandbLogger` from dropping values (#5931)
- Fixed error thrown when using a valid distributed mode in multi-node (#6297)
Contributors
@akihironitta, @borisdayma, @carmocca, @dvolgyes, @SeanNaren, @SkafteNicki
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Standard weekly patch release
[1.2.1] - 2021-02-23
Fixed
- Fixed incorrect yield logic for the amp autocast context manager (#6080)
- Fixed priority of plugin/accelerator when setting distributed mode (#6089)
- Fixed error message for AMP + CPU incompatibility (#6107)
Contributors
@awaelchli, @SeanNaren, @carmocca
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Pruning & Quantization & SWA
[1.2.0] - 2021-02-18
Added
- Added `DataType`, `AverageMethod` and `MDMCAverageMethod` enums in metrics (#5657)
- Added support for summarized model total params size in megabytes (#5590)
- Added support for multiple train loaders (#1959)
- `Accuracy` metric now generalizes to top-k accuracy for (multi-dimensional) multi-class inputs using the `top_k` parameter (#4838)
- `Accuracy` metric now enables the computation of subset accuracy for multi-label or multi-dimensional multi-class inputs with the `subset_accuracy` parameter (#4838)
- Added `HammingDistance` metric to compute the Hamming distance (loss) (#4838)
- Added `max_fpr` parameter to the `auroc` metric for computing partial AUROC (#3790)
- Added `StatScores` metric to compute the number of true positives, false positives, true negatives and false negatives (#4839)
- Added `R2Score` metric (#5241)
- Added `LambdaCallback` (#5347)
- Added `BackboneLambdaFinetuningCallback` (#5377)
- Accelerator `all_gather` supports collections (#5221)
- Added `image_gradients` functional metric to compute the image gradients of a given input image (#5056)
- Added `MetricCollection` (#4318)
- Added `.clone()` method to metrics (#4318)
- Added `IoU` class interface (#4704)
- Support to tie weights after moving the model to TPU via the `on_post_move_to_device` hook
- Added missing val/test hooks in `LightningModule` (#5467)
- The `Recall` and `Precision` metrics (and their functional counterparts `recall` and `precision`) can now be generalized to Recall@K and Precision@K with the use of the `top_k` parameter (#4842)
- Added `ModelPruning` callback (#5618, #5825, #6045)
- Added `PyTorchProfiler` (#5560)
- Added compositional metrics (#5464)
- Added Trainer method `predict(...)` for high-performance predictions (#5579)
- Added `on_before_batch_transfer` and `on_after_batch_transfer` data hooks (#3671)
- Added AUC/AUROC class interface (#5479)
- Added `PredictLoop` object (#5752)
- Added `QuantizationAwareTraining` callback (#5706, #6040)
- Added `LightningModule.configure_callbacks` to enable the definition of model-specific callbacks (#5621)
- Added `dim` to `PSNR` metric for mean-squared-error reduction (#5957)
- Added proximal policy optimization template to pl_examples (#5394)
- Added `log_graph` to `CometLogger` (#5295)
- Added possibility for nested loaders (#5404)
- Added `sync_step` to the Wandb logger (#5351)
- Added `StochasticWeightAveraging` callback (#5640)
- Added `LightningDataModule.from_datasets(...)` (#5133)
- Added `PL_TORCH_DISTRIBUTED_BACKEND` env variable to select the backend (#5981)
- Added `Trainer` flag to activate Stochastic Weight Averaging (SWA): `Trainer(stochastic_weight_avg=True)` (#6038); see the sketch after this list
- Added DeepSpeed integration (#5954, #6042)
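A sketch tying a few of the headline additions together (the SWA Trainer flag, pruning and quantization-aware training); the callback arguments and the commented `fit`/`predict` calls are illustrative, assuming the 1.2.0 APIs:

```python
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelPruning, QuantizationAwareTraining

trainer = pl.Trainer(
    max_epochs=10,
    stochastic_weight_avg=True,  # new Trainer flag enabling SWA (#6038)
    callbacks=[
        # prune 50% of the weights with L1 unstructured pruning (illustrative amount)
        ModelPruning("l1_unstructured", amount=0.5),
        # insert fake-quantization observers for quantization-aware training
        QuantizationAwareTraining(),
    ],
)
# trainer.fit(model, datamodule=dm)
# predictions = trainer.predict(model, dataloaders=predict_loader)  # new in 1.2.0 (#5579)
```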
Changed
- `stat_scores` metric now calculates stat scores over all classes and gains new parameters, in line with the new `StatScores` metric (#4839)
- Changed `computer_vision_fine_tunning` example to use `BackboneLambdaFinetuningCallback` (#5377)
- Changed `automatic casting` for LoggerConnector `metrics` (#5218)
- Changed `iou` [func] to allow float input (#4704)
- Metric `compute()` method will no longer automatically call `reset()` (#5409); see the sketch after this list
- Set PyTorch 1.4 as the minimum requirement, also for testing and examples: `torchvision>=0.5` and `torchtext>=0.5` (#5418)
- Changed `callbacks` argument in `Trainer` to allow `Callback` input (#5446)
- Changed the default of `find_unused_parameters` to `False` in DDP (#5185)
- Changed `ModelCheckpoint` version suffixes to start at 1 (#5008)
- Progress bar metrics tensors are now converted to float (#5692)
- Changed the default value for the `progress_bar_refresh_rate` Trainer argument in Google COLAB notebooks to 20 (#5516)
- Extended support for purely iteration-based training (#5726)
- Made `LightningModule.global_rank`, `LightningModule.local_rank` and `LightningModule.logger` read-only properties (#5730)
- Forced `ModelCheckpoint` callbacks to run after all others to guarantee all states are saved to the checkpoint (#5731)
- Refactored Accelerators and Plugins (#5743)
  - Added base classes for plugins (#5715)
  - Added parallel plugins for DP, DDP, DDPSpawn, DDP2 and Horovod (#5714)
  - Precision Plugins (#5718)
  - Added new Accelerators for CPU, GPU and TPU (#5719)
  - Added Plugins for TPU training (#5719)
  - Added RPC and Sharded plugins (#5732)
  - Added missing `LightningModule`-wrapper logic to new plugins and accelerator (#5734)
  - Moved device-specific teardown logic from the training loop to the accelerator (#5973)
  - Moved accelerator_connector.py to the connectors subfolder (#6033)
  - Trainer only references the accelerator (#6039)
  - Made parallel devices optional across all plugins (#6051)
  - Cleaning (#5948, #5949, #5950)
- Enabled `self.log` in callbacks (#5094)
- Renamed xxx_AVAILABLE as protected (#5082)
- Unified module names in Utils (#5199)
- Separated utils: imports & enums (#5256, #5874)
- Refactor: clean trainer device & distributed getters (#5300)
- Simplified training phase as LightningEnum (#5419)
- Updated metrics to use LightningEnum (#5689)
- Changed the sequence of the `on_train_batch_end`, `on_batch_end` & `on_train_epoch_end`, `on_epoch_end` hooks (#5688)
- Refactored `setup_training` and removed `test_mode` (#5388)
- Disabled training with zero `num_training_batches` when `limit_train_batches` is insufficient (#5703)
- Refactored `EpochResultStore` (#5522)
- Updated `lr_finder` to check for the attribute if not running `fast_dev_run` (#5990)
- `LightningOptimizer` manual optimizer is more flexible and exposes `toggle_model` (#5771)
- `MlflowLogger` limits parameter value length to 250 characters (#5893)
- Re-introduced fix for Hydra directory sync with multiple processes (#5993)
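Since `compute()` no longer calls `reset()` implicitly, state accumulated across updates has to be cleared explicitly; a minimal sketch with the built-in `Accuracy` metric (random data, purely illustrative):

```python
import torch
from pytorch_lightning.metrics import Accuracy

accuracy = Accuracy()

for _ in range(3):  # e.g. the batches of one validation epoch
    preds = torch.randint(0, 2, (8,))
    target = torch.randint(0, 2, (8,))
    accuracy.update(preds, target)

epoch_acc = accuracy.compute()  # aggregates the accumulated state; no implicit reset anymore
accuracy.reset()                # clear the state explicitly before the next epoch
print(epoch_acc)
```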
Deprecated
- Function `stat_scores_multiple_classes` is deprecated in favor of `stat_scores` (#4839)
- Moved accelerators and plugins to their `legacy` package (#5645)
- Deprecated `LightningDistributedDataParallel` in favor of the new wrapper module `LightningDistributedModule` (#5185)
- Deprecated `LightningDataParallel` in favor of the new wrapper module `LightningParallelModule` (#5670)
- Renamed utils modules (#5199)
  - `argparse_utils` >> `argparse`
  - `model_utils` >> `model_helpers`
  - `warning_utils` >> `warnings`
  - `xla_device_utils` >> `xla_device`
- Deprecated using `'val_loss'` to set the `ModelCheckpoint` monitor (#6012); see the sketch after this list
- Deprecated `.get_model()` with explicit `.lightning_module` property (#6035)
- Deprecated the Trainer attribute `accelerator_backend` in favor of `accelerator` (#6034)
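With the implicit `'val_loss'` monitor deprecated, the monitored key is passed explicitly and has to match a metric logged by the model; a short sketch assuming the `LightningModule` logs `"val_loss"` in `validation_step`:

```python
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint

# The LightningModule is assumed to call self.log("val_loss", loss) during
# validation; naming the monitor explicitly replaces the deprecated default.
checkpoint_cb = ModelCheckpoint(monitor="val_loss", mode="min", save_top_k=1)
trainer = pl.Trainer(callbacks=[checkpoint_cb])
```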
Removed
- Removed deprecated checkpoint argument `filepath` (#5321)
- Removed deprecated `Fbeta`, `f1_score` and `fbeta_score` metrics (#5322)
- Removed deprecated `TrainResult` (#5323)
- Removed deprecated `EvalResult` (#5633)
- Removed `LoggerStages` (#5673)
Fixed
- Fixed distributed setting and `ddp_cpu` only with `num_processes>1` (#5297)
- Fixed the saved filename in `ModelCheckpoint` when it already exists (#4861)
- Fixed `DDPHPCAccelerator` hanging in DDP construction by calling `init_device` (#5157)
- Fixed `num_workers` for Windows example (#5375)
- Fixed loading yaml (#5619)
- Fixed support for custom DataLoaders with DDP if they can be re-instantiated (#5745)
- Fixed repeated `.fit()` calls ignoring the `max_steps` iteration bound (#5936)
- Fixed throwing `MisconfigurationError` on unknown mode (#5255)
- Resolved a bug with fine-tuning (#5744)
- Fixed `ModelCheckpoint` race condition in the file existence check (#5155)
- Fixed some compatibility with PyTorch 1.8 (#5864)
- Fixed forward cache (#5895)
- Fixed recursive detach of tensors to CPU (#6007)
- Fixed passing wrong strings for the scheduler interval not throwing an error (#5923)
- Fixed wrong `requires_grad` state after `return None` with multiple optimizers (#5738)
- Fixed adding the `on_epoch_end` hook at the end of the `validation` and `test` epochs (#5986)
- Fixed missing `process_dataloader` call for `TPUSpawn` when in distributed mode (#6015)
- Fixed progress bar flickering by appending 0 to floats/strings (#6009)
- Fixed synchronization issues with TPU training (#6027)
- Fixed `hparams.yaml` being saved twice when using `TensorBoardLogger` (#5953)
- Fixed basic examples (#5912, #5985)
- Fixed `fairscale` compatibility with PyTorch 1.8 (#5996)
- Ensured `process_dataloader` is called when `tpu_cores > 1` to use Parallel DataLoader (#6015)
- Attempted SLURM auto resume call when non-shell call fails (#6002)
- Fixed wrapping optimizers upon assignment (#6006)
- Fixed allowing hashing of metrics with lists in their state (#5939)
Contributors
@alanhdu, @ananthsub, @awaelchli, @Borda, @borisdayma, @carmocca, @ddrevicky, @deng-cy, @ducthienbui97, @justusschock, @kartik4949, @kaushikb11, @manipopopo, @marload, @neighthan, @peblair, @prampey, @pranjaldatta, @rohitgr7, @SeanNaren, @sid-sundrani, @SkafteNicki, @tadejsv, @tchaton, @teddykoker, @titu1994, @yuntai
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Standard weekly patch release
[1.1.7] - 2021-02-03
Fixed
- Fixed `TensorBoardLogger` not closing `SummaryWriter` on `finalize` (#5696)
- Fixed filtering of the PyTorch "unsqueeze" warning when using DP (#5622)
- Fixed `num_classes` argument in the F1 metric (#5663)
- Fixed `log_dir` property (#5537)
- Fixed a race condition in `ModelCheckpoint` when checking if a checkpoint file exists (#5144)
- Removed unnecessary intermediate layers in Dockerfiles (#5697)
- Fixed auto learning rate ordering (#5638)
Contributors
@awaelchli @guillochon @noamzilo @rohitgr7 @SkafteNicki @sumanthratna
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Standard weekly patch release
[1.1.6] - 2021-01-26
Changed
- Increased TPU check timeout from 20s to 100s (#5598)
- Ignored `step` param in the Neptune logger's `log_metric` method (#5510)
- Pass batch outputs to `on_train_batch_end` instead of `epoch_end` outputs (#4369)
Fixed
- Fixed `toggle_optimizer` to reset the `requires_grad` state (#5574)
- Fixed `FileNotFoundError` for best checkpoint when using DDP with Hydra (#5629)
- Fixed an error when logging a progress bar metric with a reserved name (#5620)
- Fixed `Metric`'s `state_dict` not being included when metrics are child modules (#5614)
- Fixed the Neptune logger creating multiple experiments when GPUs > 1 (#3256)
- Fixed duplicate logs appearing in console when using the python logging module (#5509)
- Fixed tensor printing in `trainer.test()` (#5138)
- Fixed not using the dataloader when `hparams` is present (#4559)
Contributors
@awaelchli @bryant1410 @lezwon @manipopopo @PiotrJander @psinger @rnett @SeanNaren @swethmandava @tchaton
If we forgot someone due to not matching commit email with GitHub account, let us know :]