Releases: Lightning-AI/pytorch-lightning
DDP and Checkpoint bug fixes
Overview
As we continue to strengthen the codebase with more tests, we're finally getting rid of annoying bugs that have been around for a while now, mostly around inconsistent checkpoint and early stopping behaviour (amazing work @awaelchli @jeremyjordan).
Noteworthy changes:
- Fixed TPU flag parsing
- Fixed `average_precision` metric
- All the checkpoint issues should be gone now (including backward support for old checkpoints)
- DDP + loggers should be fixed
Detail changes
Added
- Added TorchText support for moving data to GPU (#2379)
Changed
- Changed epoch indexing to start from 0 instead of 1 (#2289)
- Refactored model `backward` (#2276)
- Refactored `training_batch` + tests to verify correctness (#2327, #2328)
- Refactored training loop (#2336)
- Made optimization steps for hooks (#2363)
- Changed default apex level to 'O2' (#2362)
Removed
- Moved `TrainsLogger` to Bolts (#2384)
Fixed
- Fixed parsing TPU arguments and TPU tests (#2094)
- Fixed number of batches in case of multiple dataloaders and `limit_{*}_batches` (#1920, #2226)
- Fixed an issue with forward hooks not being removed after model summary (#2298)
- Fixed `load_from_checkpoint()` not working with absolute path on Windows (#2294)
- Fixed an issue with how `_has_len` handles `NotImplementedError`, e.g. raised by `torchtext.data.Iterator` (#2293, #2307)
- Fixed `average_precision` metric (#2319)
- Fixed ROC metric for CUDA tensors (#2304)
- Fixed lost compatibility with custom datatypes implementing `.to` (#2335)
- Fixed loading model with kwargs (#2387)
- Fixed `sum(0)` for `trainer.num_val_batches` (#2268)
- Fixed checking if the parameters are a `DictConfig` object (#2216)
- Fixed SLURM weights saving (#2341)
- Fixed swapped LR scheduler order (#2356)
- Fixed adding TensorBoard `hparams` logging test (#2342)
- Fixed use of model reference for teardown (#2360)
- Fixed logger crash on DDP (#2388)
- Fixed several issues with early stopping and checkpoint callbacks (#1504, #2391)
- Fixed loading past checkpoints from v0.7.x (#2405)
- Fixed loading model without arguments (#2403)
Contributors
@airium, @awaelchli, @Borda, @elias-ramzi, @jeremyjordan, @lezwon, @mateuszpieniak, @mmiakashs, @pwl, @rohitgr7, @ssakhavi, @thschaaf, @tridao, @williamFalcon
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Fixing hooks & hparams
Overview
Fixing critical bugs in newly added hooks and hparams assignment.
The recommended data flow is the following (see the sketch after this list):
- use `prepare_data` to download and process the dataset
- use `setup` to do splits and build your model internals
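A minimal sketch of that flow, with a torchvision MNIST stand-in dataset (`training_step` and friends omitted for brevity):

```python
import pytorch_lightning as pl
from torch.utils.data import DataLoader, random_split
from torchvision import transforms
from torchvision.datasets import MNIST

class LitModel(pl.LightningModule):
    def prepare_data(self):
        # called once (rank 0 only): download / write to disk, assign no state
        MNIST('./data', train=True, download=True)

    def setup(self, stage):
        # called on every process: build splits and model internals
        dataset = MNIST('./data', train=True, transform=transforms.ToTensor())
        self.train_set, self.val_set = random_split(dataset, [55000, 5000])

    def train_dataloader(self):
        return DataLoader(self.train_set, batch_size=32)
```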
Metrics, speed improvements, new hooks and flags
Overview
Highlights of this release are adding Metric package and new hooks and flags to customize your workflow.
Major features:
- brand new Metrics package with built-in DDP support (by @justusschock and @SkafteNicki)
- `hparams` can now be anything! Call `self.save_hyperparameters()` to register anything in the `__init__` (see the sketch after this list)
- many speed improvements (how we move data, adjusted some flags & PL now adds only 300ms overhead per epoch!)
- much faster `ddp` implementation; the old one was renamed `ddp_spawn`
- better support for Hydra
- added the `overfit_batches` flag and corrected some bugs with the `limit_[train,val,test]_batches` flag
- added conda support
- tons of bug fixes 😉
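As a quick illustration of the new `hparams` behaviour and the batch-limiting flags, here is a minimal sketch (model body elided; flag values are illustrative):

```python
import torch
import pytorch_lightning as pl

class LitModel(pl.LightningModule):
    def __init__(self, learning_rate=0.02, hidden_dim=128):
        super().__init__()
        # registers every __init__ argument under self.hparams
        # and stores them in checkpoints automatically
        self.save_hyperparameters()
        self.layer = torch.nn.Linear(28 * 28, self.hparams.hidden_dim)

# overfit on 10 batches drawn from the training set (train/val/test)
trainer = pl.Trainer(overfit_batches=10)
# or cap validation at 25% of its batches
trainer = pl.Trainer(limit_val_batches=0.25)
```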
Detail changes
Added
- Added `overfit_batches`, `limit_{val|test}_batches` flags (overfit now uses training set for all three) (#2213)
- Added metrics
- Added type hints in `Trainer.fit()` and `Trainer.test()` to reflect that also a list of dataloaders can be passed in (#1723)
- Allow dataloaders without sampler field present (#1907)
- Added option `save_last` to save the model at the end of every epoch in `ModelCheckpoint` (#1908)
- Early stopping checks `on_validation_end` (#1458)
- Attribute `best_model_path` to `ModelCheckpoint` for storing and later retrieving the path to the best saved model file (#1799)
- Speed up single-core TPU training by loading data using `ParallelLoader` (#2033)
- Added a model hook `transfer_batch_to_device` that enables moving custom data structures to the target device (#1756); see the sketch after this list
- Added black formatter for the code with code-checker on pull (#1610)
- Added back the slow spawn ddp implementation as `ddp_spawn` (#2115)
- Added loading checkpoints from URLs (#1667)
- Added a callback method `on_keyboard_interrupt` for handling KeyboardInterrupt events during training (#2134)
- Added a decorator `auto_move_data` that moves data to the correct device when using the LightningModule for inference (#1905)
- Added `ckpt_path` option to `LightningModule.test(...)` to load particular checkpoint (#2190)
- Added `setup` and `teardown` hooks for model (#2229)
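To make the new `transfer_batch_to_device` hook concrete, here is a minimal sketch; the `CustomBatch` container is hypothetical:

```python
from pytorch_lightning import LightningModule

class CustomBatch:
    """Hypothetical structure the default moving logic cannot handle."""
    def __init__(self, inputs, targets):
        self.inputs = inputs
        self.targets = targets

class LitModel(LightningModule):
    # training_step, configure_optimizers, etc. omitted for brevity

    def transfer_batch_to_device(self, batch, device):
        if isinstance(batch, CustomBatch):
            # move the wrapped tensors, keep the container intact
            batch.inputs = batch.inputs.to(device)
            batch.targets = batch.targets.to(device)
            return batch
        # otherwise fall back to Lightning's default moving logic
        return super().transfer_batch_to_device(batch, device)
```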
Changed
- Allow user to select individual TPU core to train on (#1729)
- Removed non-finite values from loss in `LRFinder` (#1862)
- Allow passing model hyperparameters as complete kwarg list (#1896)
- Renamed `ModelCheckpoint`'s attributes `best` to `best_model_score` and `kth_best_model` to `kth_best_model_path` (#1799)
- Re-enabled Logger's `ImportError`s (#1938)
- Changed the default value of the Trainer argument `weights_summary` from `full` to `top` (#2029)
- Raise an error when lightning replaces an existing sampler (#2020)
- Enabled prepare_data from correct processes - clarify local vs global rank (#2166)
- Remove explicit flush from tensorboard logger (#2126)
- Changed epoch indexing to start from 1 instead of 0 (#2206)
Deprecated
- Deprecated flags (#2213):
  - `overfit_pct` in favour of `overfit_batches`
  - `val_percent_check` in favour of `limit_val_batches`
  - `test_percent_check` in favour of `limit_test_batches`
- Deprecated `ModelCheckpoint`'s attributes `best` and `kth_best_model` (#1799)
- Dropped official support/testing for older PyTorch versions <1.3 (#1917)
Removed
- Removed unintended Trainer argument `progress_bar_callback`; the callback should be passed in by `Trainer(callbacks=[...])` instead (#1855)
- Removed obsolete `self._device` in Trainer (#1849)
- Removed deprecated API (#2073)
  - Packages: `pytorch_lightning.pt_overrides`, `pytorch_lightning.root_module`
  - Modules: `pytorch_lightning.logging.comet_logger`, `pytorch_lightning.logging.mlflow_logger`, `pytorch_lightning.logging.test_tube_logger`, `pytorch_lightning.overrides.override_data_parallel`, `pytorch_lightning.core.model_saving`, `pytorch_lightning.core.root_module`
  - Trainer arguments: `add_row_log_interval`, `default_save_path`, `gradient_clip`, `nb_gpu_nodes`, `max_nb_epochs`, `min_nb_epochs`, `nb_sanity_val_steps`
  - Trainer attributes: `nb_gpu_nodes`, `num_gpu_nodes`, `gradient_clip`, `max_nb_epochs`, `min_nb_epochs`, `nb_sanity_val_steps`, `default_save_path`, `tng_tqdm_dic`
Fixed
- Run graceful training teardown on interpreter exit (#1631)
- Fixed user warning when apex was used together with learning rate schedulers (#1873)
- Fixed multiple calls of `EarlyStopping` callback (#1863)
- Fixed an issue with `Trainer.from_argparse_args` when passing in unknown Trainer args (#1932)
- Fixed bug related to logger not being reset correctly for model after tuner algorithms (#1933)
- Fixed root node resolution for SLURM cluster with dash in hostname (#1954)
- Fixed `LearningRateLogger` in multi-scheduler setting (#1944)
- Fixed test configuration check and testing (#1804)
- Fixed an issue with Trainer constructor silently ignoring unknown/misspelt arguments (#1820)
- Fixed `save_weights_only` in ModelCheckpoint (#1780)
- Allow use of same `WandbLogger` instance for multiple training loops (#2055)
- Fixed an issue with `_auto_collect_arguments` collecting local variables that are not constructor arguments and not working for signatures that have the instance not named `self` (#2048)
- Fixed mistake in parameters' grad norm tracking (#2012)
- Fixed CPU and hanging GPU crash (#2118)
- Fixed an issue with the model summary and `example_input_array` depending on a specific ordering of the submodules in a LightningModule (#1773)
- Fixed TPU logging (#2230)
- Fixed PID port + duplicate `rank_zero` logging (#2140, #2231)
Contributors
@awaelchli, @baldassarreFe, @Borda, @borisdayma, @cuent, @devashishshankar, @ivannz, @j-dsouza, @justusschock, @kepler, @kumuji, @lezwon, @lgvaz, @LoicGrobol, @mateuszpieniak, @maximsch2, @moi90, @rohitgr7, @SkafteNicki, @tullie, @williamFalcon, @yukw777, @ZhaofengWu
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Transfer learning, tuning batch size, torchelastic support
Overview
Highlights of this release: support for TorchElastic, which enables distributed PyTorch training jobs to be executed in a fault-tolerant and elastic manner; auto-scaling of batch size; a new transfer learning example; and an option to provide a seed to random generators to ensure reproducibility.
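Both reproducibility and batch-size auto-scaling are one-liners; a hedged sketch, assuming the `seed_everything` helper and `auto_scale_batch_size` flag added in this release:

```python
import pytorch_lightning as pl

pl.seed_everything(42)  # seeds Python, NumPy and PyTorch RNGs

# the batch-size finder assumes the model exposes a `batch_size`
# attribute (or hparams entry) that it can tune in place
trainer = pl.Trainer(auto_scale_batch_size='power')
trainer.fit(model)  # `model` is any LightningModule defined elsewhere
```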
Detail changes
Added
- Added callback for logging learning rates (#1498)
- Added transfer learning example (for a binary classification task in computer vision) (#1564)
- Added type hints in `Trainer.fit()` and `Trainer.test()` to reflect that also a list of dataloaders can be passed in (#1723)
- Added auto scaling of batch size (#1638)
- The progress bar metrics now also get updated in `training_epoch_end` (#1724)
- Enable `NeptuneLogger` to work with `distributed_backend=ddp` (#1753)
- Added option to provide seed to random generators to ensure reproducibility (#1572)
- Added override for hparams in `load_from_ckpt` (#1797)
- Added support for multi-node distributed execution under `torchelastic` (#1811, #1818)
- Added using `store_true` for bool args (#1822, #1842)
- Added dummy logger for internally disabling logging for some features (#1836)
Changed
- Enable `non-blocking` for device transfers to GPU (#1843)
- Replace `meta_tags.csv` with `hparams.yaml` (#1271)
- Reduction when `batch_size < num_gpus` (#1609)
- Updated LightningTemplateModel to look more like Colab example (#1577)
- Don't convert `namedtuple` to `tuple` when transferring the batch to target device (#1589)
- Allow passing `hparams` as a keyword argument to LightningModule when loading from checkpoint (#1639)
- Args should come after the last positional argument (#1807)
- Made DDP the default if no backend specified with multiple GPUs (#1789)
Deprecated
- Deprecated `tags_csv` in favor of `hparams_file` (#1271)
Fixed
- Fixed broken link in PR template (#1675)
- Fixed ModelCheckpoint not None checking file path (#1654)
- Trainer now calls `on_load_checkpoint()` when resuming from a checkpoint (#1666)
- Fixed sampler logic for DDP with the iterable dataset (#1734)
- Fixed `_reset_eval_dataloader()` for IterableDataset (#1560)
- Fixed Horovod distributed backend to set the `root_gpu` property (#1669)
- Fixed wandb logger `global_step` affects other loggers (#1492)
- Fixed disabling progress bar on non-zero ranks using Horovod backend (#1709)
- Fixed bugs that prevented the LR finder from being used together with early stopping and validation dataloaders (#1676)
- Fixed a bug in Trainer that prepended the checkpoint path with `version_` when it shouldn't (#1748)
- Fixed LR key name in case of param groups in LearningRateLogger (#1719)
- Fixed saving native AMP scaler state (introduced in #1561)
- Fixed accumulation parameter and suggestion method for learning rate finder (#1801)
- Fixed num processes not being set properly and auto sampler failing with DDP (#1819)
- Fixed bugs in semantic segmentation example (#1824)
- Fixed saving native AMP scaler state (#1561, #1777)
- Fixed native AMP + DDP (#1788)
- Fixed `hparam` logging with metrics (#1647)
Contributors
@ashwinb, @awaelchli, @Borda, @cmpute, @festeh, @jbschiratti, @justusschock, @kepler, @kumuji, @nanddalal, @nathanbreitsch, @olineumann, @pitercl, @rohitgr7, @S-aiueo32, @SkafteNicki, @tgaddair, @tullie, @tw991, @williamFalcon, @ybrovman, @yukw777
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Critical DDP bug fixes
We made a few changes to Callbacks to test ops on detached GPU tensors and avoid CPU transfers. However, this made the callbacks unpicklable, which crashes DDP.
This release fixes that core issue.
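A minimal sketch of the property the fix restores: DDP pickles the Trainer, callbacks included, so every callback must survive a pickle round-trip (the callback choice here is illustrative):

```python
import pickle

from pytorch_lightning.callbacks import EarlyStopping

callback = EarlyStopping(monitor='val_loss')
# raises if the callback holds unpicklable state (e.g. live GPU tensors)
pickle.loads(pickle.dumps(callback))
```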
Changed
- Allow logging of metrics together with hparams (#1630)
Removed
- Removed Warning from trainer loop (#1634)
Fixed
- Fixed `ModelCheckpoint` not being picklable (#1632)
- Fixed CPU DDP breaking change and DDP change (#1635)
- Tested pickling (#1636)
PyTorch 1.5 support, native PyTorch AMP, speed/memory optimizations and many bug fixes
Key updates
- PyTorch 1.5 support
- Added Horovod `distributed_backend` option (see the sketch after this list)
- Enable forward compatibility with the native AMP (PyTorch 1.6).
- Support 8-core TPU on Kaggle
- Added ability to customize progress_bar via Callbacks
- Speed/memory optimizations.
- Improved Argparse usability with Trainer
- Docs improvements
- Tons of bug fixes
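To illustrate a couple of these updates, a hedged sketch of the relevant Trainer flags (values are illustrative):

```python
from pytorch_lightning import Trainer

# Horovod as a distributed backend: launch the script with `horovodrun`
trainer = Trainer(distributed_backend='horovod')

# request 16-bit precision; on PyTorch 1.6+ Lightning can use native AMP
trainer = Trainer(gpus=1, precision=16)
```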
Detail changes
Added
- Added flag `replace_sampler_ddp` to manually disable sampler replacement in DDP (#1513)
- Added speed parity tests (max 1 sec difference per epoch) (#1482)
- Added `auto_select_gpus` flag to trainer that enables automatic selection of available GPUs on exclusive mode systems
- Added learning rate finder (#1347)
- Added support for DDP mode in clusters without SLURM (#1387)
- Added `test_dataloaders` parameter to `Trainer.test()` (#1434)
- Added `terminate_on_nan` flag to trainer that performs a NaN check with each training iteration when set to `True` (#1475)
- Added `ddp_cpu` backend for testing DDP without GPUs (#1158)
- Added Horovod support as a distributed backend `Trainer(distributed_backend='horovod')` (#1529)
- Added support for 8-core distributed training on Kaggle TPUs (#1568)
- Added support for native AMP (#1561, #1580)
Changed
- Changed the default behaviour to no longer include a NaN check with each training iteration. (#1475)
- Decoupled the progress bar from the trainer. It is a callback now and can be customized or even be replaced entirely (#1450); see the sketch after this list.
- Changed lr schedule step interval behavior to update every backwards pass instead of every forwards pass (#1477)
- Defines shared proc. rank, remove rank from instances (e.g. loggers) (#1408)
- Updated semantic segmentation example with custom u-net and logging (#1371)
- Disabled val and test shuffling (#1600)
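Because the progress bar is now an ordinary callback, it can be subclassed or replaced; a minimal sketch, assuming the `ProgressBar` callback class introduced by this change:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ProgressBar

class LiteProgressBar(ProgressBar):
    def init_train_tqdm(self):
        # tweak the tqdm bar created by the default implementation
        bar = super().init_train_tqdm()
        bar.set_description('training')
        return bar

trainer = Trainer(callbacks=[LiteProgressBar()])
```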
Deprecated
- Deprecated `training_tqdm_dict` in favor of `progress_bar_dict` (#1450)
Removed
- Removed `test_dataloaders` parameter from `Trainer.fit()` (#1434)
Fixed
- Added the possibility to pass nested metrics dictionaries to loggers (#1582)
- Fixed memory leak from opt return (#1528)
- Fixed saving checkpoint before deleting old ones (#1453)
- Fixed loggers - flushing last logged metrics even before continue, e.g. `trainer.test()` results (#1459)
- Fixed optimizer configuration when `configure_optimizers` returns dict without `lr_scheduler` (#1443)
- Fixed `LightningModule` - mixing hparams and arguments in `LightningModule.__init__()` crashes `load_from_checkpoint()` (#1505)
- Added a missing call to the `on_before_zero_grad` model hook (#1493)
- Allow use of sweeps with `WandbLogger` (#1512)
- Fixed a bug that caused the `callbacks` Trainer argument to reference a global variable (#1534)
- Fixed a bug that set all boolean CLI arguments from `Trainer.add_argparse_args` always to True (#1571)
- Fixed to no longer copy the batch when training on a single GPU (#1576, #1579)
- Fixed soft checkpoint removing on DDP (#1408)
- Fixed automatic parser bug (#1585)
- Fixed bool conversion from string (#1606)
Contributors
@alexeykarnachev, @areshytko, @awaelchli, @Borda, @borisdayma, @ethanwharris, @fschlatt, @HenryJia, @Ir1d, @justusschock, @karlinjf, @lezwon, @neggert, @rmrao, @rohitgr7, @SkafteNicki, @tgaddair, @williamFalcon
If we forgot someone due to not matching commit email with GitHub account, let us know :]
DDP bug fixes
We had a few (subtle) bugs in 0.7.2 that affected DDP and a few other key things, so we released 0.7.3 to fix them - they are critical for DDP. Sorry about that! There are still no API changes, but please skip straight to the 0.7.3 upgrade for those fixes.
Detail changes
Added
- Added `rank_zero_warn` for warning only in rank 0 (#1428)
Fixed
- Fixed default `DistributedSampler` for DDP training (#1425)
- Fixed workers warning not on Windows (#1430)
- Fixed returning tuple from `run_training_batch` (#1431)
- Fixed gradient clipping (#1438)
- Fixed pretty print (#1441)
Many bug fixes, added flexibility, parity tests with pytorch and more
Overview
This release aims at fixing particular issues and improving the user development experience via extending docs, adding typing and supporting python 3.8. In particular, some of the release highlights are:
- Added benchmark for comparing lightning with vanilla implementations
- Extended optimizer support with per-optimizer frequencies
- Several improvements for loggers, such as representing non-primitive types and supporting hierarchical dictionaries for hyperparameter searchers
- Added model configuration checking before it runs
- Simplify the PL examples structure (shallower and more readable)
- Improved Trainer CLI arguments handling (generalization)
- Two Trainer arguments became deprecated: `print_nan_grads` and `show_progress_bar`
Detail changes
Added
- Added same step loggers' metrics aggregation (#1278)
- Added parity test between a vanilla MNIST model and lightning model (#1284)
- Added parity test between a vanilla RNN model and lightning model (#1351)
- Added Reinforcement Learning - Deep Q-network (DQN) lightning example (#1232)
- Added support for hierarchical `dict` (#1152)
- Added `TrainsLogger` class (#1122)
- Added type hints to `pytorch_lightning.core` (#946)
- Added support for `IterableDataset` in validation and testing (#1104)
- Added support for non-primitive types in `hparams` for `TensorboardLogger` (#1130)
- Added a check that stops the training when loss or weights contain `NaN` or `inf` values (#1097)
- Added support for `IterableDataset` when `val_check_interval=1.0` (default); this will trigger validation at the end of each epoch (#1283)
- Added `summary` method to Profilers (#1259)
- Added informative errors if user defined dataloader has zero length (#1280)
- Added testing for Python 3.8 (#915)
- Added a `training_epoch_end` method which is the mirror of `validation_epoch_end` (#1357)
- Added model configuration checking (#1199)
- Added support for optimizer frequencies through `LightningModule.configure_optimizers()` (#1269); see the sketch after this list
- Added option to run without an optimizer by returning `None` from `configure_optimizers` (#1279)
- Added a warning when the number of data loader workers is small (#1378)
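The optimizer-frequency support works by returning dictionaries from `configure_optimizers`; a minimal sketch for a GAN-style setup (the stand-in networks are illustrative):

```python
import torch
from torch import nn
from pytorch_lightning import LightningModule

class GAN(LightningModule):
    def __init__(self):
        super().__init__()
        self.generator = nn.Linear(100, 784)      # stand-in networks
        self.discriminator = nn.Linear(784, 1)

    # training_step etc. omitted for brevity

    def configure_optimizers(self):
        opt_g = torch.optim.Adam(self.generator.parameters(), lr=2e-4)
        opt_d = torch.optim.Adam(self.discriminator.parameters(), lr=2e-4)
        # step the discriminator 5 times for every generator step
        return (
            {'optimizer': opt_d, 'frequency': 5},
            {'optimizer': opt_g, 'frequency': 1},
        )
```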
Changed
- Changed (renamed and refactored) `TensorRunningMean` -> `TensorRunningAccum`: running accumulations were generalized (#1278)
- Changed `progress_bar_refresh_rate` trainer flag to disable the progress bar when set to 0 (#1108)
- Enhanced `load_from_checkpoint` to also forward params to the model (#1307)
- Updated references to `self.forward()` to instead use the `__call__` interface (#1211)
- Changed default behaviour of `configure_optimizers` to use no optimizer rather than Adam (#1279)
- Allow uploading models on W&B (#1339)
- On DP and DDP2, unsqueeze is automated now (#1319)
- Does not always create a DataLoader during reinstantiation, but the same type as before (if a subclass of DataLoader) (#1346)
- Does not interfere with a default sampler (#1318)
- Removed default Adam optimizer (#1317)
- Gave warnings for unimplemented required lightning methods (#1317)
- Made `evaluate` method private >> `Trainer._evaluate(...)` (#1260)
- Simplified the PL examples structure (shallower and more readable) (#1247)
- Changed min-max GPU memory to be on their own plots (#1358)
- Removed `.item` which causes sync issues (#1254)
- Changed smoothing in TQDM to decrease variability of time remaining between training/eval (#1194)
- Changed default logger to a dedicated one (#1064)
Deprecated
- Deprecated Trainer argument `print_nan_grads` (#1097)
- Deprecated Trainer argument `show_progress_bar` (#1108)
Removed
- Removed duplicated module `pytorch_lightning.utilities.arg_parse` for loading CLI arguments (#1167)
- Removed wandb logger's `finalize` method (#1193)
- Dropped `torchvision` dependency in tests and added own MNIST dataset class instead (#986)
Fixed
- Fixed `model_checkpoint` when saving all models (#1359)
- `Trainer.add_argparse_args` classmethod fixed. Now it adds a type for the arguments (#1147)
- Fixed bug related to type checking of `ReduceLROnPlateau` lr schedulers (#1114)
- Fixed a bug to ensure lightning checkpoints to be backward compatible (#1132)
- Fixed a bug that created an extra dataloader with active `reload_dataloaders_every_epoch` (#1181)
- Fixed all warnings and errors in the docs build process (#1191)
- Fixed an issue where `val_percent_check=0` would not disable validation (#1251)
- Fixed average of incomplete `TensorRunningMean` (#1309)
- Fixed `WandbLogger.watch` with `wandb.init()` (#1311)
- Fixed an issue with early stopping that would prevent it from monitoring training metrics when validation is disabled / not implemented (#1235)
- Fixed a bug that would cause `trainer.test()` to run on the validation set when overloading `validation_epoch_end` and `test_end` (#1353)
- Fixed `WandbLogger.watch` - use of the watch method without importing `wandb` (#1311)
- Fixed `WandbLogger` to be used with 'ddp' - allow reinits in sub-processes (#1149, #1360)
- Made `training_epoch_end` behave like `validation_epoch_end` (#1357)
- Fixed `fast_dev_run` running validation twice (#1365)
- Fixed pickle error from quick patch `__code__` (#1352)
- Fixed memory leak on GPU0 (#1094, #1349)
- Fixed checkpointing interval (#1272)
- Fixed validation and training loops running on a partial dataset (#1192)
- Fixed running `on_validation_end` only on main process in DDP (#1125)
- Fixed `load_spawn_weights` only in proc rank 0 (#1385)
- Fixed using deprecated `use_amp` attribute (#1145)
- Fixed TensorBoard logger error: `lightning_logs` directory not existing in multi-node DDP on nodes with rank != 0 (#1375)
- Fixed `Unimplemented backend XLA` error on TPU (#1387)
Contributors
@alexeykarnachev, @amoudgl, @areshytko, @asafmanor, @awaelchli, @bkkaggle, @bmartinn, @Borda, @borisdayma, @cmpute, @djbyrne, @ethanwharris, @gerardrbentley, @jbschiratti, @jeremyjordan, @justusschock, @monney, @mpariente, @pertschuk, @rmrao, @S-aiueo32, @shubhamagarwal92, @SkafteNicki, @sneiman, @tullie, @vanpelt, @williamFalcon, @xingzhaolee
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Minor deprecation fix
Minor bug fix with print issues and `data_loader` (#1080).
TPU support & profiling
Overview
This is the first joint release between pytorch-bearer and Lightning, here we come ...
This release adds support for training models on Tensor Processing Units (TPU). We can now train models on GPUs and TPUs by changing a single parameter in Trainer (see docs). We are also bringing the flexibility of Bearer into Lightning by allowing for arbitrary user-defined callbacks, see docs.
We are also including a profiler that allows Lightning users to identify training bottlenecks (see docs).
This release also includes automatic sampler setup: depending on the selected backend, Lightning configures the sampler correctly (no need for user input).
The loggers have also been extended to support multiple concurrent loggers to be passed to Trainer as an iterable (see docs), and we added support for step-based learning rate scheduling.
At last, lots of bug fixes (see below).
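For example, switching hardware, attaching multiple loggers, and enabling the profiler are all Trainer-level switches; a hedged sketch using flags from this release:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.loggers import TensorBoardLogger

# one parameter moves the same script from GPUs to TPUs
trainer = Trainer(num_tpu_cores=8)  # or Trainer(gpus=2)

# multiple loggers as an iterable, plus the new profiler
trainer = Trainer(
    logger=[TensorBoardLogger('logs/', name='a'), TensorBoardLogger('logs/', name='b')],
    profiler=True,  # reports where training time is spent
)
```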
Detail changes
Added
- Added automatic sampler setup. Depending on DDP or TPU, lightning configures the sampler correctly (user needs to do nothing) (#926)
- Added `reload_dataloaders_every_epoch=False` flag for trainer. Some users require reloading data every epoch (#926)
- Added `progress_bar_refresh_rate=50` flag for trainer: the refresh rate on notebooks (#926)
- Updated governance docs
- Added a check to ensure that the metric used for early stopping exists before training commences (#542)
- Added `optimizer_idx` argument to `backward` hook (#733)
- Added `entity` argument to `WandbLogger` to be passed to `wandb.init` (#783)
- Added a tool for profiling training runs (#782)
- Improved flexibility for naming of TensorBoard logs; you can now set `version` to a `str` to just save to that directory, and use `name=''` to prevent an experiment-name directory (#804)
- Added option to specify `step` key when logging metrics (#808)
- Added `train_dataloader`, `val_dataloader` and `test_dataloader` arguments to `Trainer.fit()` for alternative data parsing (#759)
- Added Tensor Processing Unit (TPU) support (#868)
- Added semantic segmentation example (#751, #876, #881)
- Split callbacks in multiple files (#849)
- Support for user-defined callbacks (#889 and #950)
- Added support for multiple loggers to be passed to `Trainer` as an iterable (e.g. list, tuple, etc.) (#903)
- Added support for step-based learning rate scheduling (#941)
- Added support for logging hparams as `dict` (#1029)
- Checkpoint and early stopping now work without val step (#1041)
- Support graceful training cleanup after Keyboard Interrupt (#856, #1019)
- Added type hints for function arguments (#912)
- Added default `argparser` for `Trainer` (#952, #1023)
- Added TPU gradient clipping (#963)
- Added max/min number of steps in Trainer (#728)
Changed
- Changed default TQDM to use `tqdm.auto` for prettier outputs in IPython notebooks (#752)
- Changed `pytorch_lightning.logging` to `pytorch_lightning.loggers` (#767)
- Moved the default `tqdm_dict` definition from Trainer to `LightningModule`, so it can be overridden by the user (#749)
- Moved functionality of `LightningModule.load_from_metrics` into `LightningModule.load_from_checkpoint` (#995)
- Changed Checkpoint path parameter from `filepath` to `dirpath` (#1016)
- Froze models' `hparams` as `Namespace` property (#1029)
- Dropped `logging` config in package init (#1015)
- Renamed model steps (#1051):
  - `training_end` >> `training_epoch_end`
  - `validation_end` >> `validation_epoch_end`
  - `test_end` >> `test_epoch_end`
- Refactored dataloading, supports infinite dataloader (#955)
- Create single file in `TensorBoardLogger` (#777)
Deprecated
- Deprecated `pytorch_lightning.logging` (#767)
- Deprecated `LightningModule.load_from_metrics` in favour of `LightningModule.load_from_checkpoint` (#995, #1079)
- Deprecated `@data_loader` decorator (#926)
- Deprecated model steps `training_end`, `validation_end` and `test_end` (#1051, #1056)
Removed
- Removed dependency on `pandas` (#736)
- Removed dependency on `torchvision` (#797)
- Removed dependency on `scikit-learn` (#801)
Fixed
- Fixed a bug where early stopping `on_end_epoch` would be called inconsistently when `check_val_every_n_epoch == 0` (#743)
- Fixed a bug where the model checkpoint didn't write to the same directory as the logger (#771)
- Fixed a bug where the `TensorBoardLogger` class would create an additional empty log file during fitting (#777)
- Fixed a bug where `global_step` was advanced incorrectly when using `accumulate_grad_batches > 1` (#832)
- Fixed a bug when calling `self.logger.experiment` with multiple loggers (#1009)
- Fixed a bug when calling `logger.append_tags` on a `NeptuneLogger` with a single tag (#1009)
- Fixed sending back data from `.spawn` by saving and loading the trained model in/out of the process (#1017)
- Fixed port collision on DDP (#1010)
- Fixed/tested pass overrides (#918)
- Fixed comet logger to log after train (#892)
- Remove deprecated args to learning rate step function (#890)
Contributors
@airglow, @akshaykvnit, @AljoSt, @AntixK, @awaelchli, @baeseongsu, @bobkemp, @Borda, @calclavia, @Calysto, @djbyrne, @ethanwharris, @fdelrio89, @hadim, @hanbyul-kim, @jeremyjordan, @kuynzereb, @luiscape, @MattPainter01, @neggert, @onkyo14taro, @peteriz, @shoarora, @SkafteNicki, @smallzzy, @srush, @theevann, @tullie, @williamFalcon, @xeTaiz, @xssChauhan, @yukw777
If we forgot someone due to not matching commit email with GitHub account, let us know :]