Metrics, speed improvements, new hooks and flags
Pre-release
Overview
The highlights of this release are the brand new Metrics package plus new hooks and flags to customize your workflow.
Major features:
- brand new Metrics package with built-in DDP support (by @justusschock and @SkafteNicki) (example below)
- `hparams` can now be anything! (call `self.save_hyperparameters()` to register anything in the `__init__`)
- many speed improvements (how we move data, adjusted some flags & PL now adds only ~300 ms of overhead per epoch!)
- much faster `ddp` implementation; the old one was renamed `ddp_spawn`
- better support for Hydra
- added the `overfit_batches` flag and corrected some bugs with the `limit_[train,val,test]_batches` flags
- added conda support
- tons of bug fixes 😉
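As a quick illustration of the first two highlights, here is a minimal sketch of a LightningModule using `self.save_hyperparameters()` and a metric from the new Metrics package. The functional `accuracy` import path follows this release's docs (adjust it if your installed version differs), and the model itself is purely illustrative:

```python
import torch
import torch.nn.functional as F
import pytorch_lightning as pl
from pytorch_lightning.metrics.functional import accuracy  # new Metrics package


class LitClassifier(pl.LightningModule):
    def __init__(self, learning_rate=1e-3):
        super().__init__()
        # hparams can now be anything: this registers every __init__ argument
        # under self.hparams and stores it in checkpoints
        self.save_hyperparameters()
        self.layer = torch.nn.Linear(28 * 28, 10)

    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = self.layer(x.view(x.size(0), -1))
        loss = F.cross_entropy(logits, y)
        acc = accuracy(torch.argmax(logits, dim=1), y)  # functional metric from the new package
        return {"loss": loss, "log": {"train_acc": acc}}

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.hparams.learning_rate)
```

The reworked `ddp` backend is selected as before with `Trainer(distributed_backend='ddp')`; the previous spawn-based implementation stays available as `distributed_backend='ddp_spawn'`.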
Detailed changes
Added
- Added `overfit_batches`, `limit_{val|test}_batches` flags (overfit now uses training set for all three) (#2213) (see the sketch after this list)
- Added metrics
- Added type hints in `Trainer.fit()` and `Trainer.test()` to reflect that also a list of dataloaders can be passed in (#1723)
- Allow dataloaders without sampler field present (#1907)
- Added option `save_last` to save the model at the end of every epoch in `ModelCheckpoint` (#1908)
- Early stopping checks `on_validation_end` (#1458)
- Attribute `best_model_path` to `ModelCheckpoint` for storing and later retrieving the path to the best saved model file (#1799)
- Speed up single-core TPU training by loading data using `ParallelLoader` (#2033)
- Added a model hook `transfer_batch_to_device` that enables moving custom data structures to the target device (#1756)
- Added black formatter for the code with code-checker on pull (#1610)
- Added back the slow spawn ddp implementation as `ddp_spawn` (#2115)
- Added loading checkpoints from URLs (#1667)
- Added a callback method `on_keyboard_interrupt` for handling KeyboardInterrupt events during training (#2134)
- Added a decorator `auto_move_data` that moves data to the correct device when using the LightningModule for inference (#1905)
- Added `ckpt_path` option to `LightningModule.test(...)` to load a particular checkpoint (#2190)
- Added `setup` and `teardown` hooks for model (#2229)
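A rough sketch tying several of these additions together: the `setup`/`teardown` and `transfer_batch_to_device` hooks, the `auto_move_data` decorator, `save_last`, and the `overfit_batches` flag. The model body and the dict-shaped batch are illustrative assumptions, not part of the release:

```python
import torch
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint
from pytorch_lightning.core.decorators import auto_move_data


class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def setup(self, stage):
        # new hook: runs at the start of fit/test, e.g. to build datasets
        pass

    def transfer_batch_to_device(self, batch, device):
        # new hook: move custom batch structures (here: a dict of tensors) to the device
        if isinstance(batch, dict):
            return {key: value.to(device) for key, value in batch.items()}
        return super().transfer_batch_to_device(batch, device)

    @auto_move_data
    def forward(self, x):
        # inputs passed at inference time are moved to the module's device automatically
        return self.layer(x)

    def teardown(self, stage):
        # new hook: clean up at the end of fit/test
        pass


checkpoint_cb = ModelCheckpoint(save_last=True)  # also keep a checkpoint of the final epoch
trainer = pl.Trainer(
    limit_val_batches=0.25,  # renamed from val_percent_check
    checkpoint_callback=checkpoint_cb,
)

# quick sanity check: overfit on 10 training batches (used for train/val/test)
debug_trainer = pl.Trainer(overfit_batches=10)

# after fitting, checkpoint_cb.best_model_path holds the path to the best checkpoint
```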
Changed
- Allow user to select individual TPU core to train on (#1729)
- Removed non-finite values from loss in `LRFinder` (#1862)
- Allow passing model hyperparameters as complete kwarg list (#1896)
- Renamed `ModelCheckpoint`'s attributes `best` to `best_model_score` and `kth_best_model` to `kth_best_model_path` (#1799) (see the example below)
- Re-enable Logger's `ImportError`s (#1938)
- Changed the default value of the Trainer argument `weights_summary` from `full` to `top` (#2029)
- Raise an error when Lightning replaces an existing sampler (#2020)
- Enabled `prepare_data` from correct processes - clarify local vs global rank (#2166)
- Remove explicit flush from TensorBoard logger (#2126)
- Changed epoch indexing to start from 1 instead of 0 (#2206)
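Two of these changes in code form (a small sketch; the fresh `ModelCheckpoint` instance is only there to show the renamed attributes):

```python
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint

# weights_summary now defaults to "top"; request the previous behaviour explicitly
trainer = pl.Trainer(weights_summary="full")

# ModelCheckpoint renames: best -> best_model_score, kth_best_model -> kth_best_model_path
checkpoint_cb = ModelCheckpoint()
print(checkpoint_cb.best_model_score)     # formerly .best
print(checkpoint_cb.kth_best_model_path)  # formerly .kth_best_model
```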
Deprecated
- Deprecated flags: (#2213) (migration example below)
  - `overfit_pct` in favour of `overfit_batches`
  - `val_percent_check` in favour of `limit_val_batches`
  - `test_percent_check` in favour of `limit_test_batches`
- Deprecated `ModelCheckpoint`'s attributes `best` and `kth_best_model` (#1799)
- Dropped official support/testing for older PyTorch versions <1.3 (#1917)
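Migrating off the deprecated flags is a one-for-one rename on the Trainer (sketch):

```python
import pytorch_lightning as pl

# before (deprecated, still works but emits a warning)
trainer = pl.Trainer(overfit_pct=0.01, val_percent_check=0.5, test_percent_check=0.5)

# after
trainer = pl.Trainer(overfit_batches=0.01, limit_val_batches=0.5, limit_test_batches=0.5)
```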
Removed
- Removed unintended Trainer argument `progress_bar_callback`, the callback should be passed in by `Trainer(callbacks=[...])` instead (#1855)
- Removed obsolete `self._device` in Trainer (#1849)
- Removed deprecated API (#2073)
  - Packages: `pytorch_lightning.pt_overrides`, `pytorch_lightning.root_module`
  - Modules: `pytorch_lightning.logging.comet_logger`, `pytorch_lightning.logging.mlflow_logger`, `pytorch_lightning.logging.test_tube_logger`, `pytorch_lightning.overrides.override_data_parallel`, `pytorch_lightning.core.model_saving`, `pytorch_lightning.core.root_module`
  - Trainer arguments: `add_row_log_interval`, `default_save_path`, `gradient_clip`, `nb_gpu_nodes`, `max_nb_epochs`, `min_nb_epochs`, `nb_sanity_val_steps`
  - Trainer attributes: `nb_gpu_nodes`, `num_gpu_nodes`, `gradient_clip`, `max_nb_epochs`, `min_nb_epochs`, `nb_sanity_val_steps`, `default_save_path`, `tng_tqdm_dic`
Fixed
- Run graceful training teardown on interpreter exit (#1631)
- Fixed user warning when apex was used together with learning rate schedulers (#1873)
- Fixed multiple calls of `EarlyStopping` callback (#1863)
- Fixed an issue with `Trainer.from_argparse_args` when passing in unknown Trainer args (#1932)
- Fixed bug related to logger not being reset correctly for model after tuner algorithms (#1933)
- Fixed root node resolution for SLURM cluster with dash in hostname (#1954)
- Fixed `LearningRateLogger` in multi-scheduler setting (#1944)
- Fixed test configuration check and testing (#1804)
- Fixed an issue with Trainer constructor silently ignoring unknown/misspelt arguments (#1820)
- Fixed `save_weights_only` in ModelCheckpoint (#1780)
- Allow use of same `WandbLogger` instance for multiple training loops (#2055)
- Fixed an issue with `_auto_collect_arguments` collecting local variables that are not constructor arguments and not working for signatures that have the instance not named `self` (#2048)
- Fixed mistake in parameters' grad norm tracking (#2012)
- Fixed CPU and hanging GPU crash (#2118)
- Fixed an issue with the model summary and `example_input_array` depending on a specific ordering of the submodules in a LightningModule (#1773)
- Fixed TPU logging (#2230)
- Fixed PID port + duplicate `rank_zero` logging (#2140, #2231)
Contributors
@awaelchli, @baldassarreFe, @Borda, @borisdayma, @cuent, @devashishshankar, @ivannz, @j-dsouza, @justusschock, @kepler, @kumuji, @lezwon, @lgvaz, @LoicGrobol, @mateuszpieniak, @maximsch2, @moi90, @rohitgr7, @SkafteNicki, @tullie, @williamFalcon, @yukw777, @ZhaofengWu
If we forgot someone due to not matching commit email with GitHub account, let us know :]