From Achille et al. (2018): "The existing approaches to continual learning can be broadly separated into three categories: data-, architecture- or weights-based. The data-based approaches augment the training data on a new task with the data collected from the previous tasks, allowing for simultaneous multi-task learning on IID data [11, 45, 42, 33, 15]. The architecture-based approaches dynamically augment the network with new task-specific modules, which often share intermediate representations to encourage positive transfer [46, 39, 47]. Both of these types of approaches, however, are inefficient in terms of the memory requirements once the number of tasks becomes large. The weights-based approaches do not require data or model augmentation. Instead, they prevent catastrophic forgetting by slowing down learning in the weights that are deemed to be important for the previously learnt tasks [27, 53, 38]."
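To make the weights-based idea concrete, here is a minimal sketch of one way to "slow down learning" in important weights: a quadratic penalty that pulls each weight back toward its value after the previous task, scaled by a per-weight importance score. This is in the spirit of elastic weight consolidation and is an assumption for illustration, not necessarily the formulation used in the works cited in the quote. The names `anchor_weights`, `importance`, and `strength`, the toy quadratic loss, and all numeric values are hypothetical.

```python
import numpy as np


def penalized_gradient(task_grad, weights, anchor_weights, importance, strength=1.0):
    """Gradient of: new-task loss + (strength / 2) * sum(importance * (w - anchor)^2).

    task_grad is the gradient of the new-task loss alone; the second term
    pulls each weight toward its old value in proportion to its importance.
    """
    return task_grad + strength * importance * (weights - anchor_weights)


def sgd_step(weights, task_grad, anchor_weights, importance, lr=0.05, strength=1.0):
    """One plain gradient-descent step on the penalized objective."""
    grad = penalized_gradient(task_grad, weights, anchor_weights, importance, strength)
    return weights - lr * grad


# Toy setup (all values are illustrative assumptions).
anchor = np.array([1.0, 1.0])       # weights after the previous task
importance = np.array([10.0, 0.0])  # weight 0 matters for the old task, weight 1 does not
target = np.array([0.0, 0.0])       # new task loss: 0.5 * ||w - target||^2

weights = anchor.copy()
for _ in range(200):
    task_grad = weights - target    # gradient of the toy new-task loss
    weights = sgd_step(weights, task_grad, anchor, importance)

print(weights)
```

In this toy setting the important weight stays close to its old value (it settles at importance / (1 + importance) of the way back toward the anchor), while the unimportant weight moves freely to the new task's optimum. No old data is stored and no modules are added; the only extra memory is one anchor value and one importance score per weight, which is consistent with the memory argument in the quoted passage.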