KERTAS: dataset for automatic dating of ancient Arabic manuscripts
The age of a historical manuscript can be an invaluable source of information for paleographers and historians. Automatic manuscript age detection has inherent complexities, which are compounded by the lack of suitable datasets for algorithm testing: there is a lack of existing datasets that provide a reliable writing date and author identity as metadata. This paper presents KERTAS, a new dataset of historical handwritten Arabic manuscripts designed specifically to test state-of-the-art authorship and age detection algorithms. Qatar National Library is the main source of manuscripts for this dataset, while the remaining manuscripts are open source. The dataset consists of over 2000 images taken from various handwritten Arabic manuscripts spanning fourteen centuries. In addition, a sparse representation-based approach for dating historical Arabic manuscripts is proposed. KERTAS can help researchers, historians and paleographers to automatically date Arabic manuscripts more accurately and efficiently.