Intelligent Textbooks

Description

Many adaptive educational systems and artificial intelligence applications depend on high-quality knowledge representations. However, the difficulty of acquiring this knowledge remains the main obstacle to the widespread use of knowledge-based systems: creating knowledge models manually is time-consuming and labor-intensive. Digital textbooks offer a scalable alternative because they are widely available, rich in domain-specific content, and well structured. The knowledge embedded in textbook elements such as the table of contents (ToC), the index, and formatting styles can be harnessed to generate comprehensive knowledge models.

Nonetheless, extracting this hidden knowledge from digital (PDF) textbooks presents notable challenges. Furthermore, even models extracted from the ToC, index, and other textbook components are not guaranteed to be high-quality representations of a domain. They can suffer from subjectivity, coverage gaps, inconsistent granularity, and a lack of semantic depth. To address these issues, glossaries extracted from individual textbooks can be integrated and linked to external models, such as DBpedia.

This project developed a unified approach to automatically extract high-quality, domain-specific knowledge models from digital textbooks while addressing these challenges. The approach enriches the content with additional internal and external links and transforms textbooks into hypertext documents in which individual pages are annotated with important concepts in the domain, thus providing a pathway to scalable knowledge extraction.
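To make the idea of annotating pages with index concepts more concrete, the following is a minimal illustrative sketch and not the project's actual pipeline. It assumes the index has already been extracted as plain-text lines of the form "term, page, page-range"; the sample lines and the function names `parse_index_line` and `annotate_pages` are hypothetical. Real textbook indexes are considerably messier (nested entries, cross-references, multi-column layouts).

```python
import re
from collections import defaultdict

# Hypothetical index lines; real textbook indexes vary widely in layout.
INDEX_LINES = [
    "binomial distribution, 45, 47-49",
    "mean, 12, 45",
    "variance, 13",
]


def parse_index_line(line):
    """Split 'term, 12, 47-49' into (term, [12, 47, 48, 49]).

    Returns None for lines that do not follow this simple pattern.
    """
    match = re.match(r"^(.*?),\s*(\d[\d\s,\-]*)$", line.strip())
    if match is None:
        return None
    term = match.group(1).strip()
    pages = []
    for part in match.group(2).split(","):
        # Each part is either a single page ("45") or a range ("47-49").
        page_match = re.fullmatch(r"(\d+)(?:-(\d+))?", part.strip())
        if page_match is None:
            continue  # skip malformed page references
        start = int(page_match.group(1))
        end = int(page_match.group(2)) if page_match.group(2) else start
        pages.extend(range(start, end + 1))
    return term, pages


def annotate_pages(index_lines):
    """Invert the index: page number -> sorted list of concepts on that page."""
    page_concepts = defaultdict(set)
    for line in index_lines:
        parsed = parse_index_line(line)
        if parsed is None:
            continue
        term, pages = parsed
        for page in pages:
            page_concepts[page].add(term)
    return {page: sorted(terms) for page, terms in sorted(page_concepts.items())}


annotations = annotate_pages(INDEX_LINES)
```

Here `annotations` maps each page number to the index terms that reference it, which is one simple way to picture pages "annotated with important concepts". Linking those terms to an external model such as DBpedia would be a separate step that this sketch does not attempt.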

PhD Thesis

  • Alpizar-Chacon, Isaac. "Extraction of knowledge models from textbooks." Utrecht University. 2023.
  • https://doi.org/10.33540/1647

Publications

    • Alpizar-Chacon, Isaac, and Sergey Sosnovsky. "Order out of chaos: Construction of knowledge models from pdf textbooks." Proceedings of the ACM Symposium on Document Engineering 2020. 2020.
    • Alpizar-Chacon, Isaac, and Sergey Sosnovsky. "Expanding the web of knowledge: one textbook at a time." Proceedings of the 30th ACM Conference on Hypertext and Social Media. 2019.
    • Alpizar-Chacon, Isaac, and Sergey Sosnovsky. "Knowledge models from PDF textbooks." New Review of Hypermedia and Multimedia (2021): 1-49. 2021.
    • Alpizar-Chacon, Isaac, and Sergey Sosnovsky. "What’s in an Index: Extracting Domain-specific Knowledge Graphs from Textbooks." Proceedings of the ACM Web Conference 2022: 966-976. 2022.
    • Alpizar-Chacon, Isaac, and Sergey Sosnovsky. "Interlingua: Linking textbooks across different languages." Proceedings of the First Workshop on Intelligent Textbooks. CEUR-WS. 2019.
    • Alpizar-Chacon, Isaac, et al. "Transformation of PDF Textbooks into Intelligent Educational Resources." Proceedings of the Second Workshop on Intelligent Textbooks. CEUR-WS. 2020.
    • Dresscher, Lucas, Isaac Alpizar-Chacon, and Sergey Sosnovsky. "Generation of Assessment Questions from Textbooks Enriched with Knowledge Models." Proceedings of the Third Workshop on Intelligent Textbooks. CEUR-WS. 2021.
    • Alpizar-Chacon, Isaac, and Sergey Sosnovsky. "Integrating Textbooks with Smart Interactive Content for Learning Programming." Proceedings of the Third Workshop on Intelligent Textbooks. CEUR-WS. 2021.
    • Pozzi, Lorenzo, Isaac Alpizar-Chacon, and Sergey Sosnovsky. "Harnessing Textbooks for High-Quality Labeled Data: An Approach to Automatic Keyword Extraction." Proceedings of the Fifth Workshop on Intelligent Textbooks. CEUR-WS. 2023.
    • Alpizar-Chacon, Isaac, Sergey Sosnovsky, and Peter Brusilovsky. "Measuring the quality of domain models extracted from textbooks with learning curves analysis." Proceedings of the International Conference on Artificial Intelligence in Education. 2023.

Demos

    • Transformation of PDF Textbooks into Interactive Educational Resources video

    Code