AI and Analytics Automation Course for Library & Information Centres
Course overview
This course focuses on AI, analytics and automation for library and information centres. It balances analytics (descriptive, diagnostic, predictive, prescriptive), ML/NLP methods, workflow automation (RPA, pipeline orchestration, MLOps), visualization and governance — all with library-specific use cases (catalogues, circulation, digital collections, user behaviour, space usage).
Course title
AI, Analytics & Automation for Library and Information Centres
Course description
A practical course teaching analytics and AI methods and how to automate them within library workflows. Topics: data collection and KPIs, exploratory analytics, NLP for collections, predictive models (demand, digitization priorities), recommender systems, automation with pipelines and RPA, dashboards and data products, model deployment and monitoring, ethics and governance. Emphasis on hands-on labs, real datasets and a capstone automation project.
Target audience
Librarians, Data analysts in libraries, Archivists, Information Managers, Library IT staff
Course duration and delivery modes
The course is delivered in one of two modes: 2 weeks of intensive in-person boot camp plus 3 weeks of online training, or 4 weeks of intensive in-person boot camp.
Learning objectives
By the end of the course, participants will be able to:
- Define core library KPIs and design data collection for analytics.
- Perform exploratory and diagnostic analytics on library datasets (catalogue, circulation, usage, web).
- Apply NLP to extract metadata, topics and named entities from collections.
- Build predictive models for demand forecasting, churn, and digitization prioritization.
- Implement recommender logic and personalization for discovery and outreach.
- Automate end-to-end analytics and ML pipelines and deploy models/APIs or RPA bots into library systems.
- Monitor model performance, detect drift and apply governance for privacy, fairness and compliance.
Course Content
Orientation & use cases
- Why AI & analytics in libraries: decision support, service automation, personalised discovery, resource optimisation.
- Datasets overview: catalogue records (MARC/DC), circulation logs, COUNTER/usage stats, web analytics, digitized text.
- Course roadmap, final project options and evaluation.
Data foundations for libraries
- Data types, quality, cleaning, entity resolution (author matching), schema mapping (MARC → DC).
- Data access & governance (APIs, logs, privacy, consent).
- Lab: clean and join catalogue + circulation datasets; compute basic KPIs (turnover, loan rates, holdings-use).
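A minimal sketch of this lab in pandas. The column names and records below are invented for illustration; a real extract would come from the catalogue and circulation systems.

```python
import pandas as pd

# Hypothetical minimal extracts; column names are illustrative, not a fixed schema.
catalogue = pd.DataFrame({
    "item_id": [1, 2, 3],
    "title": ["Intro to AI", "Data Ethics", "Rare Maps"],
})
circulation = pd.DataFrame({
    "item_id": [1, 1, 2],
    "loan_date": pd.to_datetime(["2024-01-05", "2024-02-10", "2024-01-20"]),
})

# Count loans per item, then left-join so zero-loan items stay visible.
loan_counts = circulation.groupby("item_id").size().rename("loan_count")
usage = catalogue.merge(loan_counts.reset_index(), on="item_id", how="left")
usage = usage.fillna({"loan_count": 0})

# Collection-level turnover: total loans per item held in the period.
turnover = usage["loan_count"].sum() / len(usage)
print(usage[["title", "loan_count"]])
print(f"turnover = {turnover:.2f}")
```

The left join is the important detail: an inner join would silently drop never-borrowed items, which are exactly the ones a collection manager needs to see.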
Exploratory analytics & visualization
- Descriptive analytics, cohort analysis, trend detection, segmentation.
- Dashboard principles for libraries; tools: Power BI, Tableau, Superset, or Python (plotly).
- Lab: build an interactive dashboard for circulation and collection usage.
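The aggregation step behind this lab can be sketched in pandas: the tidy month-by-branch frame below (invented loan records) is the shape any of the listed dashboard tools expects as input.

```python
import pandas as pd

# Invented loan records; a real log would come from the circulation system.
loans = pd.DataFrame({
    "loan_date": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-02-10"]),
    "branch": ["Main", "Main", "East"],
})

# Aggregate to one row per (month, branch) — dashboard-ready tidy data.
monthly = (loans
           .assign(month=loans["loan_date"].dt.to_period("M").astype(str))
           .groupby(["month", "branch"]).size()
           .rename("loans").reset_index())
print(monthly)

# To render interactively (if plotly is installed):
# import plotly.express as px
# px.line(monthly, x="month", y="loans", color="branch").show()
```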
Statistical analysis & diagnostic methods
- Hypothesis testing, correlation vs causation, A/B testing basics for service changes.
- Time series basics for usage and acquisition planning.
- Lab: run A/B test analysis or time-series decomposition on usage data.
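For the A/B option, a two-proportion z-test can be computed by hand. The counts below are invented; the scenario (a new discovery-layout variant) is illustrative.

```python
import math

# Hypothetical A/B test: did a new discovery layout change click-through?
clicks_a, users_a = 120, 1000   # control
clicks_b, users_b = 160, 1000   # variant

p_a, p_b = clicks_a / users_a, clicks_b / users_b
p_pool = (clicks_a + clicks_b) / (users_a + users_b)

# Pooled standard error of the difference in proportions.
se = math.sqrt(p_pool * (1 - p_pool) * (1 / users_a + 1 / users_b))
z = (p_b - p_a) / se
print(f"lift={p_b - p_a:.3f}, z={z:.2f}")  # |z| > 1.96 -> significant at ~5%
```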
NLP for collections & metadata enrichment
- Text preprocessing, topic modelling (LDA), embeddings, NER, language detection.
- Use cases: auto-tagging, subject extraction, collection discovery.
- Lab: topic-model or embed a corpus; auto-suggest subject tags for records.
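A toy sketch of the auto-tagging half of this lab, using TF-IDF similarity rather than a trained classifier: suggest tags for a new abstract by copying them from the most similar already-catalogued record. Records and tags are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented catalogued records: (abstract, assigned subject tags).
catalogued = [
    ("Machine learning methods for text classification", ["AI", "NLP"]),
    ("A history of medieval map making", ["Cartography", "History"]),
]
new_abstract = "Neural networks for classifying library text records"

# Vectorize all texts together so they share one vocabulary.
texts = [t for t, _ in catalogued] + [new_abstract]
tfidf = TfidfVectorizer().fit_transform(texts)

# Compare the new abstract (last row) against every catalogued record.
sims = cosine_similarity(tfidf[-1], tfidf[:-1]).ravel()
suggested = catalogued[sims.argmax()][1]
print(suggested)
```

In practice sentence-transformers embeddings would replace TF-IDF for better semantic matching, and suggestions would go to a cataloguer for review rather than being applied automatically.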
Recommenders & personalization
- Content-based, collaborative, session-based and hybrid recommenders; evaluation metrics.
- Privacy-preserving personalization and configurable recommender rules.
- Lab: build a simple content-similarity recommender (embeddings) and a basic collaborative filter from circulation logs.
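The collaborative half of this lab can be sketched as item co-occurrence counting over circulation logs ("people who borrowed X also borrowed Y"). The loan records are invented.

```python
from collections import defaultdict
from itertools import combinations

# Invented (patron, item) loan records; a real log comes from the ILS.
loans = [("p1", "itemA"), ("p1", "itemB"), ("p2", "itemA"),
         ("p2", "itemB"), ("p2", "itemC"), ("p3", "itemA")]

# Group items by patron.
by_patron = defaultdict(set)
for patron, item in loans:
    by_patron[patron].add(item)

# Count how often each pair of items shares a borrower.
cooc = defaultdict(int)
for items in by_patron.values():
    for a, b in combinations(sorted(items), 2):
        cooc[(a, b)] += 1
        cooc[(b, a)] += 1

def also_borrowed(item, k=2):
    """Top-k items most often co-borrowed with `item`."""
    scores = {b: n for (a, b), n in cooc.items() if a == item}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(also_borrowed("itemA"))
```

Note the privacy point from the session above: only aggregated co-occurrence counts are kept here, never a per-patron profile.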
Predictive models & prioritization
- Regression, classification, churn/patron attrition prediction, demand forecasting, prioritizing digitization.
- Feature engineering from time-series and metadata.
- Lab: build a model to predict item demand or patron churn; evaluate performance.
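A minimal sketch of the churn option with scikit-learn. The features, labels and patron records are synthetic; a real model would use many more features engineered from the event logs.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic training data. Features: [loans_last_90d, days_since_last_visit]
X = np.array([[12, 3], [8, 10], [1, 120], [0, 200], [15, 5], [2, 90]])
y = np.array([0, 0, 1, 1, 0, 1])  # 1 = lapsed patron

model = LogisticRegression().fit(X, y)

# Score a patron with few recent loans and a long absence.
prob_churn = model.predict_proba([[1, 150]])[0, 1]
print(round(prob_churn, 2))
```

On real data the lab would add a train/test split and report metrics such as ROC-AUC rather than a single score.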
Anomaly detection & operations analytics
- Detecting abnormal usage, fraud (rapid renewals), integrity issues in ingest pipelines.
- Techniques: statistical thresholds, isolation forest, clustering-based detection.
- Lab: anomaly detection on access logs or acquisition activity.
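The isolation-forest technique from the session above, sketched on a synthetic series of daily renewal counts with one obvious spike:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic daily renewal counts; the final day is an obvious spike.
daily_renewals = np.array([[20], [22], [19], [21], [23], [20], [300]])

# contamination is the expected anomaly fraction — a tuning assumption.
clf = IsolationForest(contamination=0.15, random_state=0).fit(daily_renewals)
flags = clf.predict(daily_renewals)  # -1 = anomaly, 1 = normal
print(flags)
```

On real access logs the same call would run over multi-feature rows (counts per patron, per IP, per hour) rather than a single column.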
Automation workflows, orchestration & RPA
- Orchestration fundamentals: pipelines, DAGs, scheduling (Airflow/Prefect), containerization.
- RPA uses: batch metadata import, enrichment lookups, automated reporting.
- Lab: create an automated pipeline (scheduled ETL → model → update records) or an RPA flow to enrich metadata.
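The pipeline lab's shape can be sketched in plain Python; in production the same three steps would become scheduled Airflow or Prefect tasks. All functions and data here are invented stand-ins.

```python
# Minimal ETL -> model -> update sketch; each step is a stand-in.

def extract():
    # stand-in for pulling loan records from the ILS API
    return [{"item_id": 1, "loans": 14}, {"item_id": 2, "loans": 2}]

def score(records):
    # stand-in for a trained demand model: a fixed threshold rule
    return [{**r, "high_demand": r["loans"] > 10} for r in records]

def load(scored):
    # stand-in for writing demand flags back to catalogue records
    return {r["item_id"]: r["high_demand"] for r in scored}

def run_pipeline():
    return load(score(extract()))

print(run_pipeline())
```

Keeping each step a pure function is the point of the exercise: the same functions can later be wrapped as orchestrator tasks without rewriting the logic.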
Model deployment, APIs & integration
- Model serving (FastAPI), container deployment (Docker), simple scaling options.
- Integrating analytics outputs with ILS/Discovery (Koha, Alma, DSpace) and front-ends.
- Lab: deploy a model as an API and call it from a mock cataloguing UI or batch update script.
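The serving lab reduces to a handler that maps a JSON request to a JSON prediction; FastAPI's job is only to expose that handler as a route. The model here is a hypothetical fixed rule, not a trained artifact.

```python
import json

def predict_handler(request_body: str) -> str:
    """Take a JSON request, return a JSON prediction response."""
    payload = json.loads(request_body)
    loans = payload["loans_last_90d"]
    # Stand-in for model inference: a simple threshold rule.
    demand = "high" if loans > 10 else "normal"
    return json.dumps({"item_id": payload["item_id"], "demand": demand})

# With FastAPI the same logic becomes a route, roughly:
#   from fastapi import FastAPI
#   app = FastAPI()
#   @app.post("/predict")
#   def predict(payload: dict): ...
print(predict_handler('{"item_id": 42, "loans_last_90d": 14}'))
```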
Monitoring, MLOps & lifecycle
- Logging, model monitoring (performance & data drift), retraining triggers, reproducibility (DVC, MLflow).
- Alerting and dashboards for operations teams.
- Lab: set up basic monitoring and an automatic retrain trigger on drift detection.
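A minimal drift check for this lab: compare a live feature window against the training baseline and fire a retrain trigger past a threshold. The numbers and the 30% threshold are illustrative; real monitoring would use a proper test (e.g. PSI or Kolmogorov–Smirnov) per feature.

```python
def drift_detected(baseline, live, threshold=0.3):
    """Flag drift when the live mean moves more than `threshold`
    (as a fraction of the baseline mean) away from the baseline."""
    base_mean = sum(baseline) / len(baseline)
    live_mean = sum(live) / len(live)
    return abs(live_mean - base_mean) / abs(base_mean) > threshold

baseline_loans = [10, 12, 11, 9, 10]   # feature values at training time
live_loans = [4, 5, 3, 4, 5]           # recent values in production

if drift_detected(baseline_loans, live_loans):
    print("drift detected -> trigger retraining job")
```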
Ethics, privacy, legal & governance
- Patron privacy, GDPR/CIPA considerations, bias in recommendations, transparency and consent.
- Documentation: model cards, datasheets, SLAs and human-in-the-loop workflows.
- Final project presentations and peer evaluation.
Capstone project
- Team project implementing an automated analytics/AI pipeline addressing a library problem (examples below).
- Deliverables: project brief, working prototype or dashboard, code/pipelines, model card/datasheet, short demo.
Suggested labs & assignments
- Data cleaning: merge MARC-derived metadata with circulation logs and compute usage metrics.
- Dashboard: build a monthly reporting dashboard for collection managers.
- NLP: auto-tagging pipeline to suggest LCSH/subject headings from abstracts.
- Prediction: model to forecast demand for new titles or identify items to withdraw.
- Recommender: simple “people who borrowed X also borrowed” recommender deployed as an API.
- Automation: schedule ETL → analytics → email report or automated catalogue update using an RPA bot.
Tools, platforms & libraries
- Data & ML: Python, pandas, scikit-learn, statsmodels, XGBoost, Hugging Face (transformers).
- NLP: spaCy, Gensim, sentence-transformers.
- Search & vectors: Elasticsearch/OpenSearch, FAISS, Milvus.
- Orchestration & MLOps: Apache Airflow / Prefect, Docker, MLflow, DVC, GitHub Actions.
- RPA & low-code: UiPath, Microsoft Power Automate, Zapier (concepts & demos).
- BI/visualization: Power BI, Tableau, Apache Superset, plotly/dash.
- Storage & DBs: PostgreSQL, Neo4j (optional for graph analytics).
- Hosting: simple Docker deployments, or cloud examples (AWS/GCP/Azure) if available.
Sample datasets & inputs
- Local MARC/Dublin Core records, circulation/loan logs, COUNTER reports and vendor usage, web analytics (Google Analytics), digitized text corpora, anonymized patron event logs.
Implementation advice & best practices
- Start with a high-impact pilot (automated monthly dashboard, OCR→metadata pipeline, or a recommender for a specific collection).
- Ensure strong data governance: provenance, consent, retention policies.
- Keep humans in the loop for training and validation (active learning for tagging).
- Version data and models; document assumptions via model cards and datasheets.
- Measure impact (time saved, improved circulation, discovery metrics, user satisfaction).