Algorithmic Indexing of Difficult Data

This developing research project aims to integrate open source text processing tools to assist with the analysis, navigation, and reading of large heterogeneous data sets. Because it is built around existing open source software, the project is as much about gluing existing tools together as it is about creating new ones. Inevitably, this now also means integrating AI into the software tool chains.

Project Goals

The overarching aim is that there should be minimal manual editing of data: improvements to analysis and navigability should be realised through better algorithms, not manual tweaks to documents and their metadata. Ultimately, the aim is to produce tools that enhance the readability of difficult databases of text documents and other data.

Technical Approach

The project will use a range of artificial intelligence, machine learning, and natural language processing techniques to extract themes, topics, relationships, and keywords from the data. These will then be used to build enhanced search functionality. Machine learning will also be used to build multi-layered networks showing the different ways documents in the data sets can be linked together, allowing interested parties to navigate through the data. Algorithms will take the individual documents, generate metadata, process them, and insert them into both a document database and a network database for visualisation and analysis.
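As a rough illustration of the extraction step, the sketch below pulls keywords and topics out of a small document set using TF-IDF and non-negative matrix factorisation. The choice of scikit-learn, the toy documents, and all parameter values are assumptions for illustration, not settled project decisions.

```python
# Minimal sketch: extract topics and keywords from a document set with
# TF-IDF and NMF. Library choice (scikit-learn) and all parameters are
# illustrative assumptions, not fixed project decisions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

documents = [
    "Open source tools for indexing large text collections.",
    "Machine learning techniques for topic extraction.",
    "Navigating heterogeneous document databases.",
]

vectorizer = TfidfVectorizer(stop_words="english", max_features=5000)
tfidf = vectorizer.fit_transform(documents)

# Factorise the TF-IDF matrix into a small number of topics.
nmf = NMF(n_components=2, random_state=0)
doc_topics = nmf.fit_transform(tfidf)

# Report the highest-weighted terms for each topic as its keywords.
terms = vectorizer.get_feature_names_out()
for i, component in enumerate(nmf.components_):
    top_terms = [terms[j] for j in component.argsort()[-5:][::-1]]
    print(f"topic {i}: {', '.join(top_terms)}")
```

Swapping NMF for LDA, or TF-IDF vectors for learned embeddings, would be a drop-in change at this stage of the pipeline.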
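The network-building step might look something like the following: documents are linked whenever their TF-IDF vectors are sufficiently similar, and the resulting graph is exported in a form a network database or visualisation tool could ingest. networkx, the 0.3 similarity threshold, and GraphML export are illustrative assumptions.

```python
# Minimal sketch: link documents into a navigable network by cosine
# similarity over TF-IDF vectors, then export the graph. The threshold
# and file format are illustrative assumptions.
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = {
    "doc_a": "Open source tools for indexing large text collections.",
    "doc_b": "Indexing and searching large heterogeneous text collections.",
    "doc_c": "Machine learning techniques for topic extraction.",
}

ids = list(documents)
tfidf = TfidfVectorizer(stop_words="english").fit_transform(documents.values())
similarity = cosine_similarity(tfidf)

graph = nx.Graph()
graph.add_nodes_from(ids)
for i in range(len(ids)):
    for j in range(i + 1, len(ids)):
        if similarity[i, j] > 0.3:  # keep only reasonably similar pairs
            graph.add_edge(ids[i], ids[j], weight=float(similarity[i, j]))

# GraphML is one interchange format a network database or
# visualisation tool could ingest.
nx.write_graphml(graph, "documents.graphml")
```

Different linking criteria (shared topics, keywords, or extracted relationships) would give the separate layers of the multi-layered network described above.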
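Finally, a minimal sketch of the ingestion step, assuming a relational store standing in for whatever document database the project settles on; the generate_metadata helper and the schema are hypothetical placeholders for the richer NLP-derived metadata described above.

```python
# Minimal sketch of ingestion: generate simple metadata for each document
# and insert it into a document store. sqlite3 stands in for the actual
# document database; the schema and helper are illustrative assumptions.
import json
import sqlite3

def generate_metadata(text: str) -> dict:
    # Placeholder metadata; in practice the NLP steps above would
    # supply topics, keywords, and relationships.
    return {"length": len(text), "tokens": len(text.split())}

conn = sqlite3.connect("documents.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS documents "
    "(id TEXT PRIMARY KEY, body TEXT, metadata TEXT)"
)

documents = {"doc_a": "Open source tools for indexing large text collections."}
for doc_id, body in documents.items():
    conn.execute(
        "INSERT OR REPLACE INTO documents VALUES (?, ?, ?)",
        (doc_id, body, json.dumps(generate_metadata(body))),
    )
conn.commit()
conn.close()
```

The same loop could write nodes and edges to the network database, keeping the document store and the graph in sync.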

Related Resources

Some of the projects here will also appear on my personal website and in the publications listed in the sidebar. My GitHub repositories and wiki will also capture some of the software.

