Modern library systems at universities and research institutes are prime examples of today's complex Distributed Information Systems (DIS): they serve large and diverse technical communities by providing access to an extensive set of equally large and heterogeneous electronic information resources. As the complexity and size of both user communities and information resources grow, the fundamental limitations of traditional information retrieval systems have become evident. A recent LANL library user survey, conducted by Rick Luce, group leader of LANL's research library, revealed desired functionality that is currently unavailable in today's library:
The sources of these limitations can be traced directly to a number of technical deficiencies of current DIS, in particular, that they are:
New approaches to information retrieval have been proposed to address these limitations. These active recommendation systems, also known as Active Collaborative Filtering, Knowledge Mining, or Knowledge Self-Organization environments, rely on active computational environments that interact with and adapt to their users. They effectively push relevant information to users according to previous patterns of information retrieval or individual user profiling.
Typical recommendation systems come in two varieties:
Content-based systems depend on single user profiles, and thus cannot effectively recommend documents about previously unrequested content. Conversely, pure collaborative systems, with no content analysis, match only the profiles of users who have, to a great extent, requested exactly the same documents; for instance, different editions of a book are considered distinct documents. Effective recommendation systems therefore require aspects of both approaches.
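The contrast between the two varieties can be made concrete with a minimal sketch. It is not the method of any particular system: the function names, the equal weighting, and the sample data are all assumptions chosen for illustration. Two users who requested different editions of the same book have no documents in common, so a purely collaborative comparison scores them as unrelated, while a blend that also compares the keywords of their documents does not.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections, treated as sets (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def keywords_of(docs, doc_keywords):
    """Collect every keyword attached to any of the given documents."""
    found = set()
    for doc in docs:
        found |= set(doc_keywords.get(doc, ()))
    return found


def hybrid_similarity(docs_a, docs_b, doc_keywords, weight=0.5):
    """Blend collaborative overlap (shared documents) with content overlap (shared keywords).

    `weight` is a hypothetical tuning parameter: 1.0 is purely collaborative,
    0.0 is purely content-based.
    """
    collaborative = jaccard(docs_a, docs_b)
    content = jaccard(keywords_of(docs_a, doc_keywords),
                      keywords_of(docs_b, doc_keywords))
    return weight * collaborative + (1.0 - weight) * content


# Made-up sample data: two editions of one book carry the same keywords.
doc_keywords = {
    "book_1st_edition": {"fuzzy sets", "uncertainty"},
    "book_2nd_edition": {"fuzzy sets", "uncertainty"},
}
alice = {"book_1st_edition"}
bob = {"book_2nd_edition"}

pure_collaborative = jaccard(alice, bob)                # 0.0: no shared documents
blended = hybrid_similarity(alice, bob, doc_keywords)   # 0.5: identical keyword sets
```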
We propose developing and researching recommendation systems for LANL's Library Without Walls (LWW). These systems will be both collaborative and content-based, and will exploit currently untapped sources of information in DIS. In particular, they will integrate information from the patterns of usage of groups of users, and also categorize database content or semantics in a manner relevant to those groups. Moreover, we intend that the semantic tags and conceptual categories need not be just designed into these systems, but may also be induced and evolved from document content, user-supplied information, and group interaction.
Our overall aims are to deploy software applications within the LWW, and to use the LWW and its user community itself as an object of scientific study. These efforts will provide substantial benefits to the expanding needs of the library by responding to the specific issues revealed in the recent survey, in particular by:
These overall goals will be pursued in a modest first-year effort to demonstrate fundamental engineering capabilities and scientific results. Initially, an existing prototype of the TalkMine recommendation system will be developed and deployed, and the inherent semantic structure of LWW databases will be analyzed. Later work will see the analysis of customer satisfaction and experimental results, and lay the basis for expansion of these methods in following years.
TalkMine is an adaptive recommendation system which is both content-based and collaborative, and further allows the crossover of information among multiple databases searched by users. In this way, different databases learn new keywords and adapt existing ones to the categories recognized by their communities of users. TalkMine is based on several theories of uncertainty, such as fuzzy set theory and Dempster-Shafer theory of evidence, as well as on biologically inspired adaptionist ideas.
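For readers unfamiliar with the Dempster-Shafer theory of evidence, the following sketch shows its standard rule of combination, which merges two bodies of evidence expressed as masses over sets of alternatives. It illustrates the theory only; how TalkMine itself applies evidence theory is not specified here, and the subject categories and mass values are invented for the example.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps a frozenset of alternatives (a focal element)
    to a mass; the masses of one function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Mass assigned to incompatible alternatives counts as conflict.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("the two sources are in total conflict")
    # Renormalize so the remaining masses again sum to 1.
    return {focal: mass / (1.0 - conflict) for focal, mass in combined.items()}


# Invented example: two sources of evidence about a document's subject.
PHYSICS = frozenset({"physics"})
BIOLOGY = frozenset({"biology"})
EITHER = frozenset({"physics", "biology"})

source_1 = {PHYSICS: 0.6, EITHER: 0.4}
source_2 = {BIOLOGY: 0.3, EITHER: 0.7}

fused = combine(source_1, source_2)
```

Mass left on the larger set `EITHER` represents ignorance that neither source resolves, which is what distinguishes this representation from an ordinary probability distribution.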
Luis Rocha (CIC-3) has developed TalkMine as a fully functional prototype for Microsoft Windows computers. The architecture has both user-side and system-side components. Each user owns a browser (or plug-in to an existing Internet browser), which functions as a consolidated interface to all information resources searched. This individual browser stores user preferences and tracks information retrieval patterns and relationships which it utilizes to adapt to the user.
Where existing DIS are strictly unidirectional and query-based, TalkMine relies on an interactive, conversational, multi-directional exchange between user-side and system-side components. Each user's browser engages in an interactive algorithm with the information resources it queries. The first result is a list of document and related topic recommendations issued according to the user's profile and present interests, and integrating knowledge from the several information resources queried. The second result is that all sides exchange information, so that all parties can potentially learn new information in an adaptive fashion. Indeed, databases can learn new keywords from users and other databases, and will adapt the associations between keywords and documents according to the expectations of their users.
In this way TalkMine establishes an open-ended human-machine symbiosis, which can be used in the automatic, adaptive, organization of knowledge in DIS such as library databases or the Internet, facilitating the rapid dissemination of relevant information and the discovery of new knowledge.
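The idea that a database can learn keywords from its users can be sketched as follows. This is a deliberately simplified illustration and not TalkMine's actual algorithm: the class `AdaptiveIndex`, its methods, the additive update, and the `rate` parameter are all hypothetical. When a user selects a recommended document, every keyword in that user's query, including keywords the database has never seen, becomes more strongly associated with the document.

```python
from collections import defaultdict


class AdaptiveIndex:
    """Toy keyword-document index whose associations adapt to user selections."""

    def __init__(self):
        # keyword -> document -> association weight
        self.weights = defaultdict(lambda: defaultdict(float))

    def add(self, doc, keywords):
        """Register a document with its initial (e.g. author-supplied) keywords."""
        for keyword in keywords:
            self.weights[keyword][doc] = 1.0

    def recommend(self, keywords):
        """Rank documents by total association with the query keywords."""
        scores = defaultdict(float)
        for keyword in keywords:
            for doc, weight in self.weights.get(keyword, {}).items():
                scores[doc] += weight
        # Highest score first; ties broken alphabetically for determinism.
        return sorted(scores, key=lambda doc: (-scores[doc], doc))

    def learn(self, keywords, chosen_doc, rate=0.5):
        """Strengthen (or create) links between the query keywords and the chosen document."""
        for keyword in keywords:
            self.weights[keyword][chosen_doc] += rate


index = AdaptiveIndex()
index.add("doc_adaptation", {"adaptation"})
index.add("doc_evolution", {"evolution"})

# The database does not yet know the user's term "self-organization".
before = index.recommend({"self-organization"})
first_results = index.recommend({"evolution", "self-organization"})
index.learn({"evolution", "self-organization"}, chosen_doc=first_results[0])
after = index.recommend({"self-organization"})
```

After the single interaction above, a later query on the user's own term alone retrieves the document, which is the sense in which the database has adapted to its users' vocabulary.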
Specific to LANL's LWW project, TalkMine can achieve the following goals:
The central data structure used in traditional DIS is a many-to-many mapping between a set of documents and a set of keywords, which act as semantic "tags" on the content of the documents. The keywords are usually provided directly by the authors, or at best by secondary editors or librarians. In traditional DIS these keywords form the basis for query matching for information pull. TalkMine also uses keywords on documents, but through its adaptive, conversational approach provides an effective means for communities of users to explore multiple keyword spaces, thereby both pushing this semantic content to users and sharing it among databases.
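A minimal sketch of this document-keyword mapping, under the assumption that it is held as a plain dictionary, shows how the same relation can be read in both directions and how keyword co-occurrence counts, one simple form of indirect connection among keywords, fall out of it. The document identifiers, keywords, and function names are invented for illustration.

```python
from collections import defaultdict
from itertools import combinations


def invert(doc_keywords):
    """Turn a document -> keywords mapping into a keyword -> documents mapping."""
    index = defaultdict(set)
    for doc, keywords in doc_keywords.items():
        for keyword in keywords:
            index[keyword].add(doc)
    return dict(index)


def cooccurrence(doc_keywords):
    """Count how many documents each unordered pair of keywords shares."""
    counts = defaultdict(int)
    for keywords in doc_keywords.values():
        # Sorting gives each pair one canonical (alphabetical) orientation.
        for pair in combinations(sorted(keywords), 2):
            counts[pair] += 1
    return dict(counts)


# Invented sample of the many-to-many relation.
doc_keywords = {
    "doc_a": {"fuzzy sets", "information retrieval"},
    "doc_b": {"fuzzy sets", "evidence theory"},
    "doc_c": {"fuzzy sets", "information retrieval", "evidence theory"},
}

keyword_docs = invert(doc_keywords)
pairs = cooccurrence(doc_keywords)
```

A traditional query-matching system only needs `keyword_docs`; the co-occurrence counts are the kind of derived, structural information that a recommendation system could use to relate keywords that never appear in the same query.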
Beyond deploying TalkMine in its present form, we will also pursue research and development goals to use other sources of information in DIS beyond author-supplied keywords, and to augment the given keywords and document-keyword mappings to capture important information not available in current DIS. These efforts can not only serve future enhancements of TalkMine's capabilities, but will have general applicability to recommendation systems of many types.
There are many diverse sources of information available in DIS to enhance representations of semantic content. In addition to the actual contents of the texts themselves (including both the abstracts of indexing services such as SciSearch and Inspec, and the contents of e-journals), there are various sources of structural information about linkages among documents, among keywords, and between documents and keywords. Given such structural mappings, quantitative information is available to induce indirect connections among both documents and keywords. These mappings include:
We will first explore methods to instrument LWW systems in order to gather information of these types. Analysis of this information can then be useful in a number of ways. Methods can be deployed to accomplish the following, either separately or in combination:
Finally, we also wish to examine the role that existing conceptual maps can play in providing enhanced semantic linkages among documents and keywords. Such maps would be provided as external sources of semantic information, and are available from the research community as products of prior Artificial Intelligence and Information Science research efforts. These ontologies, such as WordNet from Princeton (see http://www.cogsci.princeton.edu/~wn), are effectively lexicons or thesauri which have been augmented to include taxonomies of semantic relations among terms. These structures can be brought to bear both to aid in the analysis of the LWW's existing semantic space, and to induce further semantic connections among DIS components.
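As a rough illustration of how a taxonomy of semantic relations can induce connections among keywords, the sketch below links two keywords through their nearest shared, more general term. The taxonomy is a small invented "is-a" table, not data taken from WordNet, and the function names are hypothetical.

```python
def ancestors(term, parent_of):
    """Return the chain of increasingly general terms above `term`."""
    chain = []
    seen = {term}
    while term in parent_of:
        term = parent_of[term]
        if term in seen:  # guard against accidental cycles in the table
            break
        seen.add(term)
        chain.append(term)
    return chain


def shared_ancestor(a, b, parent_of):
    """Return the most specific term that generalizes both `a` and `b`, or None."""
    above_b = set([b] + ancestors(b, parent_of))
    for candidate in [a] + ancestors(a, parent_of):
        if candidate in above_b:
            return candidate
    return None


# Invented toy taxonomy: each term maps to its more general parent.
parent_of = {
    "fuzzy set theory": "uncertainty theory",
    "evidence theory": "uncertainty theory",
    "uncertainty theory": "mathematics",
    "genetics": "biology",
}

link = shared_ancestor("fuzzy set theory", "evidence theory", parent_of)
no_link = shared_ancestor("fuzzy set theory", "genetics", parent_of)
```

Two documents tagged with different keywords could then be treated as related whenever their keywords share a sufficiently specific ancestor, even if no author or user ever used both keywords together.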
The primary purposes for the FY99 effort will be to develop initial engineering capabilities and perform scientific analysis of the existing LWW systems and the results of an initial TalkMine deployment. In particular, we foresee the following specific goals, some of which may be pursued in parallel:
At this point it may be desirable to augment the TalkMine application to utilize results of the semantic analysis, for example, to incorporate the hierarchical relations among keywords. For other results, it may be appropriate to design additional user-accessible applications or extensions to existing applications. While these will require separate deployment, they may also serve to enhance TalkMine's effectiveness. For example, these methods may provide new keywords, or a new space of keywords at a new semantic level, in which TalkMine can operate.
Members of the proposed team not only have significant scientific accomplishments in the theory of DIS and recommendation systems, but also have extensive experience in the design and development of software systems and in successful interaction with clients to realistically satisfy their needs.
Many engineering issues will need to be addressed in this project, some of which may require assistance from Library staff resources. On the server side, we will need to understand the database protocols used within the LWW corpus, and how we can construct new tables which will record the structural information and implement aspects of the algorithms needed for our systems. For the TalkMine Testbed, we may also need assistance in developing new applications for data gathering and data analysis, in developing new or enhancing existing applications for user interaction, and in drawing on expert knowledge of the data content and user base for testing and analysis.