SIMILE: Real-World Challenges Drive Research Forward

April 10, 2007 - SIMILE (Semantic Interoperability of Metadata and Information in unLike Environments), a collaborative project among MIT Libraries; David Karger, professor of Computer Science and Electrical Engineering at CSAIL; and Eric Miller, CEO of Zepheira and formerly with the World Wide Web Consortium (W3C), is breaking down limitations in how software applications interact, making search functions more inclusive, and personalizing people's interactions with their computers.

The collaboration began in 2003 in response to limitations the MIT Libraries were encountering in using DSpace, a digital repository designed at MIT in 2000 for collecting information born in digital formats. It has since evolved into a much more ambitious undertaking, which the collaborators hope will provide the Libraries with more capable tools for organizing large amounts of information more efficiently and will eventually change how people use the web.

Most information on the web is currently in HTML format. This format makes it easy to search for documents based on the keywords they contain, but difficult for a computer to make more sophisticated connections between pieces of information that are related by important concepts and ideas rather than by a shared set of keywords.

However, research coming out of the Semantic Web community is addressing this problem. It will enable computers to make these more sophisticated connections between information sources through ontologies, which allow a computer to understand that two different words, such as "CD" and "album", mean essentially the same thing.
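The "CD" and "album" example above can be illustrated with a minimal sketch. It is not SIMILE code or a real Semantic Web ontology; the concept table, the sample documents, and the function names are all hypothetical, chosen only to show how mapping different words to one shared concept lets a search find documents that a plain keyword match would miss.

```python
# Hypothetical toy "ontology": each word maps to a shared concept label.
# A real ontology would be far richer; this table is an assumption made
# purely for illustration.
CONCEPT_OF = {
    "cd": "music-recording",
    "album": "music-recording",
    "book": "publication",
}

# Invented sample documents, not taken from any real collection.
DOCUMENTS = {
    "doc1": "New album released by a local band",
    "doc2": "Used CD for sale",
    "doc3": "Library acquires rare book",
}


def concepts_in(text):
    """Return the set of concepts named by the words in `text`."""
    return {CONCEPT_OF[w] for w in text.lower().split() if w in CONCEPT_OF}


def keyword_search(query, documents):
    """Plain keyword match: a document matches only if it contains the exact word."""
    q = query.lower()
    return sorted(d for d, text in documents.items() if q in text.lower().split())


def concept_search(query, documents):
    """Concept match: a document matches if it shares a concept with the query."""
    wanted = concepts_in(query)
    return sorted(d for d, text in documents.items() if wanted & concepts_in(text))


print(keyword_search("CD", DOCUMENTS))   # ['doc2']
print(concept_search("CD", DOCUMENTS))   # ['doc1', 'doc2']
```

The keyword search finds only the document that literally contains "CD", while the concept search also returns the document about an "album", because both words resolve to the same concept.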

In addition to creating more meaningful search engines, the SIMILE tools being developed will allow the Libraries to make these connections automatically across vast stores of information, making that information easier to catalog. Eventually, these tools will adapt to the needs and preferences of the Libraries' users in how they want the information they are searching for to be presented.

The current HTML format also constrains a user's options for viewing information and for combining it with data held in other formats. Because HTML is locked into a specific, predetermined presentation, it is hard for a user to reformat it in ways the site's designer did not anticipate.

In the same way that HTML limits how information can be presented, the Libraries have been limited to cataloging their information according to specific predefined criteria such as author, title, year of publication, and name of the journal.

The Exhibit tool, currently in production and available for use as Open Source software, is one way SIMILE is breaking down some of these constraints. It allows a user to gather information about any topic and arrange it according to their personal preferences.

Karger has his own Exhibit of Israeli folk dances, organized by the choreographer of each dance, the formations the dances are performed in, and the year each was performed. It gives him the ability to organize his data based on how he thinks about the information.
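The idea of arranging one collection by several properties of the user's choosing can be sketched in a few lines. This is not Exhibit's actual code or API; the records, field names, and helper functions below are invented placeholders that only show how the same data can be regrouped or narrowed by whichever property the user cares about.

```python
from collections import defaultdict

# Invented placeholder records; these are not entries from Karger's Exhibit.
DANCES = [
    {"name": "Dance 1", "choreographer": "Choreographer A", "formation": "circle", "year": 1975},
    {"name": "Dance 2", "choreographer": "Choreographer B", "formation": "couples", "year": 1975},
    {"name": "Dance 3", "choreographer": "Choreographer A", "formation": "line", "year": 1982},
    {"name": "Dance 4", "choreographer": "Choreographer A", "formation": "circle", "year": 1990},
]


def group_by_facet(records, facet):
    """Group record names under each distinct value of the chosen property."""
    groups = defaultdict(list)
    for record in records:
        groups[record[facet]].append(record["name"])
    return dict(groups)


def filter_by(records, **selected):
    """Keep only the records whose properties match every selected value."""
    return [r for r in records if all(r[k] == v for k, v in selected.items())]


# The same data, arranged two different ways.
print(group_by_facet(DANCES, "formation"))
print(group_by_facet(DANCES, "year"))

# Narrowing by two properties at once.
matches = filter_by(DANCES, choreographer="Choreographer A", formation="circle")
print([r["name"] for r in matches])
```

Because the grouping property is passed in as an argument, nothing about the arrangement is fixed in advance, which is the contrast with a presentation predetermined by a site's designer.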

The Libraries take in information from a wide variety of sources that range from physical books and photographs to digital images, movies, and audio files. The Libraries are also unable to anticipate the characteristics of research being done at the Institute, so they need an organizational tool that can adapt to their changing needs.

It is precisely these problems that define the Libraries' role in the collaboration. Mackenzie Smith, Associate Director for Technology at MIT Libraries, sees her role in the project as providing researchers with a connection to real-world difficulties that the Libraries and their users are encountering, and keeping the project focused in those directions.

In addition to identifying research topics, Karger also believes the collaboration is an important mechanism for allowing his research to make an immediate impact, not only by serving the Libraries but also by making it available to a broader spectrum of people, who are able to access the SIMILE tools through Open Source licensing.

Smith likens this result of the project to a pipeline that provides an outlet for research. "There's a lot of research done by the students and faculty that never makes it out of the labs. We are taking some of the best research and employing it for our own uses, then we test it so it can be improved for people to use."

Many of these innovations are also being developed by the third member of the SIMILE collaboration, the World Wide Web Consortium. The W3C has been working in concert with the researchers in developing Semantic Web standards that will define this new technology.

Smith believes a project like this could only happen at a place like MIT and she is proud to be a part of it. "SIMILE is pointing toward the future of what we will be able to do and it can only happen at a place like MIT, in terms of the relationship with faculty and their research. There is such an openness to collaboration, and we have finally hit our groove and things are starting to come to fruition."

If you would like to learn more about SIMILE or would like to create your own Exhibit, you can visit the project's web site. You can also see an application of the software on this web site in our CSAIL News Exhibit.

-TIG Staff