Adding meta-information to documents

Suggestions, questions or problems with regain

Moderator: thtesche

Post by mbreiling » Fri Jul 05, 2013 9:09 am

Dear regain developers,

Our research institute is currently extending regain 2.0.4 so that meta-information can be attached to each document: for example author, year of publication, or anything else that is useful for searching, plus a rating and review system modelled on Amazon's five-star ratings and customer reviews. The latter lets users rate documents by their (personal) usefulness and leave short comments, so that other users can quickly judge whether a document they have found is suitable for them. We would like to offer to contribute our changes back to the SourceForge trunk, if you (the developers) agree.

Long version:

I am a senior staff member at the Fraunhofer Institute for Integrated Circuits (IIS) in Erlangen, Germany. We are a public research institute that carries out applied research on various topics in microelectronics. In this context, we read and apply many scientific articles, which we store in a directory tree. We chose this type of "database" because it is the simplest to maintain and, unlike server-based content management systems, can be carried along on a notebook computer on business trips. However, it is rather difficult to find a particular scientific document (paper), even if we structure the directory tree as closely as possible by the topics of the articles. That is why we started using regain.

That was around 2005. We found, however, that a few additional features would be useful, so we took your source code for version 1.5.2 and started a branch from there. The new features we implemented include the following:

  • Introduction of meta-data: every document xyz.YYY can be accompanied by a file xyz.YYY.index. This associated file carries meta-data such as keywords, author, year of publication etc. in a predefined format (we can provide an example, if you wish).
  • The meta-data is also added to the search index and hence taken into account in the search; it is therefore possible to search for documents by the keywords, author etc. that a user has assigned to them.
  • Generation of BibTeX data from the meta-data. BibTeX is the bibliography tool of LaTeX, a typesetting system for scientific (mostly mathematical/engineering/physics) publishing. BibTeX is used to cite existing publications stored in a literature database. The entries of the BibTeX database can hence be generated from the new regain meta-data.
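
For illustration only, the sidecar idea and the BibTeX generation can be sketched in a few lines of Python. This is not the 1.5.2_IIS code: the "key: value" line layout, the field names, and the function names `sidecar_path`, `parse_metadata` and `to_bibtex` are all assumptions made for this sketch, since the actual predefined format is not shown in this post.

```python
from pathlib import Path


def sidecar_path(document: Path) -> Path:
    """Return the sidecar path for a document, e.g. paper.pdf -> paper.pdf.index."""
    return document.with_name(document.name + ".index")


def parse_metadata(text: str) -> dict[str, str]:
    """Parse hypothetical 'key: value' lines (assumed format, not the real one).

    Blank lines, lines starting with '#', and lines without ':' are skipped.
    Keys are lower-cased so that 'Author' and 'author' are treated alike.
    """
    metadata = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        metadata[key.strip().lower()] = value.strip()
    return metadata


def to_bibtex(cite_key: str, metadata: dict[str, str]) -> str:
    """Build a BibTeX @article entry from the fields that are present."""
    fields = [(name, metadata[name])
              for name in ("author", "title", "year")
              if name in metadata]
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields)
    return f"@article{{{cite_key},\n{body}\n}}"
```

An indexer along these lines would read `sidecar_path(doc)` if that file exists, add each parsed field to the search index next to the document's full text, and could export the same fields through `to_bibtex`. The entry type `@article` is fixed here only to keep the sketch short.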
Our version is called 1.5.2_IIS, and we can provide a Windows executable for your evaluation if desired.

We are ready to share the full source code of this IIS version with you if you would like to have a look at it.

We stopped our regain development afterwards (in 2008/09), whereas development on your trunk continued. We have now reached the point where we want to extend our regain branch further. Among other things, we want to implement the following two features:

  • Include in the meta-data a per-user rating of the document, as used e.g. by Amazon: each user can give the document 1 to 5 stars according to its usefulness. This rating should be taken into account in the sort order of the search results.
  • Include in the meta-data a user review (once again as on Amazon), so that each user can comment on her/his view of the paper. The review is not added to the search index, but it lets a new user find out quite easily whether it makes sense to have a closer look at the document. This is a useful feature for scientific articles, where many articles can exist for a single keyword and the user wants to find out which one is the most useful for them.
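
One possible way to let the star ratings influence the sort order is to blend the text-match score with the average rating. The following Python sketch is only an illustration under stated assumptions: the `Hit` class, the relevance range of 0 to 1, the neutral default of 3 stars for unrated documents, and the weight of 0.3 are all hypothetical and are not taken from regain or from our branch.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Hit:
    path: str
    relevance: float  # text-match score, assumed to be normalised to [0, 1]
    stars: dict[str, int] = field(default_factory=dict)  # user name -> 1..5 stars


def average_stars(hit: Hit, default: float = 3.0) -> float:
    """Average rating over all users; unrated documents get a neutral default."""
    return mean(hit.stars.values()) if hit.stars else default


def combined_score(hit: Hit, rating_weight: float = 0.3) -> float:
    """Weighted blend of text relevance and the rating mapped from 1..5 to 0..1."""
    rating = (average_stars(hit) - 1) / 4
    return (1 - rating_weight) * hit.relevance + rating_weight * rating


def rank(hits: list[Hit], rating_weight: float = 0.3) -> list[Hit]:
    """Sort hits by combined score, best first."""
    return sorted(hits, key=lambda h: combined_score(h, rating_weight), reverse=True)
```

With `rating_weight=0` the order falls back to pure text relevance, so the rating influence could be made configurable or switched off entirely.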
Our plan is now to take your current version 2.0.4, integrate all our own extensions from 1.5.2_IIS into it (i.e. the delta between 1.5.2_IIS and the original 1.5.2), and then add our further extensions (ratings/reviews) to this merged version. There are two ways we could proceed: either we keep this new version 2.0.4_IIS internal to our institute, or we submit it to regain's SourceForge trunk. In the latter case, we would have to follow your requirements on coding style, software design guidelines etc.

We reckoned that it might be beneficial for the regain project to profit from our work and our new features. That is why we propose a co-operation. Our proposal is that we merge our branch with yours, implement the new features, and then commit the result as the trunk head of your repository. This way, you would benefit from our work, and we would benefit from any improvements that you implement later on.

I am aware that this offer may be considered quite controversial in your developer community, as you may not want to include some of the features that we want. Therefore, I would like to ask you to discuss our offer internally and judge whether it could be useful for regain. Please consider that there might be other institutions (other research institutes, universities, companies) with user requirements similar to ours, i.e. attaching meta-data (such as keywords or a project name), rating documents or producing BibTeX data.

Many thanks in advance and best regards,

Marco Breiling

You can contact me directly under
by replacing all ";" by "." and the ":" by "@"
Posts: 1
Joined: Fri Jul 05, 2013 8:44 am
