We know that there can be sticky challenges in sharing Domino data — the fidelity of Notes Rich Text has long been an issue when sharing data outside of the platform. The standard starting point is the Notes Rich Text field renderer/editor and Notes Rich Text Item storage format. They become hard constraints that any solution needs to fit within, a box outside of which we should not think.
At HCL Labs, we take a more radical approach. When new members were recently brought on board, they were instructed to not just think outside the box but to set the box on fire! This may result in approaches that don’t suit everyone. But the mandate we were given for Project Rosetta, an internal proof of concept, was to find an innovative solution for fidelity of formatted content for use outside of Domino with only one must-have: the data must be stored in Domino, in some format. Jason Roy Gary, Digital Solutions CTO, covered the outcome at the DNUG launch event and the recent OpenNTF webinar. You can catch the 10-minute presentation between 40:27-50:44 here or read on for more details.
Terms of Reference
One of the early actions was to be very specific about terminology. For a Domino audience, the phrase “rich text” is inextricably coupled with the Notes editor and a specific storage format. As a result, we have made a conscious effort to refer to handling “formatted text”. I would strongly encourage anyone else discussing this to do the same, to avoid false assumptions.
Two radical options remained on the table throughout: that Notes may need a different editor/renderer for this content, and that existing content may require a one-off conversion if it is to gain the benefits. There were also two key expectations we acknowledged: this was not intended to replace Notes Rich Text, and not all existing Notes Rich Text should be converted. If the content is only used in the Notes Client, or third-party solutions already manage the problem, the status quo is acceptable.
For our starting point, we asked this question: what are the standard editors beyond Domino, and what formats do they use? This seems a simple question, but what became apparent is that the Notes Rich Text editor is used for three specific purposes, each of which has specific paradigms and interoperable formats when considered independently of Domino.
Occasionally in Domino I’ve seen Rich Text Editors and Items used for managing complex, strictly formatted content like policies and procedures. Sometimes the applications around that content have been designed to mimic complex document processing functionality like change tracking as well. Beyond Domino, this content is typically managed in a specific document processing tool, such as Microsoft Word, or even in collaborative editors like HCL Connections Docs, Google Docs or Collabora.
After many years of resistance, the vendors have all moved to a standard format for interoperability, OOXML. The storage format includes some metadata not openly editable in the document processing editors, things like author, tags etc. But that metadata is fixed and limited in scope. The editors do not allow manipulation of the custom metadata. The editors also only permit editing one file at a time.
Despite attempts to “store once, share everywhere,” complications of security and access often result in the content being copied around. The document is sent as a single discrete package with all images, embedded files etc. included. This commonly results in a file that is too large for HTTP or even email sending, requiring the user to compress the file or copy it to some secure file sharing solution, whether that be products like Connections or temporary transfer protocols like SFTP.
This is a very specific paradigm, matching a Form with virtually no additional metadata and a single Rich Text Item. We acknowledged it as a valid use case, but a narrow one. It would be interesting to investigate whether we can store the content in a format that would allow in-place rendering but also allow editing in one of those external editors, e.g. Microsoft Word. But it was not appropriate to the current scope.
Email has always been a core part of Notes and Domino. At the time Domino was launched, email was not in wide usage. Indeed, MIME was not defined by IETF until 1992. However, MIME has become the de facto standard for interoperability outside of Domino. If you receive an email from outside Domino, it will be stored as MIME. Only emails from Domino domains will be stored as Notes Rich Text.
Again, there is specific but limited metadata. For MIME emails, the Item type that addresses are stored in is not Text, it’s RFC822 Text. That’s a different data type internally within Domino, but one that the Notes Client is (presumably) programmed to interpret in a specific way. That is interesting and informative.
The storage structure is also very particular for this use case. The email is only ever referenced from source by the sender. Everyone else receives their own copy of the email. There has never been an attempt to store a “single version of the truth” which all recipients reference. Consequently, what is circulated is a single package containing text, images and files – as with document processing. Size of associated content may be prohibitive, which is why many emails these days reference external images and may link to files on central services like Connections Files or Box.
This is also a very specific paradigm, not matching the typical usage in applications. Would interoperability of email be easier if Domino stored this content as MIME instead of Notes Rich Text, and if the Notes Client had an alternative editor / renderer to use MIME rather than Notes Rich Text? That question is beyond our current investigation, so this type of content was also left out of scope.
Formatted Content in Fields in Forms
The third scenario is formatted content in arbitrary fields in forms. The editors are not document processing tools, nor mail clients. There were various editors used, but the output fell into two categories – HTML and markdown.
For editing markdown, there are two types of editors. The first type are editors in IDEs (VS Code) or standalone (Joplin) and these are designed to edit markdown files only. As with the document processing tools, these are out of scope. The second type are markdown editors that can be embedded into an application, like the editor on OpenNTF’s website or Stephan Wissel’s comment area on his blog. These more closely correlate to the kind of editor for a Domino application. They also map closely to editors that provide content as HTML, such as TinyMCE or framework-specific editors like the Vaadin rich text editor.
There are some other significant differences to document processors and email editors. Firstly, the content is not intended to be copied around, only referenced. Secondly, images and file attachments are typically stored separately, which allows them to be cached by whatever client is displaying the content. Indeed, blog platforms like WordPress require these assets to be stored separately and the editors enforce this. Even in Domino, Declan Lynch’s Blogsphere template takes the approach of storing assets separately. Thirdly, there may be multiple formatted text editors on a form. And fourthly, the metadata adjacent to the formatted content – i.e. other fields – is random and rarely do two forms contain the same sets of metadata.
This is the scenario Project Rosetta targeted. As can be seen from this analysis, there are key differences to how Notes Rich Text editors and Items function. But the use case we targeted was exclusively for content that is intended to be shared beyond Domino.
Our proof of concept had a simple goal: investigate feasibility of allowing content to be managed as HTML or markdown, ideally converted between the two, and stored in Domino.
Typically, content is entered in one or the other. Markdown is a nice flexible approach for quick editing with basic semantic formatting, and it opens up some interesting opportunities for consumption. There are also some options available in markdown that are not easily available in HTML, like note blocks. But I’m conscious that expecting users from Marketing or HR, for example, to enter content as markdown is not realistic. Converting in both directions between markdown and HTML would solve this problem and is analogous to the low-code / pro-code round-tripping approach that has been discussed for Domino’s low-code vision.
We built the API layer on top of Project Keep. The challenge we had was this: if we were to send this content alongside other fields, how would we determine the data type that should be stored? In Keep and in Domino HTTP, attachments are already uploaded and retrieved in REST calls separate from those that access field data. So, we took the same approach here. The flow, at this point, would be to create the document with one call, then upload attachments or images in a separate call, then upload or retrieve formatted content with a Content-Type of “text/html” or “text/markdown.” Could that change? Of course.
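To make that three-step flow concrete, here is a minimal sketch using the JDK's built-in HTTP types. Everything specific in it is an assumption for illustration: the base URL, the paths, the field name and the use of an Accept header on retrieval are hypothetical and are not Keep's actual endpoints. The requests are only built, never sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.net.http.HttpRequest.BodyPublishers;
import java.util.List;

class FormattedContentFlow {
    // Hypothetical base URL and paths: placeholders, not Keep's real endpoints.
    static final String BASE = "https://example.invalid/api";

    // Call 1: create the document with its ordinary fields as JSON.
    static HttpRequest createDocument(String fieldsJson) {
        return HttpRequest.newBuilder(URI.create(BASE + "/documents"))
                .header("Content-Type", "application/json")
                .POST(BodyPublishers.ofString(fieldsJson))
                .build();
    }

    // Call 2: upload an attachment or image separately from the field data.
    static HttpRequest uploadAttachment(String docId, String fileName, byte[] bytes) {
        return HttpRequest.newBuilder(
                        URI.create(BASE + "/documents/" + docId + "/attachments/" + fileName))
                .header("Content-Type", "application/octet-stream")
                .PUT(BodyPublishers.ofByteArray(bytes))
                .build();
    }

    // Call 3a: upload formatted content; the Content-Type header tells the
    // server whether the body is "text/html" or "text/markdown".
    static HttpRequest uploadFormatted(String docId, String field,
                                       String contentType, String body) {
        return HttpRequest.newBuilder(
                        URI.create(BASE + "/documents/" + docId + "/formatted/" + field))
                .header("Content-Type", contentType)
                .PUT(BodyPublishers.ofString(body))
                .build();
    }

    // Call 3b: retrieve formatted content, naming the wanted format
    // (assumed here to be expressed through the Accept header).
    static HttpRequest fetchFormatted(String docId, String field, String accept) {
        return HttpRequest.newBuilder(
                        URI.create(BASE + "/documents/" + docId + "/formatted/" + field))
                .header("Accept", accept)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        List<HttpRequest> flow = List.of(
                createDocument("{\"Subject\":\"Policy update\"}"),
                uploadAttachment("doc1", "logo.png", new byte[] {1, 2, 3}),
                uploadFormatted("doc1", "body", "text/markdown", "# Hello"),
                fetchFormatted("doc1", "body", "text/html"));
        // Print each request in order, without sending anything over the network.
        for (HttpRequest r : flow) {
            System.out.println(r.method() + " " + r.uri() + " " + r.headers().map());
        }
    }
}
```

The point of the sketch is the separation: field data, binary assets and formatted content each travel in their own call, and the media type header alone identifies which format the formatted content is in.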
In terms of technology, Keep is written in Java, where the standard library for HTML manipulation is JSoup and the standard library for markdown conversion is flexmark.
The benefit of keeping this manipulation between the two output types on the server is that clients just need to speak HTML or markdown. Different flavours of markdown may prove a challenge in the future, but that’s for the future. For app developers, you bring your own editor, we provide consistent conversion. Handling it on the server also allows a phase for cleaning the HTML. I am particularly keen to ensure we retain this as it opens up a number of potential innovative opportunities which I’m not ready to discuss at this time.
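As a rough illustration of that server-side arrangement, the sketch below shows one way the conversion and cleaning phases could be wired together. It is not the Project Rosetta implementation: the class and method names are hypothetical, and the three converters are passed in as plain functions so that the example stays self-contained. In practice they would wrap whichever markdown and HTML libraries the server uses.

```java
import java.util.function.UnaryOperator;

class FormattedContentService {
    static final String HTML = "text/html";
    static final String MARKDOWN = "text/markdown";

    // Hypothetical plug-in points; real implementations would delegate to
    // a markdown library and an HTML parsing/cleaning library.
    private final UnaryOperator<String> markdownToHtml;
    private final UnaryOperator<String> htmlToMarkdown;
    private final UnaryOperator<String> htmlCleaner;

    FormattedContentService(UnaryOperator<String> markdownToHtml,
                            UnaryOperator<String> htmlToMarkdown,
                            UnaryOperator<String> htmlCleaner) {
        this.markdownToHtml = markdownToHtml;
        this.htmlToMarkdown = htmlToMarkdown;
        this.htmlCleaner = htmlCleaner;
    }

    // Incoming content: HTML goes through the cleaning phase before storage,
    // markdown is stored as received.
    String prepareForStorage(String contentType, String body) {
        if (HTML.equals(contentType)) {
            return htmlCleaner.apply(body);
        }
        if (MARKDOWN.equals(contentType)) {
            return body;
        }
        throw new IllegalArgumentException("Unsupported Content-Type: " + contentType);
    }

    // Outgoing content: return as stored when the formats match,
    // otherwise convert to the format the client asked for.
    String render(String storedType, String storedBody, String requestedType) {
        if (storedType.equals(requestedType)) {
            return storedBody;
        }
        if (MARKDOWN.equals(storedType) && HTML.equals(requestedType)) {
            return htmlCleaner.apply(markdownToHtml.apply(storedBody));
        }
        if (HTML.equals(storedType) && MARKDOWN.equals(requestedType)) {
            return htmlToMarkdown.apply(storedBody);
        }
        throw new IllegalArgumentException(
                "Cannot convert " + storedType + " to " + requestedType);
    }

    public static void main(String[] args) {
        // Trivial stand-in converters, for demonstration only.
        FormattedContentService service = new FormattedContentService(
                md -> "<p>" + md + "</p>",
                html -> html.replaceAll("<[^>]+>", ""),
                html -> html.trim());
        System.out.println(service.render(MARKDOWN, "Hello", HTML));
        System.out.println(service.render(HTML, "<p>Hello</p>", MARKDOWN));
    }
}
```

Because every client goes through the same `render` path, an editor that only speaks HTML and one that only speaks markdown can work on the same stored content, and the cleaning step sits in a single place on the server.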
Admittedly, it may be trying to solve a problem no one has. But R&D is about thinking about a problem differently and coming up with a solution not suggested before. It’s about being Icarus, daring to fly higher but willing to fall.
I think we achieved what we set out to do, with a demo that went from mobile, to web, to an idea of what Nomad could handle with appropriate editors on top of the raw HTML / markdown, back to the web and back to mobile. It’s certainly not complete and work on Notes Client / Nomad would be required. And, as I said, it doesn’t cover all scenarios. But it demonstrates getting formatted content into and out of Domino without the peculiarities of Notes Rich Text, using something more universal. That opens up a new way for migrating formatted content from non-Domino databases into an NSF, so appealing to non-traditional audiences. And as Jason showed at the end of the session, thinking beyond Notes Rich Text for formatted content is a requirement for non-traditional clients and transfer formats, like EWS.