Citations for professionals: An interview with Emiliano Heyns

When we launched the first public version of Fidus Writer in May 2013, we received a flood of bug reports. One of those reporting bugs was Emiliano Heyns, at the time a philosophy student and open source hacker with a special interest in bibliographies and citations. Two initial reports about missing categories in bibliographies quickly grew into a long discussion about the tool chains available to people who work in non-technical surroundings but have the expectations of professionals.

When we launched Fidus Writer 3.0 earlier this year, Heyns again filed a number of bug reports, again in the area of citations, but this time he was also working on a citation tool himself: the “Better Bib(La)TeX for Zotero/Juris-M” project. We quickly found out that we could use a common system for parsing BibTeX and BibLaTeX source files, and in the Fidus Writer 3.1 version we have built in this new parser, which makes use of a lot of extra functionality that Emiliano has come up with over the years. We have asked Emiliano if he would mind sharing a little about himself and his project.

What is the difference between BibTeX and BibLaTeX and which is better?

Oof, that’s a hard one. The honest answer to the question in its full breadth is “I don’t know, exactly”. I’ll get back to that a few questions down. But I do know some key areas. Long story short: BibLaTeX is BibTeX done right, move to BibLaTeX if you can. No slight against the BibTeX authors, they’ve done a great thing, but the world has moved on, for good reasons.

The main thing that most people using BibLaTeX will know is Unicode support. BibLaTeX has it, BibTeX doesn’t (there are ways to make it somewhat do Unicode, but you really don’t want to go there), and in this day and age, Unicode is simply a necessity.

A second aspect where I happen to know that BibLaTeX and BibTeX differ is name handling for creators (authors and such). It seems like BibTeX was created with quite a few assumptions about how names are structured, and those assumptions seem to be strongly tied to “you come from an English-speaking country”. There is an astounding variety in how names should be properly quoted, and this is one area where BibLaTeX makes things manageable. You can sort of get BibTeX to do the right thing for non-English names, but you are at that point stacking hacks upon hacks, and the chances that your document will compile cleanly get worse all the time.

Why would people still use BibTeX? Some perhaps because they have older papers that they still cherish, but certainly because a number of publishers are stuck in the stone age and demand BibTeX.

What is the Better BibLaTeX project?

The “Better BibLaTeX” project is an add-on for the Zotero reference manager. Zotero is an absolutely fantastic piece of open source software, and if you do academic writing (or even high school reports — my son uses it too) you should absolutely be using it. Reference management is not something you should be doing by hand.

Zotero however aims squarely at the WYSIWYG user. Specifically, they assume that if you are a typical user, you use Word or OpenOffice for your publishing needs. If you don’t (and I don’t) and use LaTeX (or Markdown) instead, you’ll soon find that Zotero falls just short and you need to do a lot of manual work to get things right. Since the point of a reference manager is to prevent manual work on your references, this needed a fix, and since I’m a software engineer, I’d rather spend time automating a fix than fixing by hand. So Better BibTeX, or BBT as I now often call it, was born. I think the main annoyances I wanted to get out of the way when I started were citation key management (Zotero doesn’t have any) and then a way to get my references out automatically, so I don’t have to remember to export to BibLaTeX again when I have added or changed references.

Things went a little wild from there. I added a lot of bells and whistles, some of which I do regret somewhat, but for the most part new features came about to improve the quality of the bib(la)tex export. The aim was always loosely that BBT would export bib(la)tex which would compile to a bibliography that resembles the Zotero/citeproc generated bibliography as closely as possible. That’s not holy writ, and I probably do deviate here and there, but it’s a guideline I keep in mind. One thing that I’ve found is that expressing this intent is very far from simple in bib(la)tex, which brings me to…

The Zotero citation manager allows for professional citation management and has plugins for LibreOffice and Microsoft Word. It functions on Windows, Mac OS X and Linux.

Why “better” BibLaTeX? BibTeX has been around since 1985, BibLaTeX since 2006. We are talking 30 and 10 years. What can there possibly be left to invent about citations? What kinds of new features are you working on?

Heavens no. I’m not going to improve on even BibTeX; I know woefully little about citations or about bib(la)tex. BBT was originally called “Zotero Better BibTeX”, but at some point I got a sense that the Zotero authors didn’t like things being called “Zotero X” as it would imply that it was being built under their direction. I don’t even know whether that really held but I didn’t want to cause any distress, so I started calling it “Better BibTeX for Zotero”, which just didn’t roll off the tongue very smoothly, so it became simply “Better BibTeX” at some point.

That all said, I feel fully comfortable claiming that BBT is heaps better than the bib(la)tex export bundled with Zotero. It’s not just the key handling (which is a real problem with the default bib(la)tex exporter from Zotero) or the automatic export: the quality of the BibTeX that BBT exports is quite simply better structurally, is better at keeping the aforementioned intent in place, and has better coverage for Unicode translation.

That’s not really a slight against Zotero — as I said, their main aim is the WYSIWYG writer, and other reference managers that aren’t BibTeX-native certainly don’t do any better. Among Zotero’s direct competitors — the likes of Mendeley, Qiqqa, ReadCube, Papers, EndNote, Citavi, colwiz — I feel fairly confident that Zotero+BBT is the gold standard if you need BibTeX. That’s not to say that these are not good reference managers on their own (except EndNote. EndNote is horrible), but none of them have good BibTeX management, and of the whole lot, Zotero is the only one that makes it possible to fix that.

The citation manager in Fidus Writer 3.1 allows for WYSIWYG editing of citation data in a way that many other citation managers still lack, thanks also to the contributions of Emiliano Heyns.

One of the features that Fidus Writer 3.1 gains is support for case protection in English titles and support for some styling in title fields.

This comes largely as a result of your reports and the common parsing system that you and I have been working on. Could you say a bit about what this title protection and styling is useful for? This is something you also provide in the BBT project, right?

It’s something that is supported in BBT, correct. The one is easy to explain, the other a little harder.

Styling pertains to markup in names, titles, etc. Citeproc (and thus Zotero, and thus BBT) supports limited HTML-like markup in references:

  1. <i>/<em> for italics
  2. <b>/<strong> for bold
  3. <sub>/<sup> for subscript/superscript
  4. <span class="nocase">/<nc> to prevent case-meddling (on which more below)
  5. <span style="font-variant: small-caps">/<sc> to force the text to smallcaps

I mostly see <sup>/<sub> being used for chemical elements, but smallcaps is sometimes useful, and the nocase… again, more below. It’s not pretty, but it works, and reference managers could in principle add WYSIWYG editing to these fields to hide the markup, but Zotero doesn’t. BBT translates these to their corresponding LaTeX markup on export, and will on import convert such LaTeX markup to the corresponding HTML-ish.
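The export direction of that translation can be illustrated with a minimal sketch. It is an editorial illustration, not BBT's actual code: the function name `html_to_latex` and the tag-to-command table are assumptions chosen for this example, and only the simple attribute-free tags are covered.

```python
import re

# Assumed mapping from the HTML-like tags to LaTeX commands (illustrative only).
TAG_TO_LATEX = {
    "i": r"\emph", "em": r"\emph",
    "b": r"\textbf", "strong": r"\textbf",
    "sub": r"\textsubscript", "sup": r"\textsuperscript",
    "sc": r"\textsc",
}

# Matches one simple tag pair without attributes, e.g. <sub>2</sub>.
SIMPLE_TAG = re.compile(r"<(i|em|b|strong|sub|sup|sc)>(.*?)</\1>", re.DOTALL)


def html_to_latex(text: str) -> str:
    """Replace simple tag pairs with LaTeX commands.

    The substitution is repeated until the text stops changing, so tags
    nested inside other tags are converted on a later pass.
    """
    def replace(match: re.Match) -> str:
        return TAG_TO_LATEX[match.group(1)] + "{" + match.group(2) + "}"

    previous = None
    while previous != text:
        previous = text
        text = SIMPLE_TAG.sub(replace, text)
    return text


print(html_to_latex("H<sub>2</sub>O"))
```

A real exporter would also have to escape LaTeX special characters and handle the `<span ...>` forms, which this sketch leaves out.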

Now, case-meddling. This was a topic that I thought I knew about until I got schooled on the subject. It turns out that bib(la)tex expects reference titles to be in Title Case, so your favorite Disney movie should be entered as “Snow White and the Seven Dwarfs”. There are citation styles which demand that titles are rendered in sentence case, and bib(la)tex will automatically make sure that title renders to “Snow white and the seven dwarfs” if the style demands this; mostly, the rule is just “Keep Titles in Title Case and Things Will be Fine”.

Except things are not fine when you have “Snow White and the IEEE dwarfs”, which will render to “Snow White and the Ieee Dwarfs” even if the style doesn’t sentence-case. To prevent such terms from being messed with, you can wrap them in <span class="nocase">IEEE</span>/<span class="nocase">dwarfs</span> and Zotero and BBT will make sure the citation processor (BibTeX/BibLaTeX in our case, citeproc in the case of Zotero) doesn’t mess with those bits.
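On the bib(la)tex side, the usual way to shield text from case changes is to wrap it in braces. A minimal sketch of turning nocase markup into brace protection follows; the function name `protect_nocase` is hypothetical, and the exact bracing BBT emits may differ from the single pair of braces assumed here.

```python
import re

# Matches either form of the nocase markup: <span class="nocase">…</span> or <nc>…</nc>.
NOCASE = re.compile(r'<span class="nocase">(.*?)</span>|<nc>(.*?)</nc>')


def protect_nocase(title: str) -> str:
    """Wrap every nocase-marked part of a title in braces (assumed output form)."""
    def wrap(match: re.Match) -> str:
        # Exactly one of the two groups is set, depending on which form matched.
        inner = match.group(1) if match.group(1) is not None else match.group(2)
        return "{" + inner + "}"

    return NOCASE.sub(wrap, title)


print(protect_nocase('Snow White and the <span class="nocase">IEEE</span> dwarfs'))
```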

This is made more complicated by the fact that citeproc (and thus Zotero) expects titles to be provided in sentence case, so BBT changes the title to Title Case (at least those parts you haven’t marked as off-limits with nocase) before handing it to BibLaTeX, which may turn it back into sentence case, depending on the style. It’s all wonderfully complicated.
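The sentence-case to Title Case step can be sketched roughly as below. This is a deliberately naive illustration, not the algorithm BBT uses: it assumes protected parts are already wrapped in braces, and the `SMALL_WORDS` list and the function name `to_title_case` are made up for the example.

```python
import re

# Assumed list of words that stay lowercase inside a title (illustrative, incomplete).
SMALL_WORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "for", "or"}

# Tokens: a brace-protected chunk, a run of whitespace, a word, or a stray brace.
TOKEN = re.compile(r"\{[^{}]*\}|\s+|[^\s{}]+|[{}]")


def to_title_case(title: str) -> str:
    """Capitalize words, leaving small words and brace-protected chunks untouched."""
    out = []
    first_word = True
    for token in TOKEN.findall(title):
        if token.isspace() or token.startswith("{"):
            out.append(token)  # whitespace and protected text pass through unchanged
        elif not first_word and token.lower() in SMALL_WORDS:
            out.append(token)  # small words keep their case, except at the start
        else:
            out.append(token[0].upper() + token[1:])
        if not token.isspace():
            first_word = False
    return "".join(out)


print(to_title_case("Snow white and the seven dwarfs"))
```

Real title casing has many more rules (hyphenated words, words after a colon, and so on), which is part of why this area gets complicated.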

And to add more spice to the mix, this all goes only for English(ish) titles; ‘en’, ‘american’, ‘british’, ‘canadian’, ‘english’, ‘australian’, ‘newzealand’, ‘USenglish’, ‘UKenglish’ all count as English for citation purposes. If you set no language on a reference, it is also considered English, and so subject to the case meddling rules. Anything else, and neither BBT nor Zotero will touch your input when generating the titles.
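The language rule can be written down as a small check. This is a sketch under assumptions: the function name `is_case_meddled` is hypothetical, and comparing the language case-insensitively is a choice made for this example rather than something stated in the interview.

```python
from typing import Optional

# The language values listed above, lowercased for comparison.
ENGLISH_LANGUAGES = {
    "en", "american", "british", "canadian", "english",
    "australian", "newzealand", "usenglish", "ukenglish",
}


def is_case_meddled(language: Optional[str]) -> bool:
    """Return True if a reference's title is subject to the case rules."""
    if language is None or language.strip() == "":
        return True  # no language set: treated as English
    # Assumption: matching ignores case and surrounding whitespace.
    return language.strip().lower() in ENGLISH_LANGUAGES


for lang in (None, "USenglish", "german"):
    print(lang, is_case_meddled(lang))
```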

The case-meddling is easily one of the more complicated parts of BBT, together with name handling (which can be so much fun) and date parsing.

Can you give us an idea of who the actors in this field are? I hear names like Nick Bart, Frank Bennett and PLK mentioned a lot. Who are these people, and why are they so central for the citation world in 2016? Who else should one know?

I need to make clear that I have a pretty myopic view of the field. I’m primarily a fairly unsophisticated BibLaTeX user, but I’m a fairly decent software engineer, and these people are the ones that have in the past year really pushed BBT forward. Still, I would be remiss if I did not mention all those BBT users who chimed in with issue reports and questions — for sure they were the ones who in the early days drove most of the development. But these days, BBT is fairly stable, and we’re now getting to the more gnarly edge-cases of citation, and these people are here to see me through those.

Nick Bart knows a crazy lot about bib(la)tex. He knows all the edge cases, and if I find some weird anomaly he will usually just know the reason and the fix. I lean on Nick Bart pretty heavily these days when a new request rolls in (if it’s not just a bug report) to find out what BBT should be doing at all before I start coding. This has become more and more important as I’m trying to cut down on the number of knobs and dials you can tweak to get BBT to do what you want. To do this, I need to know that the one behavior I’m fixing in place by not adding a preference to fiddle with is the universally correct behavior. Nick Bart knows this, I don’t.

PLK (Philip Kime) is the lead of the BibLaTeX project. It does happen that in the course of building BBT we do stumble upon a genuine bug in BibLaTeX or an ambiguity in how BibLaTeX behaves. This is way over my head, so this is usually a conversation between Nick and Philip and I just wait patiently until they come to a resolution.

Then finally of these three, Frank. Frank is the author of the Juris-M fork of Zotero, and BBT tries to be compatible with both. BBT exercises a narrow slice of Zotero/Juris-M fairly thoroughly so it happens regularly that BBT finds bugs in Juris-M when I run my test suite. But mostly I talk to Frank about citeproc, the citation processor that is the heart of (among others) Zotero. Not only do I try to keep in line with the output it generates, I re-use parts of citeproc in BBT: things like name parsing (which is a lot more work than I thought initially) and date parsing (which is also a lot more work than I ever thought). Not every reference in Zotero is squeaky clean, and some of these tools help me to clean them up before I export them. I’m probably forgetting some important other parts where I use parts of citeproc, but broadly, without those parts of citeproc, BBT would be substantially worse.

You mention on your page that your tool is for “holdouts” of BibTeX/BibLaTeX. That sounds as if your technology is being replaced by something else. What is it being replaced with?

Everything and nothing. We’re in a weird place right now with regard to academic publishing. The main tools that people will bring up are Google Docs, Word and LibreOffice. But a good replacement? I haven’t seen any. All of them bring something to the table that I want, but none of them brings everything.

Google Docs for example is fantastic for collaboration. It’s real-time, there’s a single master copy, no what-version-did-you-have-then problems. And it’s easy to use for pretty much everyone, which is not exactly a claim that BibLaTeX and the tool chain that comes with it can make. But reference management is sub-standard (I’ve tried to love Paperpile. After 4 serious attempts, it’s not happening), there’s no abbreviation management, no figure management… for academic publishing, it’s good to get a draft going, but absolutely nothing more.

Word is accepted everywhere, and the integration with Zotero (or Mendeley, or…) is fantastic. And it does great track-changes views (oh so important for supervisors), and with OneDrive, you even have real-time editing. But abbreviation/figure management is still icky, and even when the layout doesn’t screw up irreparably after I changed something that should be wholly unrelated, I think the resulting document is just ugly. An eyesore. I love the output LaTeX gives me, and the way I don’t have to worry about consistency in the resulting document.

LaTeX does all the things I care about deeply (like abbreviation management, cross-referencing with page numbers, figure/theorem management, etc), but is just really, really bad at some things that Word users take for granted. Sidebar comments? A track-changes view? Yes, there are packages and scripts that will do those, but they take an ungodly amount of fiddling, half of the time the resulting doc won’t even compile, and when it does, something is always broken (typically: references). Real-time collaboration with people that aren’t programmers. There’s ShareLaTeX for real-time collaboration, but that’d mean my co-authors would have to accept LaTeX, and that ain’t happening. Compiler errors on a tight deadline. This is the “holdout” part; I know about all these problems, and I keep using LaTeX. I’ve tried Word a few times for simple reports, and I always gave up after a day or two. The book chapter I just co-authored I tried doing in Google Docs and after a few weeks I just gave up in disgust and agreed to do all the authoring just so I could use pandoc.

What I’d love to see is an online (because who wants to have to install stuff these days), real-time, multi-author editor, that would have a neutered view for my WYSIWYG brethren, a markup view for me (LaTeX or something else, as long as I get the stuff I care about), a vim mode preferably but at least something that syncs to offline files (don’t trust the cloud as the only place for your precious articles). LyX would be halfway there if the file format wasn’t so strange, and co-authoring (or even version management, really) is a non-starter.

I had some hope for scholarly markdown, and I do use Pandoc for simple two-pagers, but for longer stuff there’s too much missing, so I fall back to LaTeX. Fidus Writer falls into this same pandoc category for me — good for simple things, but for more complex articles, I’ll use LaTeX. Plus, sync. I frequently travel outside network coverage, so I do a lot of work with offline+sync.

Do you see a future for tools like Zotero and BibTeX/BibLaTeX? What would need to happen for the majority to shift their focus toward those technologies again?

I don’t know. The geek in me would love to see it happen, but I’m not recommending that my son or daughter spend the time to learn it. It’s one of those things that if you already know them, they’re hands-down the best tools available, but if you don’t, the difference in the results between Word and LaTeX may be too marginal to invest the time.

Perhaps if someone could make a tool that would keep most of the features of LaTeX without it being a full programming language. LaTeX being a programming language makes it a really bad fit for something that needs to do WYSIWYG. But perhaps with a rich enough semantic format… but then you’d probably get docbook, and how many people use that?

You used to be a philosophy student, I assume in the Netherlands. What are you doing nowadays?

I’m not wholly out of the game yet: I attend conferences and workshops when I can, and I have a book chapter due to be published in May. I do hope to proceed towards a PhD, but that needs careful planning to put things in place, and I promised my family I’d take a break for 6 months as the 4-year master was quite hectic alongside my job and family life. I did get one day a week off to pursue my masters, but as I took my masters quite seriously (some would say way too seriously), there was very little time left for leisure, so I owe them at least this. But I loved doing philosophy, and I can’t accept that getting a degree would just mean the end of things. There’s a ton of subjects that I’d love to get into (metaphysics, meta-ethics, philosophy of science, mind, religion, epistemology…), or perhaps I’ll pick up the project on philosophy of expertise (which combined a lot of those subjects) that I had planned to graduate on before it ballooned out of proportion again and again (because it combined a lot of those subjects).

Yes, I’m in the Netherlands, and after 4 years on the university grounds, I am — for the moment — back to my full-time job as a software engineer/researcher. I’ve recently moved from the IT department at the HAN university of applied sciences in Arnhem to one of the research groups. Our research group is young and we’re still figuring out what research is exactly at a university of applied sciences, so currently most of the work focuses on software engineering. But research is on the agenda, and we’re looking at ways to make the research into a viable and interesting PhD to pursue.

Sounds like a lot of stuff. Good luck with that and thanks!
