Importing note fields and subentry fields into FLEx

Note: This page is very specific; please see Preparing Legacy data for Flex for a more complete overview of SFM import.

Importing note fields into FLEx can be tricky, since in standard MDF a note might occur under lx, se, sn, or even rf. (Some people have used it even more specifically than that.) During import, FLEx only allows a given SFM marker to be mapped to a field under one object. That's pretty reasonable, except that it considers “Entry” and “Subentry” to be distinct for the purposes of this mapping. This means, for example, that if an entry has three se subentries, each of them with an lc field, the contents of those will be concatenated and dumped into the root entry's Citation Form field. See LT-10905.

The following fields have been tested and found to work under both lx and se: ph, et, eg, es, ec, ps, sn. Thanks to LT-10727, va now works too.

The main ones that must become multiple markers are lc, cf, lt, all custom fields (which are also needed for standard MDF fields like bw, st), and note fields, esp. nt.

The fields that work well, such as ps, sn, et, and va, all appear to have special logic that helps them attach properly to either the entry or the subentry. For all other fields, the current workaround is to use multiple markers (bw → bwlx bwse, lt → ltlx ltse, nt → ntlx ntse ntsn ntrf, etc.). You can do this semi-manually with regular expressions plus checking in Solid, or else use a CC table or a script. Ideally, this limitation in the FLEx importer would be fixed. In case it is not, there is also a request to add a quick fix to Solid as a workaround (438).
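The "script" option mentioned above might look something like the following. This is only a minimal sketch, not a tested migration tool: the function name `split_marker_by_context` is made up for illustration, and it assumes that each field starts on its own line and that a note belongs to whichever of lx, se, sn, or rf occurred most recently. That assumption will be wrong for some data (for example, a sense-level note placed after an example), so the output should still be checked in Solid.

```python
import re

# Markers assumed to open a new "context" that a following note attaches to.
CONTEXT_MARKERS = {"lx", "se", "sn", "rf"}

# A field line: backslash, marker name, then the rest of the line.
MARKER_RE = re.compile(r"^\\(\w+)(.*)$")


def split_marker_by_context(lines, target="nt"):
    """Rename \\target to \\target + context (e.g. nt -> ntlx, ntse, ntsn, ntrf).

    The context is the most recent lx/se/sn/rf marker seen so far.
    Lines that occur before any context marker are left unchanged.
    """
    context = None
    out = []
    for line in lines:
        m = MARKER_RE.match(line)
        if m:
            marker, rest = m.group(1), m.group(2)
            if marker in CONTEXT_MARKERS:
                context = marker
            elif marker == target and context is not None:
                line = "\\" + target + context + rest
        out.append(line)
    return out


if __name__ == "__main__":
    sample = [
        "\\lx aba",
        "\\nt entry note",
        "\\sn 1",
        "\\ge father",
        "\\nt sense note",
        "\\se aba-aba",
        "\\nt subentry note",
    ]
    print("\n".join(split_marker_by_context(sample)))
```

The same function can be reused for other markers that need splitting, e.g. `split_marker_by_context(lines, target="lt")`, although for markers like lt you would normally only expect the lx and se variants to turn up.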

Here's a question recently asked on the LTS list, discussing the worst-case scenario (the note field).

Q: Before I go and manually change all of the \nt markers, can you assure me that when I do the import I can specify the correct location for each? 

A: The short answer is Yes, as long as you only need to attach the notes to these target objects, or a subset of them. (I'm basing this list on SFM import behavior for subentries, and on the Locations available when creating a custom field.) For each object, I've listed what field to map to:
Sense → General Note
Entry, Subentry → Note (i.e. two SFM fields mapped to the same Note field, but one under “Entry” and one under “Subentry”)
Example → custom field
Allomorph → custom field

If you needed all of these, you might create separate \ntsn \ntlx \ntse \ntrf \nta fields. Other target objects are “possible”, but you'd then have to create a custom writing system (labeled as “note”), and I'd probably steer away from that.

For more gory technical details and pitfalls related to import, please see this page. There are a lot of issues to consider (only some of which will generate any warnings), and most of those gory details won't be useful to you after the migration is done, which is why we don't recommend that non-specialists attempt importing SFM into FLEx.

Ok, to do this without too much tedium, I usually look for the most common patterns and use regular expressions for them, like Jeff mentioned. After that, any left-over \nt fields can be manually fixed, and then you can optionally replace all \ntsn back to \nt, if you want to leverage FLEx's default interpretation (General Note) of \nt. (This default interpretation is used the very first time you start to import an SFM file, generating a .map file. The defaults for any MDF fields that are *not* found in the file at that point are permanently discarded.) Either Notepad++ or Eclipse works great for this (Eclipse is better for really advanced regex). For example, in Notepad++, a regular-expression replace can change nt to ntsn whenever it immediately follows a Definition or Gloss in English or National.
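The original Notepad++ expression is not reproduced on this page, so here is a rough equivalent in Python instead. It is a sketch under a few assumptions: the standard MDF markers \de and \dn (definition, English and national) and \ge and \gn (gloss, English and national) are in use, each field occupies a single line, and line endings are plain "\n". The function name `nt_after_gloss_to_ntsn` is hypothetical.

```python
import re

# Group 1: a whole \de, \dn, \ge or \gn line, including its newline.
# Then: a \nt marker at the start of the very next line.
PATTERN = re.compile(r"^(\\(?:de|dn|ge|gn)\b[^\n]*\n)\\nt\b", re.MULTILINE)


def nt_after_gloss_to_ntsn(text):
    """Rename \\nt to \\ntsn when it directly follows a definition or gloss line."""
    # A function is used for the replacement so that the backslashes
    # are inserted literally rather than being parsed as escapes.
    return PATTERN.sub(lambda m: m.group(1) + "\\ntsn", text)


if __name__ == "__main__":
    sample = "\\lx aba\n\\sn 1\n\\ge father\n\\nt sense note\n"
    print(nt_after_gloss_to_ntsn(sample))
```

Only the \nt immediately after the gloss or definition line is renamed; a \nt in any other position is left alone for a later pattern or for manual fixing.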


Complex forms (subentries) are a special case because they're structurally just entries in FLEx (ones which happen to have a Components link and a Complex Form Type), whereas MDF has \se (subentry) as a subordinate of \lx in its hierarchy; i.e. \se is between \lx and the sense fields. So, the import wizard provides distinct “Entry” and “Subentry” targets. They are nearly identical, but this distinction allows/requires you to create a distinct \ntse and map it to the same field that you mapped \ntlx to, but under the Subentry object instead of the Entry object. If you were to simply map both kinds of note under the Entry, then every subentry's note would end up on its root entry instead, which in FLEx is a totally separate record.

FLEx import does not require all entry/subentry fields in the SFM to be distinguished in this way. Thus, you don't need a separate \psse or a separate \etse. I've tried to document this in JIRA; here's an excerpt:

“The main ones that don't work are lc, cf, lt, all custom fields (which are also needed for standard MDF fields like bw, st), and note fields, esp. nt.”

( For import, it's not clear which fields need a distinct marker when they are part of a subentry
https://jira.sil.org/browse/LT-10905 )

Note: All field markers beginning with “lc” are apparently viewed as identical due to a FLEx import bug (LT-13811). So, don't invent \lcse; inventing \lse works fine. (Likewise, there also appears to be hard-coded behavior applied to any ps field, regardless of what it is mapped to. See comments under LT-10739.)
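If you have already invented markers, a quick scan can show whether any of them would run into the “lc” problem described above. This is just an illustrative sketch: `risky_lc_markers` is a made-up helper name, and it assumes each field starts on its own line with a backslash.

```python
import re


def risky_lc_markers(lines):
    """Return the markers that begin with 'lc' but are not exactly 'lc'."""
    found = set()
    for line in lines:
        m = re.match(r"\\(\w+)", line)
        if m:
            marker = m.group(1)
            if marker.startswith("lc") and marker != "lc":
                found.add(marker)
    return found


if __name__ == "__main__":
    sample = ["\\lx aba", "\\lc Aba", "\\se aba-aba", "\\lcse Aba-aba"]
    print(sorted(risky_lc_markers(sample)))
```

Any marker this reports is a candidate for renaming (for example to \lse, as suggested above) before running the import.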

Now, it sounds like you're only concerned about \nt in your case. To find out which items you can attach it to, try adding a custom field in FLEx. Again, I think these are your options: Entry, Sense, Example, Allomorph.

BTW, I've not done much testing with send/receive and its notes/questions, but that may be a good way to handle (sub)entry-level notes (and sense-level perhaps?) in the future. However, I doubt that there will be a good way of importing from SFM into those (and I would vote for prioritizing many other import issues before considering that one).

Contributors to this page: dhigby .
Page last modified on Friday July 3, 2015 20:23:36 GMT-0000 by dhigby.

All content on this LingTranSoft wiki is by SIL International and is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.