AddPinyin Plugin for MarcEdit

The East Asian Library and the Gest Collection

普林斯頓大學葛思德東亞圖書館 ・ プリンストン大学東アジア図書館 ・ 프린스턴 대학교 동아시아 도서관

Click Here to Download the Plugin Installer (EXE, packaged in a ZIP file)

  • Version 2.1.1, last updated 2021-04-21 (Click here to see version history).
  • Compatible with the Windows version of MarcEdit (versions 6 and up).
  • This plugin does not need to be installed as Administrator.  It should be installed while logged in as the user that will be using the software.

AddPinyin Plugin for MarcEdit by Princeton University Library is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

The Chinese dictionary data is based on three sources:

  • The Unihan database, copyright 1991-2020, Unicode, Inc. Last updated 2020-02-18.
  • CC-CEDICT, copyright 2020, MDBG. Last updated 2021-04-21.
  • User feedback: To suggest additional characters or phrases for the dictionary, please use the suggestion form.

The AddPinyin plugin takes a set of MARC records and converts Chinese text in selected fields to pinyin. For each converted field, an 880 field is created containing either the original or romanized text (as specified by the user). A subfield 6 with the appropriate linkage value is automatically added to both the original field and the 880. 
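
To make the linkage concrete, here is a minimal Python sketch of how a subfield 6 pair is formed under the MARC 21 convention (linking tag, hyphen, two-digit occurrence number). It is not the plugin's own code: the function name `linkage_pair` is hypothetical, and the optional script identification code that may follow the occurrence number is left out for simplicity.

```python
from typing import Tuple


def linkage_pair(tag: str, occurrence: int) -> Tuple[str, str]:
    """Return the $6 values for a regular field and its paired 880.

    Hypothetical helper for illustration only; it follows the MARC 21
    pattern "<linking tag>-<two-digit occurrence number>".
    """
    if len(tag) != 3 or not tag.isdigit():
        raise ValueError("tag must be a three-digit MARC tag, e.g. '245'")
    if not 1 <= occurrence <= 99:
        raise ValueError("occurrence number must be between 1 and 99")
    occ = f"{occurrence:02d}"
    # The regular field points at the 880; the 880 points back at the field.
    return f"880-{occ}", f"{tag}-{occ}"


field_6, parallel_6 = linkage_pair("245", 1)
print(field_6)     # 880-01
print(parallel_6)  # 245-01
```

In this example the 245 field would carry a subfield 6 of `880-01`, and the matching 880 would carry `245-01`, so the two fields can be matched up by occurrence number.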

The plugin's functionality complements that of the OCLC Connexion Pinyin Conversion Macro; whereas the OCLC Macro is run on individual fields within a single record, the MarcEdit Plugin is designed for batch processing.  Also, since the plugin runs within MarcEdit, it is independent of a specific catalog.

The plugin generates pinyin according to ALA-LC standards. The output is similar to that produced by the OCLC Macro; please see the documentation for that macro for specifics. It is difficult to automate romanization with 100% accuracy, so it is best to proofread the results manually whenever practical. Most of the needed adjustments involve spacing, capitalization, and punctuation rather than the pinyin itself, and efforts have been made to keep even these minor inaccuracies to a minimum. If you notice any errors or would like to suggest new phrases to be included in the dictionary, please submit them using this form. (It is the same form used to suggest phrases for the OCLC Macro.)

This plugin can be run on files containing both Chinese and non-Chinese records. The plugin examines the 008 field to identify Chinese records and leaves everything else untouched. If a record has romanization for some but not all of its Chinese fields, the plugin can, if desired, add romanization for the remaining unconverted fields. The main dialog of the program shows how many records contain Chinese text that could potentially be converted. You also have the option to swap the order of parallel fields that already exist in the records. (Swapping the fields can be done even if no other conversion is performed on the record set.)
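
As a rough illustration of what an 008-based check might look like, the Python sketch below tests an MRK-format 008 line. This page does not say which part of the 008 the plugin reads, so the sketch assumes the MARC 21 language code in positions 35-37 and the code `chi`; the function name `is_chinese_record` is hypothetical.

```python
def is_chinese_record(mrk_008_line: str) -> bool:
    """Guess whether a record is Chinese from its MRK-format 008 line.

    Assumption (not confirmed by the plugin documentation): the check
    looks at 008 positions 35-37, the MARC 21 language code.
    """
    prefix = "=008  "  # MRK lines start with "=", the tag, and two spaces
    if not mrk_008_line.startswith(prefix):
        return False
    data = mrk_008_line[len(prefix):]
    # Slicing a short or malformed 008 simply yields a non-matching string.
    return data[35:38] == "chi"


# Filler characters stand in for the first 35 positions of the 008.
sample = "=008  " + "x" * 35 + "chi" + "\\d"
print(is_chinese_record(sample))             # True
print(is_chinese_record("=245  10$aTitle"))  # False
```

A real record set may also contain Chinese text in records coded with another language, which a check like this would miss.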

To run the plugin, do the following:

  1. Create a backup of the MARC record file to be converted.
  2. Convert the file to MRK format (using the MarcBreaker tool in MarcEdit) and open it in MarcEditor. The file must be encoded in UTF-8. For MARC-8 files, use MarcBreaker to convert the file to UTF-8 before running the plugin, then convert it back to MARC-8 afterwards if needed.
  3. Open the "Plugins" menu and select "AddPinyin". A dialog will appear warning you that the conversion cannot be undone and that the MRK file will be saved automatically after conversion. (This is why it is important to back up the file.) Click "OK".
  4. A dialog will appear asking you which fields to generate romanization for. Certain fields are selected by default. Use the arrow buttons to specify which fields you would like to convert.
  5. Select whether you want the pinyin to be placed in the original field or the corresponding 880, and also whether you want to swap the order of existing parallel fields accordingly.
  6. Click the “Convert” button.
  7. After romanization is complete, the updated records will be displayed in the MarcEditor. Compile the file back to MRC format by opening the “File” menu and selecting “Compile File into MARC”.
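
The backup in step 1 and the encoding requirement in step 2 can be checked with a small script before opening MarcEdit. The Python sketch below is one possible approach, not part of the plugin: the function names and the `.bak` suffix are hypothetical, and the encoding check only reads the declared character coding scheme in leader position 9 (`a` means UCS/Unicode in MARC 21), which may not match the actual bytes in a mislabeled file.

```python
import shutil
from pathlib import Path


def backup_file(path: str) -> Path:
    """Copy the file next to itself with a '.bak' suffix (hypothetical naming)."""
    source = Path(path)
    backup = source.with_name(source.name + ".bak")
    shutil.copy2(source, backup)  # copy2 also preserves timestamps
    return backup


def declares_utf8(mrk_ldr_line: str) -> bool:
    """Return True if an MRK-format leader line declares Unicode (LDR/09 = 'a')."""
    prefix = "=LDR  "
    if not mrk_ldr_line.startswith(prefix):
        return False
    leader = mrk_ldr_line[len(prefix):]
    return len(leader) > 9 and leader[9] == "a"


print(declares_utf8("=LDR  00000nam a2200000 a 4500"))  # True
print(declares_utf8("=LDR  00000nam  2200000 a 4500"))  # False
```

If `declares_utf8` returns False for the records in a file, that suggests the file should go through the MARC-8 to UTF-8 conversion described in step 2 first.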