Jena Organization Corpus (JOCo)

JOCo is a corpus of annual reports (ARs) and corporate social responsibility (CSR) reports of US, British, and German business organizations, i.e. corporations, that are listed in the main indices such as DOW JONES, S&P 500, and NASDAQ 100 for the USA; FTSE, FTSE AIM 100, and FTSE 250 for Great Britain; and DAX, MDAX, and TecDAX for Germany. All reports are in English; the German corporations also provide reports in English for their international audiences. For a more detailed description of the corpus, please refer to our paper:

Sebastian G.M. Händschke, Sven Buechel, Jan Goldenstein, Philipp Poschmann, Tinghui Duan, Peter Walgenbach, and Udo Hahn. 2018. A Corpus of Corporate Annual and Social Responsibility Reports: 280 Million Tokens of Balanced Organizational Writing. In ECONLP 2018 — Proceedings of the First Workshop on Economics and Natural Language Processing @ ACL 2018. Melbourne, Australia, July 20, 2018. Pages 20–31.

JOCo is intended to be used in a shared community effort to improve natural language processing techniques for the economic language domain as well as for the domains of business and management.
JOCo is provided by the Chair of Organization, Leadership, and Human Resource Management, Prof. Dr. Peter Walgenbach, and the Chair of Computational Linguistics, Prof. Dr. Udo Hahn, both of Friedrich-Schiller-Universität Jena, Germany.

How to get the corpus?

1. All users of JOCo must apply for a license in order to receive a copy (data use agreement: data_use_agreement_joco, PDF, 25 kb).

2. Please send the data use agreement to Dr. Philipp Poschmann (subject: "JOCo data use agreement").

3. Within the following days, you will receive a download link that is valid for 20 days.

4. If JOCo is updated, you will receive an e-mail announcement including a link to the new version of the corpus. If you do not wish to receive this e-mail announcement, please send us a short message.