The worldwide reach of RePEc

December 26, 2008

Following up on the post two weeks ago about how RePEc tries to contribute to the democratization of research, it is interesting to see how far RePEc reaches in the world. While we do not have any recent study of who uses the RePEc services as a reader, we know much better who the contributors are. First, the authors: about 18,500 of them are distributed over 118 countries (and all US states). Then, the 960+ RePEc archives, each of which contributes bibliographic data to the project, are spread over 64 countries. But some of those archives collect data from several institutions. Thus, we actually have publications from 70 countries (and all but five US states: AK, CO, NE, NH and SD). This is what it looks like on a world map:

Distribution of publications across the world


Institutional data in RePEc

December 19, 2008

RePEc gathers information not only about publications and authors, but also about institutions. Specifically, the EDIRC project (Economics Departments, Institutes and Research Centers) has catalogued since 1995 all academic and government institutions that employ a significant share of economists, including think tanks and associations. For-profit organizations (banks, consultants, etc.) are listed if they contribute their publications to RePEc. As of today, 11,000 institutions are listed, including over 600 associations. Over 4000 have at least one registered author and about 1000 have some publication in RePEc.

The collected institutional data is used and displayed in various ways throughout RePEc. Authors use it when they register to determine their affiliations. So do RePEc archives for their publications. Author and institution data are combined on EDIRC to compile the publication output of all institutions. This output is then combined with citation data from CitEc and download data from LogEc to determine institutional rankings.

Note that all the information about institutions has been gathered with the help of a lot of people.


RePEc and the democratization of research

December 9, 2008

In the last issue of the American Economic Review, the following article caught my eye: Restructuring Research: Communication Costs and the Democratization of University Innovation by Ajay Agrawal & Avi Goldfarb. In short, it documents who gained in electrical engineering faculties from the reduced cost of collaboration through the introduction of Bitnet, in the early Internet days. The basic result is that the middle-tier universities benefited the most. Indeed, the top ones were already well connected with each other, and the middle ones took advantage of collaborating with the top ones.

The main goal of RePEc is precisely the democratization of research. Given publication delays in Economics, if one wants to stay abreast of developments at the frontier of research, one needs to read working papers. Before the Internet, the only way to get hold of them was either if you were already at a top ranked Economics department, or if you were somehow within a club of well connected researchers. Just being aware of the most current research was a challenge for anybody outside these circles. This is what motivated Thomas Krichel, as a research assistant in 1991, to find ways to learn about new working papers, and share what he found. This initiative evolved into RePEc in 1997.

Are Elite Universities Losing Their Competitive Edge? by E. Han Kim, Adair Morse & Luigi Zingales documents that Economics faculty in elite universities were more productive in the 1970s at least in part due to their location, and that this location effect had disappeared by the 1990s. While it is an open question whether RePEc has contributed to such democratization, we have always favored it: everybody should be able to learn about current research, and everybody should be able to contribute to it.


RePEc in November 2008

December 2, 2008

We have just experienced a tremendous month. First, about 25’000 works were added; second, we saw traffic like never before. The only downside was that we had to move the blog due to various issues.

The push in new material was due partly to additions from AgEcon Search, partly to a lot of activity at many other archives, and finally to 13 new archives, more than usual: ADRES, Universität Wuppertal, CORE, INRA, University of Ottawa, University of Osijek, University of Nevada, Las Vegas, Pion Ltd, University of Texas at San Antonio, University of Indonesia, Asociación Española de Profesores Universitarios de Contabilidad, University of Lancaster (II), Bilgesel Yayincilik.

In terms of traffic, we counted 860,187 file downloads and 3,292,711 abstract views on EconPapers, IDEAS, NEP and Socionet. These are easily new records.

Which brings us to the thresholds we passed during the past month, an impressive list:
50’000’000 cumulative article abstract views
12’000’000 cumulative article downloads
7’000’000 cumulative downloads on EconPapers
3’000’000 monthly abstract views
800’000 monthly downloads
650’000 works listed
550’000 online works listed
350’000 abstracts listed
270’000 working papers listed
200’000 online working papers listed
200’000 working paper abstracts listed
125’000 working papers with references
120’000 articles with citations
4’000 institutions with registered authors
2’000 books listed
800 journals


Parsing citations

November 22, 2008

One of the services RePEc offers to authors is the discovery of citations, CitEc. This is a difficult undertaking, as it needs to be done entirely automatically. As project leader José Manuel Barrueco Cruz discusses in a previous post, the reference section of a paper is extracted through a series of steps: PDF download, file conversion to PostScript, further conversion to plain text, and identification of the reference section. In each of these steps there are losses.

But even once the reference section is in hand, we are not out of trouble. One needs to identify where each reference starts and ends, then try to match it with something already in RePEc. Considering all the different citation styles, typos, and plain errors, this is a daunting task. Matches that are sufficiently close are counted as citations, while matches that are in some grey zone are fed to the RePEc Author Service to solicit the author’s help in sorting them out. Below are a few examples of what is offered to authors, for the case of a classic article by Gary Becker, Kevin Murphy and Robert Tamura, Human capital, fertility and economic growth:


  • [3] Becker, G.; Murphy, K. ald Tamura, R. (1993)Humall capital, fertility ald ecollomic growth 01 Humall Capital, third editioll, Gary Becker.
  • Becker, Gary S.; Murphy, Kevin M.; and Tamura, Robert. Human Capital, Fertility, and Econonric Growth, Journal of Political Economy, October 1990 98(5) Part 2, pp. S12-S37.
  • Becker GS, Murphy KM, Tamura R (1990) Human capital, fertility and economic growth. J Polit Econ 98:S12–S37.
  • 1-25. Kevin M. Murphy, and Robert Tamura, Human Capital, Fer- tility and Economic Growth, Journal of Political Economy, October
  • BECKER, 0. S., K. M. MURPHY and R. TAMURA (1990) Human Capital, Fertility and Economic Growth, Journal of Political Economy 98, S 12-37.
  • [6] Becker, G., Murphy, K. and Tamura, R. (1990), Human capital, fertility, and economic growth, Journal of Political Economy, vol. XCVIII, pp.12-37.
  • Population and Development Review, Vol.12, Supplement: Below-Replacement Fertility in Industrial Societies: Causes, Consequences, Policies, pp. 69-76. Becker, Gary; Kevin Murphy, y Robert Tamura. (1990). Human Capital, Fertility and Economic Growth. The Journal of Political Economy, Vol.98, No.5, Part 2: The Problem of Development: A Conference of the Institute for the Study of Free Enterprise System, S12-S37.
  • (March/April 1973 Supplement), S279-88. ______________ Kevin M. Murphy, and Robert Tamura, Human Capital, Fertility, and Economic Growth, Journal of Political Economy, XCVIII
  • Becker, S. Gary, Kevin, M. Murphy and Tamura, Robert (1990). `Human Capital, Fertility, and Economic Growth The Journal of Political Economy, Vol. 98, Issue 5, Part 2, Oct. 1990, pp. S12-S37.
  • Bankconference on developmenteconomics. ecker, Gary, KevinMurphy, and RobertTamura. 1990. Human Capital, Fertility, and EconomicGrowth., Journal of PoliticalEconomy 98, 5, Part 2, pp. S12-S37.

These examples show what can go wrong in the file conversion and how citing authors can make mistakes. Still, CitEc has been able to recognize these references, but is not sure enough about them to count them without confirmation.

This also highlights that we try to minimize errors, even if this means leaving good citations out. Other citation services may have a different approach.
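
To make the idea of "sufficiently close" and "grey zone" matches concrete, here is a minimal sketch in Python. It is not the CitEc algorithm, which is not described in this post: the scoring rule (the share of a known title that can be found, in order, inside the extracted reference) and both thresholds are assumptions chosen purely for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical thresholds, not the values CitEc uses.
ACCEPT_THRESHOLD = 0.90   # at or above: count as a citation
REVIEW_THRESHOLD = 0.60   # between the two: ask the author

def normalize(text):
    """Lowercase the text and replace runs of non-alphanumerics with one space."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())

def similarity(reference, known_title):
    """Share of the known title's characters matched, in order, in the reference."""
    title = normalize(known_title)
    ref = normalize(reference)
    if not title:
        return 0.0
    matcher = SequenceMatcher(None, title, ref, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(title)

def bucket(score):
    """Sort a similarity score into the three outcomes described in the post."""
    if score >= ACCEPT_THRESHOLD:
        return "citation"
    if score >= REVIEW_THRESHOLD:
        return "ask author"
    return "ignore"

title = "Human capital, fertility and economic growth"
clean = ("Becker GS, Murphy KM, Tamura R (1990) Human capital, fertility "
         "and economic growth. J Polit Econ 98:S12–S37.")
print(bucket(similarity(clean, title)))
```

Raising the acceptance threshold trades missed citations for fewer false ones, which is the trade-off described above. A real matcher would also compare authors, year and journal rather than the title alone.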


Looking for a deep link?

November 21, 2008

If you were following a link and were expecting to find a specific post on the RePEc blog, we unfortunately had to move to a different host and links were broken. Please look for your post in the archives. Or if you were using one of the RSS feeds, please use the new ones: entries or comments. We apologize for the inconvenience.


The blog has moved to a new host

November 19, 2008

Due to chronic problems with DoS attacks and spamming that have crippled the host server several times, the RePEc blog has now moved to a new host. It is still available under the old https://blog.repec.org/ address, but no longer under the alternative http://repec.org/blog/. Also, the addresses within the blog have all changed, which breaks deep links. Finally, old RSS feeds may still work as they are redirected, but it is safer to recreate them.

Users who created accounts at the old location will have to create new ones, unless they already have one on WordPress. I am very sorry for the trouble, and especially for the violation of the RePEc principle that links should never break. But I think we now have a permanent home for this blog and this should not happen again.


RePEc in October 2008

November 5, 2008

The major development this past month is that the contents of AgEcon Search are now listed on RePEc. About 30,000 works will gradually be integrated over the next weeks. Also, October is traditionally a busy month, which is reflected by a large number of new participants (authors and institutions) and high traffic. We recorded 701,893 file downloads and 2,757,234 abstract views. In addition, the following publishers joined us during this month: British University in Egypt, Migration Letters, Universitatea “Al. I. Cuza”, Université du Littoral, AgEcon Search, EERI, University of Suceava, Econometica, Spiru Haret University, WorldFish Center, Université Libre de Bruxelles, esocialsciences.com, Scuola Superiore Sant’Anna, Tufts University.

In terms of thresholds passed this month, we have:
150,000,000 cumulative abstract views
25,000,000 year-to-date abstract views
640,000 items listed
375,000 articles listed
18,000 authors registered
3,000 series and journals indexed
2,500 book chapters listed


Call for comments: modifications in the rankings of institutions

October 19, 2008

One feature of RePEc is its ability to rank researchers and the institutions they are affiliated with. Researchers create a list of affiliations when they register in the RePEc Author Service. However, this system was devised before rankings started to be computed, and some unforeseen consequences have emerged for authors with multiple affiliations. As there is no way to determine which affiliation is the main one, or what percentage economists would allocate to each, we are forced to treat each affiliation equally for ranking purposes. In several cases, this leads to institutional rankings being “hijacked” by organizations that offer secondary affiliations. See, for example, the overall ranking of institutions. Another consequence can be found in the regional rankings, where individuals whose main affiliation is outside the region may take the place of legitimate insiders. Prime examples are Massachusetts, the United Kingdom and Germany.

What are the solutions? The obvious one is to modify the RePEc Author Service scripts to allow the declaration of a main affiliation or of affiliation shares. We have pondered that for some time now but find it very difficult to implement, especially as the main resource person for this project is not with us anymore. Thus we need to find some way to proxy the affiliation shares. I want to propose here one way to do this and open it for discussion, with the goal of having a formula in place for the January 2009 rankings.

The logic of the proposed formula is that if there are many people affiliated with a particular institution, then it must be that most of them have courtesy or secondary affiliations. If person A is affiliated with institutions 1 and 2, and institution 1 has many people registered while institution 2 has few, then the ranking scores of person A should count more toward institution 2 than toward institution 1. Of course, such a distribution scheme pertains only to authors with multiple affiliations.

To be precise, let I be the set of affiliations of an author. For each i in I, let Si be the number of authors affiliated with institution i. Compute S as the sum of all Si. The weight of each affiliation is Ti=S/Si. These weights are then normalized to sum to one.

Take the following example. Economist A is affiliated with the Harvard Economics Department (46 registrants), the NBER (324 registrants) and the CEPR (262 registrants). The respective Ti would be 632/46=13.74, 632/324=1.95, and 632/262=2.41, given that 46+324+262=632. After normalizing the T’s to sum to one, Economist A’s ranking scores would count 13.74/18.10=75.9% for the Harvard Economics Department, 1.95/18.10=10.8% for the NBER and 2.41/18.10=13.3% for the CEPR. For regional rankings, 86.7% (75.9% + 10.8%) of his scores would count in Massachusetts and 13.3% in the United Kingdom. Under current rules, scores are distributed fully to each affiliated institution and count fully in each region.
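
For readers who prefer code to prose, the proposed formula can be sketched in a few lines of Python. This is only an illustration of the arithmetic above, not the script used for the rankings; the function name and the dictionary layout are my own choices.

```python
def affiliation_weights(counts):
    """Return weights T_i = S / S_i, normalized so that they sum to one.

    `counts` maps each affiliation of one author to S_i, the number of
    people affiliated with that institution.
    """
    total = sum(counts.values())                      # S
    raw = {inst: total / n for inst, n in counts.items()}  # T_i
    norm = sum(raw.values())
    return {inst: t / norm for inst, t in raw.items()}

# The example from the text: Economist A's three affiliations.
weights = affiliation_weights(
    {"Harvard Economics": 46, "NBER": 324, "CEPR": 262}
)
for inst, share in weights.items():
    print(f"{inst}: {share:.1%}")
# Harvard Economics: 75.9%
# NBER: 10.8%
# CEPR: 13.3%

# Regional share for Massachusetts (Harvard and the NBER).
massachusetts = weights["Harvard Economics"] + weights["NBER"]
print(f"Massachusetts: {massachusetts:.1%}")
# Massachusetts: 86.7%
```

An author with a single affiliation gets a weight of one for it, so the scheme leaves such authors unaffected.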

This is much simpler than I can manage to explain here… But a few additional details are in order. Some variations in definitions can be discussed: Si can represent the number of registrants, the number of authors (registrants with works) or the number of works of authors. The latter would avoid giving institutions a (mistaken) reason to discourage young faculty with few works from signing up. I favor the number of authors. Also, we need to deal with affiliations that are not listed in the database (EDIRC) and thus do not have a defined number of registrants. One solution is to just ignore such affiliations. The drawback is that the relevant authors may not get ranked in some regions where they are genuinely affiliated. Thus I propose to assign to those institutions the average Si of the author’s other affiliations. If no affiliation is in the database, all get the same weight.

I now welcome comments on how to proceed and hope to implement the new scheme for the January 2009 rankings, which are released in the first days of February 2009.

January 18, 2009 Update: The new ranking method for institutions has now been programmed and is ready for the early February release. The formula discussed above has been adopted with two amendments. The first was discussed in the comments: 50% of the weight is allocated to the institution with the same domain name as the author’s email address. The remaining 50% is allocated over all affiliated institutions by the formula given above. The second amendment pertains to the weights of institutions that are not listed in EDIRC. As there is no author count for them, I put the default at the average number of authors per listed institution, currently 4.55.
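
The amended scheme can be sketched as follows. Again, this is an illustration under stated assumptions and not the production code: the function name is hypothetical, unlisted institutions are marked with `None`, and since the update does not say what happens when no affiliation shares the domain of the author’s email address, the sketch assumes the formula weights then apply in full.

```python
# Average number of authors per listed institution, as quoted in the update.
DEFAULT_AUTHORS = 4.55

def adopted_weights(author_counts, email_institution=None):
    """Weights under the amended scheme.

    `author_counts` maps institution -> number of authors, or None when the
    institution is not listed in EDIRC. `email_institution` is the affiliation
    sharing the domain of the author's email address, if any.
    """
    counts = {inst: DEFAULT_AUTHORS if n is None else n
              for inst, n in author_counts.items()}
    total = sum(counts.values())
    raw = {inst: total / n for inst, n in counts.items()}
    norm = sum(raw.values())
    shares = {inst: t / norm for inst, t in raw.items()}
    if email_institution not in shares:
        # Assumption: without a domain match, the formula applies in full.
        return shares
    # Half of the weight goes to the email institution, half to the formula.
    return {inst: 0.5 * share + (0.5 if inst == email_institution else 0.0)
            for inst, share in shares.items()}

# Placeholder counts reused from the example above; the email match is assumed.
example = adopted_weights(
    {"Harvard Economics": 46, "NBER": 324, "CEPR": 262},
    email_institution="Harvard Economics",
)
for inst, share in example.items():
    print(f"{inst}: {share:.1%}")
```

Note that the institution matching the email address still receives its share of the remaining 50%, since that half is allocated over all affiliated institutions.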

February 3, 2009 Update: I am receiving many questions about the sudden changes in the rankings within countries. As authors with multiple affiliations do not count fully in each location any more, their ranking has worsened. Similarly, institutions that have many members with multiple affiliations now look worse. Note also that a few small errors have crept in, and they will be corrected for the February ranking.


October 14, 2008, Open Access Day

October 14, 2008

October 14, 2008, has been declared Open Access Day to increase the awareness of Open Access. RePEc, and its predecessors, have been promoting open access for 15 years now, by enhancing the dissemination of preprints, which in Economics are usually called working papers or discussion papers. A quarter million of them are now listed, with many of them being close versions of published articles that are hidden behind a publisher’s paywall. Whenever possible, we link the two versions. The conditions are that the titles be very similar and that the author be registered in the RePEc Author Service, having claimed all versions in the research profile. RePEc also indexes numerous open access journals, with their articles labeled to indicate free downloads.

In this respect, it is important to note that the vast majority of publishers allow authors to publish working papers, in many cases even as post-prints (after publication of the journal article). Through the linking between versions we do in RePEc, this essentially amounts to making pay journals open access. For a list of publishers and their policies, see SHERPA/RoMEO.

Are Open Access works popular? We have not systematically studied this so far, but consider the following. A working paper available online was downloaded on average 1.77 times in September 2008 (after numerous corrections to eliminate robots and multiple downloads), while the figure stands at 0.97 for journal articles (including those that are open access). Also, many working paper series have impact factors superior to those of many journals, highlighting that researchers in Economics do not hesitate to cite pre-prints.