Research Article


Quantifying Violence and Nonviolence:

Terrorism & Political Violence Events Datasets


Stephen M. Shellman

University of Georgia


College of William & Mary



I. Introduction

Most systematic empirical conflict studies in our discipline emphasize the structural factors that lead to the onset or duration of fighting. That is, they tend to concentrate on the characteristics of countries and/or groups (e.g., wealth, ethnic and religious fractionalization, power, regime type, terrain, etc.) and analyze how these characteristics affect the propensity of civil and international war onset, duration, and termination.[1] While this approach is fruitful for explaining general patterns of conflicts across space and time, it is not as useful for analyzing process theories of conflict and cooperation. Country attributes like geography (i.e., terrain) do not change at all over time, and other attributes found to be associated with civil conflict, like GDP, political institutions, population, and ethnic fractionalization, change very little. As Butler et al. (2005, 18) correctly point out, “Put bluntly, the factors analyzed in country level analyses are the same for a given country during war and peace, and are therefore incapable in predicting shifts from one period to the other.”

            Conflict process models should analyze multistage procedures in which the sequences of events, choices, and variables play a critical role in the outcome (Bremer 1996). As such, country-year national attribute studies are inadequate to test behavioral process theories. Such studies cannot assess the degree to which actors’ choices and behavior affect levels of conflict and cooperation. They lack the ability to theorize and test arguments about how decision-makers make premeditated tactical choices and how they choose to interact with their rivals over the course of political struggles. Cunningham, Gleditsch, and Salehyan (2005, 5) state, “Research on conflict can benefit tremendously from disaggregating war below the level of countries and considering the underlying interactions and mechanisms that make up what we characterize as a country being ‘at war.’”

            The focus of this article is to highlight datasets that can be used to study civil and international conflict-cooperation processes.[2] Each of the datasets described below disaggregates the actors, targets, and events associated with conflict and cooperation across space and time. As such, they can be used to test hypotheses implied by our conflict-cooperation process theories. These datasets focus on disaggregated events associated with conflict and cooperation at the daily level, as opposed to other familiar aggregate event datasets that record the presence of civil or international war in a given country-year. In fact, many of these datasets code the daily battles and events that make up such aggregate high-intensity conflicts as civil and international war. In particular, many of these datasets focus on non-state actors and their actions directed toward each other and toward state actors. Others focus solely on the violent targeting of non-combatants.

            The essay proceeds as follows. First, I define and discuss the utility and limitations of “events data,” the most widely used data to study patterns of terrorism and political violence. Second, I will briefly describe several political violence datasets focusing on domestic acts and compare and contrast them.  Third, I will highlight a few datasets used to study foreign policy behavior or interstate interactions. Fourth, I will briefly outline a few databases used to study transnational terrorism. Finally, I will draw attention to some new datasets generated to study domestic terrorism. Each of these datasets can be used to test hypotheses generated by conflict-cooperation process theories.


II. Events Data: Pros & Cons

The most widely used data to study patterns of political violence and terrorism are known as events data. Events data are “day-by-day coded accounts of who did what to whom as reported in the open press,” and “offer the most detailed record of interactions between and among actors” (Goldstein 1992, 369). Political events are defined as actions taken by an actor to advance its political interests. The political issues at stake usually involve the authority to make decisions concerning the extraction and/or distribution of social resources or values.

Most basic domestic political violence events datasets code (1) the actor taking the action, (2) the target receiving the action, (3) the action itself (the event), and (4) the date of the action/event (usually the day each event takes place). Others also code the location of the event and/or the number of casualties associated with the event. Some example events coded in political violence datasets include armed attacks/conflict, nonviolent protests, negative statements, positive statements, low-level agreements between actors (e.g., ceasefires), and high-level agreements between actors (e.g., regional territorial autonomy). Actors include social actors such as teachers, students, and clergy; economic actors such as business owners and labor unions; and political actors such as prime ministers, generals, police, insurgent groups, dissident leaders, and rebel forces. I give an example of how an event is coded under a particular scheme below.

Assume that news media report on New Year’s Eve in 1983 that “General Muhammadu Buhari overthrew the Shagari government.” Under the IPI scheme, the data generated would include the date (12/31/1983), the actor (16.1), the target (10), and the event type (-9). The actor value 16.1 refers to “military general,” the value 10 refers to “national government,” and the value -9 refers to the IPI category subsuming successful “coups.” One could then also scale the event type on the IPI interval-like scale by assigning the value -85.18 to the -9 event category (see Shellman 2004b). These events data allow researchers to study the patterns and processes of both low- and high-intensity conflict and cooperation at disaggregated temporal and spatial levels.[3]
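To make the structure of such a record concrete, the following is a minimal sketch of how the Buhari example might be stored and scaled. The class and table names are my own illustrative assumptions, not part of the IPI project; the lookup tables contain only the codes quoted in the example above and are not the full IPI codebook.

```python
from dataclasses import dataclass
from datetime import date

# Only the codes quoted in the text; NOT the full IPI codebook.
ACTOR_LABELS = {"16.1": "military general", "10": "national government"}
EVENT_LABELS = {-9: "successful coup"}
# Interval-like scale value reported in the text for event category -9.
EVENT_SCALE = {-9: -85.18}


@dataclass(frozen=True)
class Event:
    """One 'who did what to whom, and when' record (hypothetical layout)."""
    day: date
    actor: str       # stored as text so that a code like "16.1" stays exact
    target: str
    event_type: int

    def scaled(self) -> float:
        """Look up the interval-like scale value for this event category."""
        return EVENT_SCALE[self.event_type]

    def describe(self) -> str:
        """Human-readable rendering of the coded record."""
        return (f"{self.day.isoformat()}: {ACTOR_LABELS[self.actor]} -> "
                f"{ACTOR_LABELS[self.target]} [{EVENT_LABELS[self.event_type]}]")


buhari_coup = Event(date(1983, 12, 31), actor="16.1", target="10", event_type=-9)
print(buhari_coup.describe())
print(buhari_coup.scaled())
```

Keeping the categorical code and the scale value separate mirrors the two-step logic in the text: the event is first assigned to a category and only then, optionally, placed on an interval-like scale.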

Even though events data yield dense information about disaggregated actors’ behavior, they also have their share of limitations. In particular, events data are often criticized on validity and reliability grounds. To begin, most events data are generated from media reports, and so coverage of events is not uniform. The media decide which events are reported and which ones are not. In many cases, media sources are slanted in the direction of a particular ideology and/or focused on a particular market. Decisions about what to print are often affected by what will sell. Finally, a reporter has to see or hear about an event in order for it to get print consideration. Some forms of political violence, like rapes and disappearances, are often never reported to anyone, let alone the media. This is especially problematic when studying repression and dissent levels in countries run by despotic or communist governments.

            However, research has shown that, even with all these limitations, media reports still yield the most coverage and information related to political events (Davenport & Ball 2002). As King and Lowe (2003, 617-18) point out,

a large fraction of information available to political scientists for all types of analyses ––events data, annual data, other quantitative summaries, qualitative accounts, historical studies, etc.–– …passes through the hands of reporters at some point. It is imperfect…[though,] there should be no controversy over the claim that the immense volume of reportage …constitutes an enormous, and insufficiently mined, treasure of information.


One way to mine such information is through the collection of events data. Events data are no less valid or reliable than other data we use in our studies. I do not wish to make a case for the careless use of any dataset simply because it is available. The validity and reliability of data are important, and the results we report in our studies should hinge on a strong signal being present in the data. That said, one of the ways we can control for and average out media bias is by compiling events data from multiple sources. Reeves, Shellman, and Stewart (2006) echo previous scholars like Davenport and Ball (2002) and Francisco (2006) and advocate the analysis of multiple-source datasets. They contend that “combining sources can help to eliminate (and/or average) the specific bias of a particular news agency and yield more accurate and reliable estimates of conflict and cooperation. Combining sources helps achieve a more representative sample of actual events” (Reeves et al. 2006, 23). Furthermore, when events data are combined and compared with qualitative analyses of particular cases, one usually draws the conclusion that they provide more signal than noise with regard to the mood of the political situation within or between countries. In the end, events data are a useful source of information to study patterns and processes of political violence and terrorism. The remainder of this article outlines several international and domestic events datasets useful for studying conflict-cooperation processes.
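The idea of combining sources can be illustrated with a small sketch. It is not the procedure used by Reeves, Shellman, and Stewart (2006); the source names, actor codes, and event labels are hypothetical, and it assumes that two reports describe the same event whenever date, actor, target, and event code coincide. The first function takes the union of distinct events across sources; the second averages per-source daily counts.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical reports: (source, date, actor, target, event_code).
reports = [
    ("AP",  "1990-01-05", "GOV", "REB", "ATTACK"),
    ("BBC", "1990-01-05", "GOV", "REB", "ATTACK"),   # same event, second source
    ("BBC", "1990-01-05", "REB", "GOV", "PROTEST"),  # reported by one source only
    ("AP",  "1990-01-06", "REB", "GOV", "ATTACK"),
]


def merge_sources(reports):
    """Union of distinct events, each mapped to the set of sources reporting it."""
    merged = defaultdict(set)
    for source, *event in reports:
        merged[tuple(event)].add(source)
    return dict(merged)


def mean_daily_count(reports):
    """Average, across all sources, of the number of events reported per day."""
    all_sources = {r[0] for r in reports}
    per_day = defaultdict(lambda: defaultdict(int))
    for source, day, *_ in reports:
        per_day[day][source] += 1
    # A source that reported nothing on a given day contributes a zero.
    return {day: mean(counts.get(s, 0) for s in all_sources)
            for day, counts in per_day.items()}


merged = merge_sources(reports)
daily = mean_daily_count(reports)
print(len(merged), daily)
```

The union recovers events that any single source would miss, while the averaged counts dampen the influence of a source that over- or under-reports on a given day. Real reports rarely match this cleanly, so deciding when two stories describe the same event is the hard part in practice.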


III. Domestic Political Violence & Nonviolence Events Datasets

This section describes five domestic political violence and nonviolence datasets. These data generally contain information on event types, the date the event took place (at the daily level), the actor taking the action, and the target of the action for a specific country. Such data have mostly been used to study government-dissident interactions (Carey 2006; Francisco 1995; Moore 1995; Moore 1998; Moore 2000; Shellman 2006a; Shellman 2006b) or the effects of civil conflict on migration (Shellman and Stewart 2007; Shellman and Stewart n.d.). In particular, the questions revolve around the effects of repression on dissent and the effects of dissent on repression. For each of the five datasets, I give a brief description and note where one can locate the data and more information about them. Following the brief descriptions, I compare and contrast them.


 A. Violent Intranational Conflict Data Project (VICDP).



B. Intranational Political Interactions (IPI Project).

        Description: “The IPI project was designed to measure political conflict and cooperation within societies through the coding of political event reports from international, regional, and local sources. These events were coded on two ten point scales which reflect the severity of various cooperative and conflictual statements and actions. This scaled events data can be used to calculate the volume and intensity of political conflict and cooperation within the domestic polity. In addition to facilitating the calculation of general levels of political conflict, the IPI coding scheme allows the examination of the dynamics of interaction among specific groups within the society. IPI gives scholars the ability to track interactions among social groups and between the state and social groups” (IPI Home Page).



C. European Protest & Coercion Data


D. Protocol for the Analysis of Nonviolent Direct Action (PANDA)



E. Project Civil Strife (PCS) – COMMON Datasets.


The above internal conflict-cooperation datasets focus on particular cases, actors, or event types and use a variety of coding methods. To begin, VICDP (the precursor to IPI) includes five cases spread across the globe, IPI mainly focuses on Latin America (with a few exceptions), and Francisco’s data focus on Europe.[4] The Protocol for the Analysis of Nonviolent Direct Action (PANDA) project also began by examining Europe, but it was recently superseded by IDEA (Integrated Data for Events Analysis), which covers international and intranational conflict across the whole world (though only a portion of the data are publicly available).[5]

With respect to actors, VICDP, IPI, and Francisco’s protest and coercion data code multiple actors. However, there are some limitations. While Francisco codes multiple actors (students, Muslims, ethnic Bulgarians, democracy movements, etc.), the project ignores dissident-dissident interactions and ethnic conflict. VICDP and IPI code the most actors, but in most of the dyads in which the government is an actor, the general population or unspecified social actors are coded as the targets. While this poses no problems for two-actor models of government-dissident interactions, it is impossible to disaggregate the “general population” without going back to the original stories and recoding the targets as students, workers, ethnic groups, etc. I found this to be the case both as a coder who worked on the project and as a user of the data in my dissertation project. For example, about 22% of the targets in Chile are generic social actors, while 50% are given the “government” code. Therefore, 72% of the targets in Chile fall under just two of 99 possible actor codes. In Nigeria, all the various ethnic groups combined account for only 2% of the targets. Other specific actors are similarly underrepresented in the IPI data.

With regard to event types, not all the datasets code both conflictual and cooperative events. Francisco’s data only analyze protest and coercion. This scheme leaves out high level conflictual events like armed conflict and guerrilla attacks and cooperative events such as positive and supportive statements, peace talks, and negotiated settlements. The PANDA project, developed by Bond et al. (1997), only codes nonviolent contentious events and focuses less on cooperation. The only datasets that code both conflict and cooperation are IPI, VICDP, and PCS.

Finally, the datasets differ with respect to how the data were generated. Human coders collected the IPI and VICDP datasets, while machines coded the majority of the PANDA, Francisco, and PCS datasets. Previous criticisms of event data center on coding inconsistencies and biases (Andriole and Hopple 1984; Laurence 1990). In addition, the costs of coding event data have reduced the rate of data collection. Yet those problems have been mitigated by machine-readable data sources and machine coding. Schrodt and Gerner (1994) and King and Lowe (2003) show that machine-coded and human-coded data are very similar and produce the same inferences in time series studies. Moreover, machine-coded data are replicable and consistent, and they are less time-consuming to collect than human-coded data.

            The PCS data combine the strengths of many of their predecessors by coding conflict and cooperation, coding multiple actors, and using machines and automated coding software to eliminate inconsistencies, coder fatigue, and coding time. Moreover, the data focus on a new region and code some additional variables. Specifically, PCS seeks to code data for Bangladesh, Bhutan, Burma (Myanmar), Cambodia, India, Indonesia, Laos, Malaysia, Nepal, Pakistan, Papua New Guinea, the Philippines, Sri Lanka, Thailand, and Vietnam. For each case, in addition to the usual event data codes, PCS records information on government and major dissident leaders, their dates in office, and the locations where events occur. PCS uses a modified version of Text Analysis By Augmented Replacement Instructions (TABARI), developed by Phil Schrodt, to generate domestic political event data.[6] TABARI uses a “sparse-parsing” technique to extract the subject, verb, and object from a sentence and performs pattern matching using actor and verb dictionaries.[7] In short, TABARI matches words from an electronic text file (news story) to words contained in the actor and verb dictionaries and assigns a corresponding code to each actor and verb. It also records the date. Finally, Shellman has modified the source code to code locations of events. Thus, in addition to coding actors and verbs, TABARI pattern matches cities and regions and assigns corresponding codes.
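The dictionary-matching logic described above can be sketched in a few lines. This is a toy illustration of the general idea only, not TABARI’s actual algorithm, dictionary format, or pattern syntax; the dictionaries, codes, and function names are hypothetical, and the sketch simply takes the first verb phrase found, the nearest actor to its left as the source, and the first actor to its right as the target.

```python
import re

# Hypothetical dictionaries; real actor, verb, and location dictionaries
# are far larger and use their own codes.
ACTORS = {"army": "GOVMIL", "government": "GOV", "rebels": "REB", "students": "STU"}
VERBS = {"attacked": "ATTACK", "met with": "MEET", "praised": "PRAISE"}
LOCATIONS = {"jakarta": "JKT", "phnom penh": "PNH"}


def _hits(dictionary, text):
    """All (start, end, code) matches of dictionary phrases, in text order."""
    found = []
    for phrase, code in dictionary.items():
        for m in re.finditer(r"\b" + re.escape(phrase) + r"\b", text):
            found.append((m.start(), m.end(), code))
    return sorted(found)


def code_sentence(day, sentence):
    """Return a coded event dict, or None if no actor-verb-actor pattern is found."""
    text = sentence.lower()
    verbs = _hits(VERBS, text)
    if not verbs:
        return None
    start, end, event = verbs[0]            # first verb phrase in the sentence
    before = _hits(ACTORS, text[:start])    # candidate source actors
    after = _hits(ACTORS, text[end:])       # candidate target actors
    if not before or not after:
        return None
    places = _hits(LOCATIONS, text)
    return {
        "date": day,
        "actor": before[-1][2],             # actor closest to the verb, on its left
        "event": event,
        "target": after[0][2],              # first actor to the right of the verb
        "location": places[0][2] if places else None,
    }


coded = code_sentence("1997-07-05", "The army attacked rebels near Phnom Penh.")
print(coded)
```

Sentences that contain no dictionary verb, or that lack an actor on either side of it, are simply left uncoded, which is one reason the coverage of the dictionaries matters so much for the resulting data.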

            Table 1 Comparison of Domestic Conflict & Cooperation Events Datasets

While most event datasets (international and intranational) code events from a single news source,[8] PCS codes events from multiple news sources. As discussed above, language, coverage, style, and characterization by a source can influence the way an event is coded or even whether it is coded at all. As such, PCS codes events from numerous sources including but not limited to AFP, AP, Asia Today, BBC, Cambodia Today, China Daily, Christian Science Monitor, Hindu Times, New Straits Times (Malaysia), Philippine Daily Inquirer, The Edge (Malaysia), Jiji Press Ticker, The Jakarta Post, The Nation (Thailand), The Saigon Times, UPI, and Xinhua (China). Not every source is available for the same time periods, but all of these sources are in English and are available through Lexis-Nexis.

The PCS actor dictionaries are extensive. Previous datasets that focus on internal events either neglect to disaggregate domestic actors or seek to disaggregate actors but do so ineffectively. The major advantage this dataset has over its predecessors is the ability to study multiple rebel groups’ interactions with the government and with each other, government and rebel coalition dynamics, the behavior of competing factions within a single group, and government and dissident leader turnover. Moreover, the dictionaries code specific individuals as well as groups. For example, the Cambodia dictionary codes over 950 unique individuals and groups, while the Indonesia dictionary contains 670 unique actors, including over 50 different dissident organizations and political parties. Furthermore, both capture well over 200 distinct positions within the government and military.[9] The event codes currently used in the PCSCOMMON dataset are based on the World Event/Interaction Survey (WEIS) codes developed by Charles McClelland (1976) at the University of Southern California. Such codes, when used in analyses, are often weighted by the Goldstein (1992) scale. There are plans to use the Conflict and Mediation Event Observations (CAMEO) event codes, discussed below, in the future.
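Weighting categorical event codes and aggregating them by directed dyad is a common first step toward a conflict-cooperation time series. The sketch below shows that step under loud assumptions: the event labels, actor codes, and especially the weights are placeholders for illustration and are not the published Goldstein (1992) values, and monthly aggregation is an arbitrary choice.

```python
from collections import defaultdict
from datetime import date

# Placeholder weights for illustration only; NOT the Goldstein (1992) scale.
# Negative values are conflictual, positive values cooperative.
WEIGHTS = {"ARMED_ATTACK": -10.0, "PROTEST": -5.0, "PRAISE": 3.0, "AGREEMENT": 6.0}

# Hypothetical coded events: (date, actor, target, event_code).
events = [
    (date(1998, 5, 12), "GOV", "STU", "ARMED_ATTACK"),
    (date(1998, 5, 13), "STU", "GOV", "PROTEST"),
    (date(1998, 5, 20), "STU", "GOV", "PROTEST"),
    (date(1998, 6, 2),  "GOV", "STU", "AGREEMENT"),
]


def monthly_dyad_scores(events, weights):
    """Sum weighted events per (year, month, actor, target) directed dyad."""
    totals = defaultdict(float)
    for day, actor, target, code in events:
        totals[(day.year, day.month, actor, target)] += weights[code]
    return dict(totals)


scores = monthly_dyad_scores(events, WEIGHTS)
for key in sorted(scores):
    print(key, scores[key])
```

Keeping the dyad directed (GOV toward STU is a separate series from STU toward GOV) is what allows questions about reciprocity, such as whether repression follows dissent or the reverse, to be examined.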

            Automated coding can also record the locations of events. Until recently, spatial units relevant to conflict were confined to the state. Buhaug and Gates (2002, 417) argue that “geographical factors play a critical role in how a civil war is fought and who will prevail.” The “location and size of a country,” as well as the location and size of villages, towns, cities, and rebel camps, “affect the design and nature of military strategy” (Buhaug and Gates 2002, 419). Recognizing this fact, PCS develops three location dictionaries for each case. One contains cities, another contains regions or provinces, and a third records major geographical landmarks near where an event takes place. Most event data projects simply record an event taking place somewhere in a country. The PCS location data allow for spatial analysis of conflict and help answer questions concerning contagion and diffusion (Buhaug and Gates 2002; Siverson and Starr 1991). In short, the PCSCOMMON datasets combine the best features of other events datasets. That said, the choice of dataset should still hinge on the questions one wants to ask, the hypotheses one wishes to test, and the cases one wants to study.



IV. International Political Violence Events Data Sets

            This section describes three sets of international political violence datasets. These datasets closely resemble the domestic datasets, yet contain information mostly on state-to-state interactions, with some coverage of sub-state actors. The event codes for the projects below are identical to or modified from those of the WEIS project (McClelland 1976). The KEDS/CAMEO datasets use TABARI or its predecessor KEDS to compile the events datasets. IDEA uses its own software program, similar to TABARI and developed by Virtual Research Associates, to compile its database. Such datasets have been used to study directed-dyadic and triangular foreign policy behavior (e.g., Moore and Lanoue 2003; Goldstein and Freeman 1991; Goldstein and Pevehouse 1997; Goldstein et al. 2001), the influence of foreign policy on internal conflict (e.g., Moore 1995; Gleditsch and Beardsley 2004), and the influence of foreign policy on directed dyadic refugee flows (e.g., Shellman and Stewart 2007; Shellman and Stewart n.d.). They have also been used for forecasting conflict and cooperation (e.g., Schrodt, Gerner and Yilmaz 2005; Schrodt 2000; Shellman and Stewart n.d.).


A. Integrated Data for Events Analysis (IDEA)


B. Kansas Event Data System (KEDS)


C. Conflict and Mediation Event Observations (CAMEO)


While the KEDS and CAMEO datasets focus on regional conflicts, the IDEA data cover the globe. Yet the temporal domain of the publicly available IDEA data is only a decade. Schrodt’s web comments sum up the differences between CAMEO and IDEA nicely:

In contrast to CAMEO and IDEA, KEDS focuses on coding international behavior among states and codes fewer tertiary actors. Although KEDS codes mostly state actors, some sub-state actors, such as Hamas and Hezbollah, are coded, but they are not the focus of the project. Many more non-state actors are coded in IDEA and CAMEO than in KEDS, yet such actors are disaggregated to a lesser degree than in PCS. KEDS data are based on the WEIS coding scheme, and IDEA is generally backwards compatible with WEIS. “Aside from some minor label changes, …[the]… 22 event form categories remain essentially the same…to WEIS.” That said, some changes to the IDEA scheme have been made over the years, and details of such changes are available from the IDEA project. CAMEO has developed additional changes to the WEIS scheme and most notably has reduced the 22 event categories to 20. Details of the CAMEO coding system can be found in the online codebook[10] and in a conference paper by Gerner, Schrodt, Abu-Jabr, and Yilmaz (2002).[11]

            In sum, if one wants to study foreign policy behavior, any of these datasets will suffice. The main differences include temporal and spatial coverage, coverage of non-state actors or tertiary actors, and a scheme developed specifically to study mediation (i.e. CAMEO). Again, the research questions and spatial-temporal domains of inquiry should determine which data source one uses in analyses.


V. Transnational Terrorism Datasets

This section describes two transnational terrorism datasets, one of which (ITERATE) is the most widely used dataset in the academic literature.  The other (MIPT-TKB) serves as a good web resource for background information and descriptive statistics. Summaries appear below and then I elaborate on each one.


A. International Terrorism: Attributes of Terrorist Events (ITERATE)





B. Terrorism Knowledge Base (TKB)


Definitions of terrorism are debated in the academic literature, so it is important to know what types of events these databases consider as terrorism. The ITERATE codebook defines terrorism as

the use, or threat of use, of anxiety-inducing, extra-normal violence for political purposes by any individual or group, whether acting for or in opposition to established governmental authority, when such action is intended to influence the attitudes and behavior of a target group wider than the immediate victims. (Mickolus et al. 2004, 2)


ITERATE codes only transnational events, defined as actions that “transcend national boundaries” … “through the nationality or foreign ties of its perpetrators, its location, the nature of its institutional or human victims, or the mechanics of its resolution” (Mickolus et al. 2004, 2).


            The ITERATE dataset codes a variety of variables including but not limited to the date, location, group responsible, target, type of event, and number of casualties and deaths (disaggregated by type of person) based on the project’s definition of transnational terrorism. These data have been used in statistical time series models (e.g., Enders and Sandler 1993, 1999, 2000, 2002) and cross-sectional pooled time series models (e.g., Enders and Sandler 2006; Li and Schaub 2004; Li 2005) to evaluate the effectiveness of anti-terrorism policies, examine the distribution of transnational terrorism events among countries, and analyze the effects of democracy and globalization on transnational terrorism.  

            ITERATE’s only competitor is the MIPT/RAND Terrorism Knowledge Base (TKB), which also contains information on transnational incidents from 1968 to the present. It defines such events as “incidents in which terrorists go abroad to strike their targets, select domestic targets associated with a foreign state, or create an international incident by attacking airline passengers, personnel or equipment” (TKB Glossary). It also contains information on domestic events from 1998 to the present, which it defines as “incidents perpetrated by local nationals against a purely domestic target.” However, it is important to note that the TKB is not published in a neat incident spreadsheet format (see footnote 12). Instead, queries must be completed to extract the incident information from the database. One useful feature is the ability to create graphs using the Incident Analysis Wizard on the TKB website. In particular, such graphs are useful additions to courses taught on terrorism. Moreover, the TKB also contains information on characteristics of terrorist organizations such as their goals, bases of operation, tactics, and leadership. In sum, without special permission from MIPT/RAND, the only database that can legally be used in analyses is ITERATE (see footnote 12). However, the TKB contains a wealth of information related to terrorist incidents and organizations that is useful to teachers, researchers, and practitioners.


VI. Domestic Terrorism Datasets

While there is a lack of access to international terrorism events datasets, there is an even more severe lack of access to domestic terrorism events datasets. Other than the TKB data described above, there are only two other datasets that contain domestic terrorist events for more than one or two countries. Yet according to the data made available by the TKB (1998-present), international terrorism makes up only twelve percent of terrorist incidents (MIPT 2006). As such, much more attention needs to be focused on domestic terrorism and associated events datasets. Such data are useful in explaining the effectiveness of anti-terrorism policies (Asal, Shellman, and Meek 2006), the structure and process variables associated with domestic terrorism, and the effects of domestic terrorism on government policies (Miller and Shellman 2006). Below, I summarize one complete and publicly available domestic terrorism dataset for Western Europe and a new ongoing effort focusing on South and Southeast Asia.


A. Terrorism in Western Europe: Events Data (TWEED)



B. Project Civil Strife – TERROR Datasets.

        PI: Stephen M. Shellman


        Description: PCSTERROR uses information from publicly available news sources to construct chronologies of domestic terrorist events across several South and Southeast Asian countries. The data are human coded as opposed to machine coded (e.g., PCSCOMMON). Like ITERATE and PCSCOMMON, it draws on a multiplicity of sources (e.g., Associated Press, United Press International, BBC, Xinhua, Jiji Press Ticker, Jakarta Post, etc.; just about any source we can access through Lexis-Nexis). The PCSTERROR dataset codes a variety of variables including but not limited to the date, location, group responsible, target, type of event, and number of casualties and deaths disaggregated by type of person (e.g., government official, police, citizens, etc.).

        Cases: Bangladesh, Bhutan, Burma (Myanmar), Cambodia, India, Indonesia, Laos, Malaysia, Nepal, Pakistan, Papua New Guinea, the Philippines, Sri Lanka, Thailand, and Vietnam.

        Temporal Domain: 1980-2005



Both the TWEED and PCSTERROR datasets code variables similar to those in ITERATE. In fact, the PCSTERROR codebook is almost identical to ITERATE’s, with a few supplementary codes added. As a result, ITERATE and PCSTERROR could easily be merged to examine the international-domestic terrorism nexus. The PCSTERROR data are easily disaggregated by group and by region to allow for disaggregated analyses of a country’s terrorist events. While the TWEED dataset lacks government actions, the PCSTERROR dataset seeks to code public government actions such as thwarted attacks and targeted killings. PCSTERROR also differentiates between suicide and non-suicide attacks. Finally, PCSTERROR codes not only the city and location of the attack but also the specific target (e.g., railway, marketplace, café, etc.). While the PCSTERROR data are in the early stages of development, the TWEED data are publicly available but have not, to my knowledge, been systematically examined.
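As a rough illustration of what merging a transnational and a domestic incident file might involve, the sketch below stacks two record sets under a shared layout and tabulates incidents by country, year, and scope. All field names, group labels, and records are hypothetical; the real ITERATE and PCSTERROR files use their own variable names and numeric codes, which would first need a crosswalk.

```python
from collections import Counter

# Hypothetical, simplified records; not actual ITERATE or PCSTERROR data.
transnational = [
    {"date": "1995-03-01", "country": "India", "group": "Group A", "deaths": 2},
]
domestic = [
    {"date": "1995-03-04", "country": "India", "group": "Group A",
     "deaths": 5, "suicide": False},
    {"date": "1995-07-19", "country": "India", "group": "Group B",
     "deaths": 0, "suicide": True},
]


def stack(transnational, domestic):
    """Tag each record with its scope and return one date-ordered list."""
    merged = [dict(r, scope="transnational") for r in transnational]
    merged += [dict(r, scope="domestic") for r in domestic]
    return sorted(merged, key=lambda r: r["date"])


def counts_by_country_year(records):
    """Incident counts keyed by (country, year, scope)."""
    return Counter((r["country"], r["date"][:4], r["scope"]) for r in records)


combined = stack(transnational, domestic)
counts = counts_by_country_year(combined)
print(counts)
```

Because each record keeps its group label, the same stacked file could be filtered by group to ask whether organizations shift between domestic and transnational targets over time.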


VII. Conclusion

To date, most civil and international conflict studies focus attention on national attributes and their correlation with “war” rather than on analyzing how actors’ decisions in constrained environments affect the magnitude and intensity of different levels of civil conflict. Though students of domestic and international conflict have focused attention on the causes of political violence, civil and international war, revolution, domestic and transnational terrorism, etc., that work typically focuses on the attributes of the social, political, and economic environments in which such conflicts occur and on explaining the onset or duration of those conflicts. This approach has created a disconnect in the literature between the behavior of the actors participating in these conflicts and the group and country attributes used to explain such behavior. Hypotheses implied by behavioral theories of government-dissident interactions cannot be accurately tested using such aggregate attribute designs. Tilly (1985, 1) contends that “since collective action is dynamic, and since its outcomes depend very strongly on the course of interaction, static models that simply match behavior to group characteristics or outcome to group behavior represent the entire process poorly.” While a nascent “process models” literature exists, the theoretical work is often under-tested and underspecified in empirical work.

To move the study of internal conflicts forward, we must test our process theories by analyzing the interactions among actors over time. One of the reasons for the extant disconnect between theory and empirical tests is the dearth of data to test hypotheses implied by different theories. However, the datasets discussed above can aid in closing this gap. One fruitful way to model such conflict-cooperation processes is to examine time-series case studies of the behavioral interactions among a set of parties to the conflict. While cross-national studies can detect and explain general patterns of conflict, they are incapable of addressing conflict as a process unfolding over time. “Notions of reciprocity and strategic dependence are ignored” in such national attribute studies (Butler, Gates, and Leiby 2005, 3). Those types of studies cannot address questions as to what will happen next in a conflict or how one actor’s behavior can alter the dynamics of the conflict process. As a result, we need to develop theory capable of addressing such processes and design empirical tests that capture the arguments and test the implied hypotheses.

Developing theories that explain the ebb and flow of conflict-cooperation levels among a set of actors has the potential to yield more policy-relevant information than national attribute models. For example, knowing what types of actions and events quell violence is much more useful than knowing that the presence of mountains increases the probability of fighting. While we can prescribe policies to quell violence that are consistent with the sequences of behavior associated with attenuating conflict levels, we cannot destroy all mountains in conflict regions. Behavioral models can explain the types of behavioral counteractions that limit the use of violent tactics by the opposition.

I hope this article motivates additional systematic empirical work on process theories of conflict and cooperation. I also hope these datasets inspire new theories and new hypotheses to test. Oftentimes, the availability of data influences our theories and hypotheses rather than our theories influencing our data collection efforts. I believe these data are useful in answering the extant questions discussed above, but they should also be used in new innovative ways to test new theories and hypotheses. 




Andriole, Stephen J. and Gerald W. Hopple. 1984. "The Rise and Fall of Events Data: From Basic Research to Applied Use in the U.S. Department of Defense." International Interactions 11:293-309.

Asal, Victor, Stephen M. Shellman, and Samantha Meek. 2006. “How Have You Killed Lately? A Substitution Model of Domestic Terrorism in India, 1980-2005.” Presented at the North American Peace Science Society (International) meeting, November 12.

Bond, Doug, J. Craig Jenkins, Charles L. Taylor and Kurt Schock. 1997. “Mapping Mass Political Conflict and Civil Society: The Automated Development of Event Data.” Journal of Conflict Resolution 41(4): 553-579.

Bremer, Stuart. 1996. “Advancing the Scientific Study of War.” In Stuart Bremer and Thomas Cusack (eds). The Process of War: Advancing the Scientific Study of War. Amsterdam, Netherlands: Gordon and Breach, pp. 1-33.

Buhaug, Halvard and Scott Gates. 2002. “The Geography of Civil War.” Journal of Peace Research 39(4): 417-433.

Butler, Christopher K., Scott Gates, and Michele Leiby. 2005. “Social Networks & Rebellion.” Paper presented at the Conference on "Disaggregating the Study of Civil War and Transnational Violence" University of California Institute of Global Conflict and Cooperation San Diego, CA, USA, 7-8 March. Available here:

Carey, Sabine. 2006. The Dynamic Relationship Between Protest, Repression, and Political Regimes. Political Research Quarterly. (In press).

Cunningham, David, Kristian Skede Gleditsch, and Idean Salehyan. 2005. “Dyadic Interactions and Civil War Duration.” Paper prepared for the Conference on "Disaggregating the Study of Civil War and Transnational Violence" University of California Institute of Global Conflict and Cooperation San Diego, CA, USA, 7-8 March. The paper is available here:

Davenport, Christian and Patrick Ball. 2002. “Views to a Kill: Exploring the Implications of Source Selection in the Case of Guatemalan State Terror, 1977-1996.” Journal of Conflict Resolution 46(3): 427-450.

Enders, Walter, and Todd Sandler. 1993. The Effectiveness of Antiterrorism Policies: A Vector-Autoregression-Intervention Analysis. American Political Science Review 87 (4):829-844.

Enders, Walter, and Todd Sandler. 1999. Transnational Terrorism in the Post-Cold War Era. International Studies Quarterly 43 (1):145.

Enders, Walter, and Todd Sandler. 2000. Is Transnational Terrorism Becoming More Threatening? A Time-Series Investigation. Journal of Conflict Resolution 44 (3):307-332.

Enders, Walter, and Todd Sandler. 2002. Patterns of Transnational Terrorism, 1970-1999: Alternative Time-Series Estimates. International Studies Quarterly 46 (2):145.

Enders, Walter, and Todd Sandler. 2006. Distribution of Transnational Terrorism Among Countries by Income Class and Geography After 9/11. International Studies Quarterly 50 (2):367-393.

Engene, Jan Oskar. 2004. Terrorism in Western Europe: Explaining the Trends Since 1950. Cheltenham: Edward Elgar Publishing.

Francisco, Ronald A. 1995. "The Relationship between Coercion and Protest: An Empirical Evaluation in Three Coercive States." Journal of Conflict Resolution 39: 263-82.

Francisco, Ronald A.  2006. “Benchmarks and Samples: Coding Event Data with Multiple Media Sources.” Working Paper, currently under review.

Gerner, Deborah J., Rajaa Abu-Jabr, Philip A. Schrodt, and Ömer Yilmaz. 2002. “Conflict and Mediation Event Observations (CAMEO): A New Event Data Framework for the Analysis of Foreign Policy Interactions.” Paper presented at the International Studies Association, New Orleans, March 2002.

Gleditsch, Kristian and Kyle Beardsley. 2004. “Nosy Neighbors: Third-Party Actors in Central American Civil Wars.” Journal of Conflict Resolution 48(3): 379-402.

Goldstein, Joshua S. 1992. “A Conflict-Cooperation Scale for WEIS Events Data.” Journal of Conflict Resolution 36: 369-85.

Goldstein, Joshua S. and John R. Freeman. 1991. "U.S.–Soviet–Chinese Relations: Routine, Reciprocity, or Rational Expectations?" American Political Science Review 85(1):17-36.

Goldstein, Joshua S., and Jon C. Pevehouse. 1997. "Reciprocity, Bullying and International Cooperation: A Time-Series Analysis of the Bosnia Conflict." American Political Science Review 91(3): 515-530.

Goldstein, Joshua S., Jon C. Pevehouse, Deborah J. Gerner, and Shibley Telhami. 2001. "Dynamics of Middle East Conflict and U.S. Influence, 1979-97." Journal of Conflict Resolution 45(5): 594-620.

King, Gary and Will Lowe. 2003. "An Automated Information Extraction Tool for International Conflict Data with Performance as Good as Human Coders: A Rare Events Evaluation Design." International Organization 57(3): 617-642.

Laurence, Edward J. 1990. "Events Data and Policy Analysis." Policy Sciences 23:111-132.

Li, Quan. 2005. “Does Democracy Promote or Reduce Transnational Terrorist Incidents?” Journal of Conflict Resolution 49(2):278-297.

Li, Quan and Drew Schaub. 2004. “Economic Globalization and Transnational Terrorist Incidents: A Pooled Time Series Cross Sectional Analysis.” Journal of Conflict Resolution 48(2):230-258.

McClelland, Charles A. 1976. World Event/Interaction Survey Codebook. (ICPSR 5211). Ann Arbor: Inter-University Consortium for Political and Social Research.

Mickolus, Edward F., Todd Sandler, Jean M. Murdock, and Peter Fleming. 2004. International Terrorism: Attributes of Terrorist Events, 1968-2003 (ITERATE). Vinyard Software.

Miller, Gregory D. and Stephen M. Shellman. 2006. “Tipping Points & Turning Points: When Do States Care about Terrorism?” Presented at the American Political Science Association’s annual convention, September.

MIPT. 2006. MIPT Terrorism Knowledge Base. National Memorial Institute for the Prevention of Terrorism, Dec. 4, 2006. Available from

Moore, Will H. 1995. "Action-Reaction or Rational Expectations: Reciprocity and the Domestic International Conflict Nexus During the 'Rhodesia Problem.'" Journal of Conflict Resolution 39(1): 129-167.

Moore, Will H. 1998. "Repression and Dissent: Substitution, Context, and Timing." American Journal of Political Science 42(3): 851-873.

Moore, Will H. 2000. "The Repression of Dissent: A Substitution Model of Government Coercion." Journal of Conflict Resolution 44:107-127.

Moore, Will H. and David J. Lanoue. 2003. “Domestic Politics and US Foreign Policy: A Study of Cold War Conflict Behavior.” Journal of Politics 65(2):376-396.

Reeves, Andrew, Stephen M. Shellman, and Brandon Stewart. 2006. “Media Generated Data: The Effects of Source Bias on Event Data Analysis.” Paper delivered at the annual meeting of the International Studies Association, San Diego, March 22-25.

Schrodt, Philip A. and Deborah J. Gerner. 1994. “Validity Assessment of a Machine-Coded Event Data Set for the Middle East, 1982-1992.” American Journal of Political Science 38:825-854.

Shellman, Stephen M. 2004a. “Time Series Intervals and Statistical Inference: The Effects of Temporal Aggregation on Event Data Analysis.” Political Analysis 12(1): 97-104.

Shellman, Stephen M. 2004b. “Measuring the Intensity of Intranational Political Interactions Event Data: Two Interval-Like Scales.” International Interactions 30(2): 109-141.

Shellman, Stephen M. 2006. “Leaders & Their Motivations: Explaining Government-Dissident Conflict-Cooperation Processes.” Conflict Management & Peace Science 23(2).

Shellman, Stephen M. 2006b. “Process Matters: Conflict & Cooperation in Sequential Government-Dissident Interactions.” Forthcoming in Security Studies 15(4).

Shellman, Stephen M. and Brandon Stewart. n.d. “Predicting Risk Factors Associated with Forced Migration: An Early Warning Model of Haitian Flight.” Forthcoming in Civil Wars.

Shellman, Stephen M. and Brandon Stewart. 2007. “Political Persecution or Economic Deprivation? A Time-Series Analysis of Haitian Migration to the United States.” Forthcoming in Conflict Management & Peace Science 24(1).

Siverson, Randolph M. and Harvey Starr. 1991. The Diffusion of War: A Study of Opportunity and Willingness. Ann Arbor: University of Michigan Press.

Tilly, Charles. 1985. “Models and Realities of Popular Collective Action.” Working Paper No. 10, Center for Studies of Social Change. New York: New School for Social Research.




[1] There are of course exceptions.

[2] I should note that this article does not summarize each and every political violence and terrorism dataset available. Instead, I focus on some of the most recent and widely used data sources in the study of political violence and terrorism.

[3] See Shellman (2004a, 2004b); Moore (1998, 2000); Carey (2004); Gleditsch and Beardsley (2004).

[4] Some focus on Latin America and Korea; and there is less than one year of Burma data (1988).

[5] See  IDEA is produced by Virtual Research Associates, Inc.

[6] See for information on the KEDS and TABARI projects.

[7] TABARI recognizes pronouns and dereferences them. It also recognizes conjunctions and converts passive voice to active voice (Schrodt 1998).

[8] For example, early KEDS data and IPI data come from Reuters, while later KEDS data come from Agence France Presse. WEIS data come from The New York Times Index.

[9] Note that dictionaries often contain multiple phrases/terms for the same actor. As such, 959 unique Cambodian actors are coded using a dictionary with 8844 terms. There are 671 actors and 2007 terms for Indonesia.

[10] See

[11] See

[12] A FAQ on their website reads: Can you provide the entire incident dataset? Can I copy the dataset to share with colleagues or post on another website?
Researchers frequently request a copy of the RAND incident dataset to save time and effort. Unfortunately, the dataset is proprietary, and the nature of MIPT’s agreement with RAND does not allow us to release the complete or partial copies of the dataset to others. While it is possible to compile a dataset on your own from the system for your own personal research, any sharing of the dataset with colleagues or reposting to another website would constitute a copyright violation.