May 17, 2013

DH @ Notre Dame

JSTOR Tool: A Programmatic Sketch

JSTOR Tool is a “programmatic sketch”: a simple and rudimentary investigation of what might be done with datasets downloaded from JSTOR's Data for Research service.

More specifically, a search was done against JSTOR for English-language articles dealing with Thoreau, Emerson, Hawthorne, Whitman, and transcendentalism. A dataset of citations, n-grams, frequently used words, and statistically significant keywords was then downloaded. A Perl script was used to list the articles, provide access to them, and visualize some of their characteristics. These visualizations include word clouds, a timeline, and a concordance.
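The original tool was written in Perl, and its code is not reproduced here. As a rough illustration of what a concordance (keyword-in-context) view does, here is a minimal Python sketch. The function name `kwic`, the 20-character context width, and the sample sentence are assumptions made for the example.

```python
import re


def kwic(text: str, keyword: str, width: int = 20) -> list[str]:
    """Return keyword-in-context lines for every whole-word match of `keyword`."""
    # \b restricts matches to whole words; re.escape neutralizes regex metacharacters
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        start, end = match.span()
        left = text[max(0, start - width):start]
        right = text[end:end + width]
        # Right-justify the left context so every keyword starts in the same column
        lines.append(f"{left:>{width}}{match.group(0)}{right}")
    return lines


if __name__ == "__main__":
    sample = ("Thoreau went to Walden. Emerson owned the land, "
              "and Thoreau built a cabin there.")
    for line in kwic(sample, "thoreau"):
        print(line)
```

Lining the keyword up in a single column is what makes a concordance easy to scan: the eye reads down the middle while the surrounding words show how the term is used.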

Why do this? Because we suffer from information overload and computers provide a way to read things from a “distance”. Indexes and search engines are great, but no matter how sophisticated your query, the search results are going to be large. Given a corpus of materials, computers can be used to evaluate, analyze, and measure content in ways that are not feasible for humans. This page begins to illustrate how a cosmos can be created from an apparent chaos of content — it is a demonstration of how librarianship can go beyond find & get and move towards use & understand.

Give JSTOR Tool a whirl, and tell me how you think the data from JSTOR could be exploited for use & understanding.

by Eric Lease Morgan at May 17, 2013 03:40 PM

May 02, 2013

Catholic Portal

CRRA Update March/April 2013

CRRA Update
March/April 2013

• From the Board: Janice Welburn, chair, announcing CRRA as a nonprofit corporation
• From the Membership Committee: Evelyn Minick, chair, welcomes Mount St. Mary’s University
• Member News: Congratulations to Joe Lucia (Villanova), Morgen MacIntosh Hodgetts (DePaul) and Maria Mazzenga (Catholic University)
• Feature Article: Indexing and displaying Encoded Archival Description files in the Catholic portal
• Committee Updates: From the Collections Committee on Treasures from the Catholic Research Resources Alliance: Women Religious; Newspapers Task Force on Survey of Member Holdings
• From the CRRA: Annual plan update and Member holdings in the portal
• CRRA in the News: Thanks to Diane Maher (University of San Diego)
• Portal Tech Tip: Using the comments feature
• Upcoming events: Catholic Legacies in Victoria (May 28-29, 2013)

________________________________________
Save the date:
We will hold our CRRA All Member Meeting on Tuesday morning, July 2, 2013 in Chicago. Details will be posted here and to the CRRA News and Events page http://www.catholicresearch.net/cms/index.php/crra-news-and-events/ as they become available. Please plan to join us!
________________________________________
FROM THE BOARD
Janice Welburn, CRRA Board Chair, Dean of Libraries, Marquette University

I am pleased to share the good news that CRRA is now a nonprofit corporation set up under the State of Wisconsin Statutes. Our official name is Catholic Research Resources Alliance, Inc., although our name on the website and correspondence will continue to be Catholic Research Resources Alliance, or more simply, CRRA.

The Board is working to complete the application for federal (U.S.) tax-exempt status, more popularly known as 501(c)(3) status. During the 27-month grace period allowed between the time of nonprofit incorporation and submission of the application for federal tax-exempt status, CRRA is able to operate as a nonprofit organization. However, we wish to complete the application this year and will keep you informed of progress. I look forward to providing an update at the All Member Meeting on July 2.

FROM THE MEMBERSHIP COMMITTEE
Evelyn Minick, Chair
We are pleased to welcome Mount St. Mary’s University (Emmitsburg, MD) to membership and participation in the CRRA. Founded in 1808, Mount St. Mary’s University is the second-oldest Roman Catholic university in the United States (the oldest is Georgetown University) and houses the largest Catholic seminary in the U.S.

The Mount has important holdings related to Portal themes and collections, including Catholic pamphlets and newspapers, with plans to digitize rare books and pamphlets. The Archives holds not only the expected college and seminary history but also unique resources relating to the religious formation of alumni as seen from the Revolutionary and Civil Wars, through Desert Storm. Charles Kuhn, Dean, Phillips Library, is excited at the opportunity to share their resources, believing that membership in CRRA will increase awareness of these resources and inspire more scholarship in the field of Catholic Studies both at the Mount and abroad.

More information about the Monsignor Hugh J. Phillips Library and its collections is available at http://www.msmary.edu/academics/library/; the web address for the digital library collections in the repository is http://libguides.msmary.edu/content.php?pid=248426&sid=3059460

MEMBER NEWS
-Congratulations to Joe Lucia, University Library and Falvey Memorial Library Director, Villanova University, on his new responsibilities as Dean of Temple University Libraries, Philadelphia. We will miss his knowledgeable voice on the CRRA Board of Directors and wish him well in his new role.
-Congratulations to Morgen MacIntosh Hodgetts on the publication of “Religious Archives and Shifting Demographics: The Solution of the Vincentians and DePaul University” in the recent issue of Catholic Library World (March 2013). We are also excited to know that Morgen and the staff of Special Collections at DePaul are describing their rich Catholic Social Justice collections and Vincentian materials in the DeAndreis-Rosati Memorial Archives for the Portal.
-Congratulations to Maria Mazzenga on the NCR article “University Archivist Works to Make Catholic History Just a Click Away”
Maria Mazzenga, Education Archivist at the Catholic University of America (CUA) and Member, CRRA Collections Committee, was featured in the National Catholic Reporter. The article describes Maria’s efforts to get students excited about and using primary sources at CUA. Also included is an overview of CUA collections, with an emphasis on digital collections. Read the full story here: http://ncronline.org/news/people/university-archivist-works-make-catholic-history-just-click-away
________________________________________
CRRA in the News

Read about CRRA in Copley Connects, the University of San Diego library newsletter. Our sincere thanks to Diane Maher, University Archivist and Special Collections Librarian, for capturing highlights of the CRRA 2012 Symposium in words and pictures: http://www.sandiego.edu/documents/library/copley_connects_spring_2013.pdf
_____________________________________
FEATURE ARTICLE
Indexing and Displaying Encoded Archival Description (EAD) Files in the Catholic Portal

In collaboration with the Digital Access Committee (DAC), Eric Lease Morgan (Notre Dame) has re-indexed EAD files for the Catholic Portal resulting in improved search results and displays.

The previous process for indexing EAD files in the “Catholic Portal” was viewed as causing more problems than offering solutions. In short, too many search results were being returned. Those results, while unique, were too ambiguous and too similar in nature to be useful. Moreover, the previous indexing process did not take advantage of an EAD file’s rich metadata — title, date, language, controlled vocabulary terms, biographical history, abstract, scope content notes, etc.

The solution to the problem involved re-indexing portal EAD files and is described by Eric on the CRRA blog. Included in the posts is detailed information about mapping EAD elements to the VuFind/Solr index, useful information for understanding how your EAD descriptions will be searched in the portal. This re-indexing has solved the problem of too many records and has resulted in the display of more of EAD’s rich metadata, specifically the abstract, scope content, biographical history, and physical description. See Eric’s blog posts at: http://www.catholicresearch.net/blog/2012/07/indexing-ead-2/ and http://www.catholicresearch.net/blog/2012/10/indexing-again/ for the full analysis.
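For readers curious about what mapping EAD elements to an index involves, here is a minimal Python sketch using the standard library's ElementTree. It assumes a namespace-free EAD file, and its field names (`title`, `biographical_history`, `topic`, and so on) are hypothetical; the portal's actual VuFind/Solr mapping is the one described in Eric's posts, not this one. The sample finding aid is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: index field name -> EAD element path relative to <archdesc>.
# These field names are illustrative only, not the Catholic Portal's real schema.
FIELD_MAP = {
    "title": "did/unittitle",
    "date": "did/unitdate",
    "language": "did/langmaterial/language",
    "abstract": "did/abstract",
    "physical_description": "did/physdesc",
    "biographical_history": "bioghist",
    "scope_content": "scopecontent",
    "topic": "controlaccess/subject",
}


def ead_to_record(xml_text: str) -> dict[str, list[str]]:
    """Flatten selected EAD elements into {field: [values]}; assumes no XML namespace."""
    root = ET.fromstring(xml_text)
    archdesc = root.find("archdesc")
    record: dict[str, list[str]] = {}
    if archdesc is None:
        return record
    for field, path in FIELD_MAP.items():
        values = []
        for elem in archdesc.findall(path):
            # itertext() also collects text from nested children such as <p>;
            # split/join collapses runs of whitespace into single spaces
            text = " ".join("".join(elem.itertext()).split())
            if text:
                values.append(text)
        if values:
            record[field] = values
    return record


if __name__ == "__main__":
    sample = """<ead><archdesc level="collection">
      <did><unittitle>Parish Records</unittitle><unitdate>1850-1900</unitdate>
        <langmaterial><language langcode="eng">English</language></langmaterial>
        <abstract>Letters and sacramental registers.</abstract></did>
      <bioghist><p>The parish was founded in 1850.</p></bioghist>
      <controlaccess><subject>Parishes</subject><subject>Catholic Church</subject></controlaccess>
    </archdesc></ead>"""
    print(ead_to_record(sample))
```

Producing one record per finding aid, rather than one per component, is the kind of choice that keeps a search from returning many near-identical hits.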

The display and search results of EAD files are a great improvement and we encourage you to have a look and let us know what you think. EAD records in the portal are denoted by the Format term “Archival material” and this link takes you to a list of the 238 EAD files currently in the portal: http://www.catholicresearch.net/vufind/Search/Results?lookfor=&type=AllFields&filter[]=format%3A%22Archival+material%22&view=list.
Sincere thanks to Eric and the DAC for this important portal enhancement.

________________________________________
FROM THE CRRA
Jennifer Younger, Pat Lawton

-ANNUAL PLAN Update
Together with Janice Welburn and Tyrone Cannon, chair and vice-chair of the Board respectively, we met with the CRRA committee chairs to begin planning for next year’s goals. The chairs briefly noted some of the most significant accomplishments of this year, including the redesigned website, a more explanatory collection policy, digitizing priorities and the Catholic Newspapers Online, and, with great appreciation to you the members, the addition of portal records with links to digital content.

Collectively, they identified priorities for next year including the Catholic Newspaper Program, outreach to scholars and students, member mentoring and orientation, building shared collections of interest, and social media interaction. In May, the CRRA committees will discuss and identify their goals. The draft strategic plan for 2013/14 will be shared with the full membership and discussed at the July 2 All Member Meeting.
We invite your input and suggestions at any time. You can write to any of the committee chairs directly or to Pat or Jennifer.

-On Our Way — 69% of CRRA Members have records in the Catholic Portal
We are pleased to report that 69% of our member institutions have contributed records to the Portal, just 31% shy of our goal of 100% participation. Member holdings represent a wide variety of formats including archival materials, pamphlets, journals, audio materials, newspapers, and electronic resources. Clicking on this link retrieves all records currently in the portal. To narrow your search, select facets (such as institution, format, genre, era, and more) in the right sidebar.
If you are interested in adding content, or have any questions regarding the process, please contact Eric Morgan or Pat Lawton.

________________________________________
CRRA COMMITTEE UPDATES
-FROM THE COLLECTIONS COMMITTEE
Diane Maher, Chair
A couple of months ago, Marta Deyrup (Seton Hall University) wrote to CRRA liaisons and archivists in religious communities to invite participation in the online exhibit “Treasures from the Catholic Research Resources Alliance: Women Religious.” The purpose of this exhibition is to provide a glimpse into the diverse lives of women religious, and to invite people to get to know these women through images that illustrate their usual activities. While the exhibit will be hosted at Seton Hall University, when completed, we will also feature this online exhibit on the CRRA home page (www.catholicresearch.net), where it can be viewed in the context of other relevant collections. Watch the home page this fall.

-FROM THE CATHOLIC NEWSPAPERS TASK FORCE
Noel McFerran, Chair
Last summer, the task force initiated a pilot program to identify Catholic newspaper titles held in the U.S. and Canada. Staff at the University of St. Michael’s College in the University of Toronto and at the University of Notre Dame together identified nearly 1,000 titles. From this pilot, we gained knowledge of where to locate Catholic newspapers and the vagaries of newspaper metadata. With this general sensibility to guide us, we are currently investigating software options for the Catholic Newspapers Directory so we may share this data.

At the same time, we are embarking on a project to identify CRRA member Catholic newspaper holdings. We are pleased (and grateful) to report that Marquette University, under the guidance of Amy Cooper Cary (Task Force member), Rose Fortier, and Scott Mandernack, will conduct a Survey of Member Holdings. They are currently working with a small group of institutions to determine how best to gather the data, and they plan to launch the Member Survey of Holdings this summer. Please help by participating in the survey; we look forward to having our member holdings well represented in the Directory.

________________________________________
PORTAL TECH TIPS
VuFind Tip #1: Adding Comments
Demian Katz, chair, Digital Access Committee
The Catholic Portal has several useful social features you can take advantage of while logged in. If you do not already have an account, you can set one up using the Login / Create account link found near the top of any Catholic Portal page.

If you would like to share your thoughts about a record in the portal, you can use the built-in comments feature. View the record for the item you wish to comment on. You will see a tab bar below the basic description of the item. Click on the Comments tab. You can now post a comment and read those shared by other users.

Watch for more tips in future issues, and please let us know which other tips you would find helpful.
__________________________________
UPCOMING EVENTS
Catholic Legacies in Victoria (May 28-29, 2013)

Join colleagues for two days of seminars, lectures, and exhibits that bring to vivid life the stories of Roman Catholic missionaries during the period of Victoria’s colonial settlement, and explore the extraordinary collection of “bishop’s books,” some 3,500 rare editions from the Renaissance to the 20th century. Full details are on the web at http://csrs.uvic.ca/events/seminars_conferences/community_seminar.php.
(Announcement courtesy of Jonathan Bengtson, formerly of the University of St. Michael’s College.)
________________________________________
CRRA Update is an electronic newsletter distributed via email to provide members with an update of CRRA activities. Please contact Pat at 574.631.1324 or email plawton@nd.edu with your questions, comments, or news to share.

by plawton at May 02, 2013 02:43 PM

April 29, 2013

DH @ Notre Dame

Matt Sag and copyright

Eric, Matt, and Matt

Matt Sag (Loyola University Chicago) came to visit Notre Dame on Friday, April 12 (2013). His talk was on copyright and the digital humanities. In his words, “I will explain how practices such as text mining present a fundamental challenge to our understanding of copyright law and what this means for scholars in the digital humanities.”

The presentation was well attended, and here are a few of my personal take-aways:

  • Sag enumerated a number of technologies for presenting media (photographs, phonographs, radios, photocopiers, televisions, tape recorders, etc.), and then he said, “Just about all new technologies required a re-thinking of the ideas of copyright…” This was interesting because I had imagined that copyright laws changed along with the advent of new devices, but…
  • A popular phrase used to describe the way digital humanists investigate content is “non-consumptive”, meaning the results do not really use up the resource. Sag prefers a different phrase — “non-expressive”.
  • He went on to say, “…but copyright did not really change.” Furthermore, “The ‘non-expressive’ use of content is not really copyrightable. Or is it?”
  • To answer his own question, Sag does not believe processes like text mining violate copyright because the results are generated automatically — created by machines. The results are algorithmically determined and are not dissimilar to the way Internet search engines work. Copyright claims against search engines have not stood up in court. Maybe it could be put this way. Text mining is an automated process similar to Internet search engine indexing. Internet search engine indexing has not been determined to be in violation of copyright. Therefore text mining is not in violation of copyright either. (A equals B. B is not C. Therefore A is not C either.)

Okay. So this particular mini-travelogue may not be one of my greatest, but Sag was a good speaker, and more people than usual came up to me after the event to say how much they appreciated hearing him share his ideas. Matt Sag, thank you!

by Eric Lease Morgan at April 29, 2013 03:37 PM

April 12, 2013

Life of a Librarian

Catholic pamphlets workflow

Gratuitous eye candy by Matisse

This is an outline of how we here at Notre Dame have been making digitized versions of our Catholic pamphlets available on the Web — a workflow:

  1. Save PDF files to a common file system – This can be as simple as a shared hard disk or removable media.
  2. Ingest PDF files into Fedora to generate URLs – The PDF files are saved in Fedora for the long haul.
  3. Create persistent URLs and return a list of system numbers and… URLs – Each PDF file is given a PURL for the long haul. Output a delimited file containing system numbers in one column and PURLs in another. (Steps #2 and #3 are implemented with a number of Ruby scripts: batch_ingester.rb, book.rb, mint_purl.rb, purl_config.rb, purl.rb, repo_object.rb.)
  4. Update Filemaker database with URLs for quality assurance purposes – Use the PURLs from the previous step and update the local database so we can check the digitization process.
  5. Start quality assurance process and cook until done – Look at each PDF file making sure it has been digitized correctly and thoroughly. Return poorly digitized items back to the digitization process.
  6. Use system numbers to extract MARC records from Aleph – The file name of each original PDF document should be an Aleph system number. Use the list of numbers to get the associated bibliographic data from the integrated library system.
  7. Edit MARC records to include copyright information and URLs to PDF file – Update the bibliographic records using scripts called list-copyright.pl and update-marc.pl. The first script outputs a list of copyright information that is used as input for the second script, which adds the copyright information as well as simple pointers to the PDF documents.
  8. Duplicate MARC records and edit them to create electronic resource records – Much of this work is done using MarcEdit.
  9. Put newly edited records into Aleph test – Ingest the newly created records into a staging area.
  10. Check records for correctness – Given enough eyes, all bugs are shallow.
  11. Put newly edited records into Aleph production – Make the newly created records available to the public.
  12. Extract newly created MARC records with new system numbers – These numbers are needed for the concordance program — a way to link back from the concordance to the full bibliographic record.
  13. Update concordance database and texts – Use something like pdftotext to extract the OCR from the scanned PDF documents. Save the text files in a place where the concordance program can find them. Update the concordance’s database linking keys to bibliographic information as well as locations of the text files. All of this is done with a script called extract.pl.
  14. Create Aleph Sequential File to add concordance links – This script (marc2aleph.pl) will output something that can be used to update the bibliographic records with concordance URLs — an Aleph Sequential File.
  15. Run Sequential File to update MARC records with concordance link – This updates the bibliographic information accordingly.
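Steps #3 and #6 hinge on a simple convention: each PDF is named after its Aleph system number, and a delimited file pairs those numbers with PURLs. The Ruby and Perl scripts named above are not shown here; what follows is only a hypothetical Python sketch of that bookkeeping. The file names and the placeholder PURL base (`https://purl.example.org/...`) are made up and stand in for the real minting step.

```python
import csv
import io
from pathlib import PurePath
from typing import TextIO


def system_numbers(pdf_names: list[str]) -> list[str]:
    """Assumes each PDF is named <system number>.pdf; keeps numbers as strings so leading zeros survive."""
    return [PurePath(name).stem for name in pdf_names]


def write_purl_table(pairs: list[tuple[str, str]], out: TextIO) -> None:
    """Write a tab-delimited file: system number in one column, PURL in the other."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["system_number", "purl"])
    writer.writerows(pairs)


def read_purl_table(f: TextIO) -> dict[str, str]:
    """Read the delimited file back into a system number -> PURL lookup."""
    return {row["system_number"]: row["purl"] for row in csv.DictReader(f, delimiter="\t")}


if __name__ == "__main__":
    # Hypothetical file names and PURL base; real PURLs come from the minting step
    names = ["000123456.pdf", "000654321.pdf"]
    pairs = [(n, f"https://purl.example.org/pamphlets/{n}") for n in system_numbers(names)]
    buffer = io.StringIO()
    write_purl_table(pairs, buffer)
    buffer.seek(0)
    print(read_purl_table(buffer))
```

Keeping the system number as the join key throughout means the same file can feed later steps, such as updating the quality-assurance database or the MARC records.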

Done, but I’m sure your mileage will vary.

by Eric Lease Morgan at April 12, 2013 03:15 PM

April 10, 2013

DH @ Notre Dame

Copyright And The Digital Humanities

This Friday (April 12) the Notre Dame Digital Humanities group will be sponsoring a lunchtime presentation by Matthew Sag called Copyright And The Digital Humanities:

I will explain how practices such as text mining present a fundamental challenge to our understanding of copyright law and what this means for scholars in the digital humanities.

Matthew Sag is a faculty member at the law school at Loyola University Chicago. [1, 2] If you would like to attend, then please drop me (Eric Lease Morgan <emorgan@nd.edu>) a note so I can better plan. Free food.

  • Who: Matthew Sag
  • What: A talk and discussion on copyright and humanities research
  • Where: LaFortune Gold Room (3rd floor)
  • When: Friday, April 12, 11:45 am – 1:00 pm

[1] Sag’s personal Web page
[2] Sag’s professional Web page

by Eric Lease Morgan at April 10, 2013 06:13 PM

April 04, 2013

Life of a Librarian

Digital Scholarship Grilled Cheese Lunch

Grilled Cheese Lunch Attendees

In the Fall the Libraries will be opening a thing tentatively called The Hesburgh Center for Digital Scholarship. The purpose of the Center will be to facilitate learning, teaching, and research across campus through the use of digital technology.

For the past few months I have been visiting other centers across campus in order to learn what they do and how we can work collaboratively with them. These centers included the Center for Social Research, the Center for Creative Computing, the Center for Research Computing, the Kaneb Center, and Academic Technologies, as well as a number of computer labs and classrooms. Since we all have more things in common than differences, I recently tried to build a bit of community through a grilled cheese lunch. The event was an unqualified success, and pictured are some of the attendees.

Fun with conversation and food.

by Eric Lease Morgan at April 04, 2013 08:41 PM

March 31, 2013

Catholic Portal

WordPress Finesse

Greetings CRRA community,

We have been working on updating the blog and are in the process of adding new information in an effort to make it more user-friendly.  With this in mind, we felt it might be helpful to share our findings, what we learned about WordPress, with you, as you may find the information useful now or at some point in the future.  Feel free to share this information with colleagues, family and friends.

In this posting, we will present information on the following topics:

  1. Adding the “more tag” in blog posts
  2. Creating and displaying categories for blog posts
  3. Antedating blog posts
  4. Formatting a blog’s side bar

As an aside, please note that all the information pertains to blogs operated using the WordPress system. 

Adding the “More Tag”

The “more tag” is a very useful tool in blog posts. Using the “more tag” will help make your blog page look cleaner and sleeker and allow your readers to easily access the information they need. Additionally, the “more tag” allows users to quickly browse the most recent posts.

WordPress defines the “more tag” as a “tag that breaks a post into ‘teaser’ and content sections. Type a few paragraphs, insert this tag and then compose the rest of your post. On your blog’s home page you will see only those first paragraphs with a hyperlink ((more…)), which when followed displays the rest of the post’s content.” The “more tag” is considered a Quicktag. Quicktags are displayed in the toolbar at the top of the draft of your new post. The “more tag” can be inserted at any point in your post. You can also go back and edit previous posts, insert the “more tag” wherever it is most useful, and then update the piece.

An example has been provided here.  Click this link and it will lead you to the rest of the content. 

For additional information, consult these links:

  1. http://codex.wordpress.org/Customizing_the_Read_More#Designing_the_More_Tag
  2. http://codex.wordpress.org/Write_Post_SubPanel#Quicktags

Categories for Blog Posts

Categorizing your blog posts is an easy process and it will help your readers to search for the information they want/need.  Follow these steps to create categories for your posts:

  1. On your blog’s Dashboard, locate the “Posts” menu on the left-hand side. In that menu, there is an option for “Categories.” Select “Categories.”
  2. On the page that appears, you will be given the option to “Add New Category.”
  3. Determine the name for the category and type it in the selected field.
  4. You can then select a “Parent” category, should you desire that the categories have a hierarchy.
  5. Finally, you have the opportunity to provide a description for the category.  Add the description if you feel the category name needs further clarification.
  6. Once all these fields have been completed, click the “Add New Category” button at the bottom of the page. The new category will then appear in the categories list on the right-hand side of the page.

With your category names created, you can then begin to categorize your posts.  Once you have composed a new post and are ready to publish the post, look on the right hand side of the “Add New Post” page and there will be a section titled “Categories.”  There, you can select the appropriate category for the post.  If you desire, you can publish the post in multiple categories.  Once this step is completed, you are ready to publish your new, categorized post!

Consult the following link for additional information:

  1. http://www.siteground.com/tutorials/wordpress/wordpress_category.htm

Antedating Blog Posts

On occasion, you may wish to add something additional to your blog that corresponds to a previous post or create a post about something that happened months or years in the past. Blog posts are antedated in an effort to archive older information or past happenings in their appropriate sequence.  For example, if you would like to include a blog post for an event that happened in September 2012, but do not antedate the post, it will appear on the front page of the blog, as though it happened recently in March 2013.

WordPress automatically assigns a blog post the date on which it is created. However, backdating a post is very easy in WordPress. Once you have created a new post and added your content, there is a section titled “Publish” on the right-hand side of the “Add New Post” page. In this section, publishing information is listed. Likely, you will see “Publish immediately” listed. There is an edit option next to this statement. Select the edit option and you will be able to either schedule the post for a future date or backdate it. Hence, it is easy to add information from earlier months or years. Or, you can work ahead and add information that you would like to appear on the blog in the future.

Formatting a Blog’s Main Sidebar

The sidebar of the blog contains essential information that can help the site visitors navigate the blog.  The information on the sidebar can be changed and updated at any point.  In order to make changes, click the “appearance” tab on the left hand side of the Dashboard page.  Under the “appearance” tab, select the “widgets” option.  The widgets, which “add content and features to your sidebars,” can be edited or changed at any point.  Some possible widget options include: search, categories, navigation, etc.  In order to activate a widget, simply drag the desired widget from the section titled “Available widgets” on the left-hand side of the page to the desired sidebar location on the right-hand side of the page.  You can place widgets in the main sidebar section or at the footer or header of your blog page.   WordPress also conveniently shows all the widgets that are currently unused at the bottom of the page.  Feel free to experiment with different widgets to see which options work best for you and your readers.

We found this link to be helpful: http://codex.wordpress.org/WordPress_Widgets

We hope this information has been helpful. We have learned more about blog management in this process and look forward to the opportunity to continue to update the blog and make it more user-friendly. Any suggestions or comments about our changes or ideas for further changes are most welcome.

Thank you! Happy Spring!

by plawton at March 31, 2013 11:17 PM

March 08, 2013

Life of a Librarian

Editors across campus: A reverse travelogue

Some attending editors

Some attending editors

On Friday, February 8, an ad hoc library group called The Willing sponsored a lunch for editors of serial titles from across campus, and this is a very brief “reverse travelogue” documenting that experience and what it revealed about the scholarly communications process.

Professionally, I began to experience changes in the scholarly communications process almost twenty years ago when I learned that the cost of academic journals was increasing by as much as 5%-7% every year. With the advent of globally networked computers, the scholarly communications process is now affecting academics more directly.

In an effort to raise the awareness of the issues and provide a forum for discussing them, The Willing first compiled a list of academic journals whose editors were employed by the University. There are/were about sixty journals. Being good librarians, we subdivided these journals into smaller piles based on various characteristics. We then invited subsets of the journal editors to a lunch to discuss common problems and solutions.

The lunch was attended by sixteen people, and they were from all over the campus wearing the widest variety of hats. Humanists, scientists, and social scientists. Undergraduate students, junior faculty, staff, senior faculty. Each of us, including myself, had a lot to say about our individual experiences. We barely got around the room with our introductions in the allotted hour. Despite this fact, a number of common themes — listed below in more or less priority order — became readily apparent:

  • facilitating the peer-review process
  • going digital
  • understanding open access publishing models
  • garnering University support
  • balancing copyrights (often called “ownership” by attendees)
  • being financially sustainable
  • combatting plagiarism
  • facilitating community building around and commenting on journal content
  • soliciting submissions

With such a wide variety of topics, it was difficult to have a focused discussion on any one of them in the given time and still allow everybody to express their most important concerns. Consequently, the group decided to select individual themes and sponsor additional get-togethers, each devoted to one selected theme and only that theme. We will see what we can do.

Appreciation goes to The Willing (Kenneth Kinslow, Parker Ladwig, Collette Mak, Cheryl Smith, Lisa Welty, Marsha Stevenson, and myself) as well as all the attending editors. “Thanks! It could not have happened without you.”

by Eric Lease Morgan at March 08, 2013 05:58 PM

March 05, 2013

DH @ Notre Dame

Digital humanities and the liberal arts

Galileo demonstrates the telescope

The abundance of freely available full text, combined with ubiquitous desktop and cloud computing, provides a means to inquire into the human condition in ways not possible previously. Such an environment offers a huge number of opportunities for libraries and liberal arts colleges.

Much of the knowledge created by humankind is manifest in the arts: literature, music, sculpture, architecture, painting, etc. Once the things of the arts are digitized, it is possible to analyze them in the same way physical scientists analyze the natural world. This analysis almost always takes the shape of measurement. Earthquakes have a measurable magnitude and geographic location. Atomic elements have a measurable charge and behave in predictable ways. With the use of computers, Picasso’s paintings can be characterized by color, and Shakespeare’s plays can be classified according to genre. The arts can be measured in this way, but such analysis is in no way a determiner of truth or meaning. The results are only measurements and observations.

Libraries and other cultural heritage institutions, the homes of many artistic artifacts, can play a central role in the application of the digital humanities. None of it happens without digitization; this is the first step. The next step is the amalgamation and assimilation of basic digital humanities tools so they can be used by students, instructors, and researchers for learning, teaching, and scholarship. This means libraries and cultural heritage institutions will need to go beyond basic services like find and get; they will want to move on to services such as annotate, visualize, and compare & contrast.

This proposed presentation elaborates on the ideas outlined above, and demonstrates some of them through the following investigations:

Digital humanities simply applies computing techniques to the liberal arts. Its use is similar to Galileo’s use of the magnifying glass. Instead of turning it down to count the number of fibers in a cloth (or to write an email message), it is turned up to gaze at the stars (or to analyze the human condition). What Galileo found there was not so much truth as new ways to observe. Digital humanities computing techniques hold similar promise for students, instructors, and scholars of the liberal arts.

by Eric Lease Morgan at March 05, 2013 04:01 PM

March 04, 2013

DH @ Notre Dame

Introduction to text mining

Starry Night by Van Gogh

Text mining is a process for analyzing textual information. It can be used to find both patterns and anomalies in a corpus of one or more documents. Sometimes this process is called “distant reading”. It is very important to understand that this process is akin to a measuring device and should not be used to make value judgements regarding a corpus. Computers excel at counting (measuring), which is why they are used in this context. Value judgements (evaluations) are best made by humans.

Text mining starts with counting words. Feed a computer program a text document. The program parses the document into words (tokens), and the length of the document can then be determined. Relatively speaking, is the document a long work or a short work? After the words are counted, they can be tabulated to determine frequencies. Some words occur much more frequently than others, and some words occur only once. Documents with a relatively high number of unique words are usually considered more difficult to read.
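A minimal Python sketch of these first steps follows. It assumes a naive tokenizer that treats runs of letters and apostrophes as words (real tools tokenize more carefully), and the function names `tokenize` and `word_statistics` are hypothetical. The ratio of unique words (types) to total words (tokens) is one common way to express the "number of unique words" relative to length.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lower-case word tokens (runs of letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def word_statistics(text):
    """Return the document length, word frequencies, and a type/token ratio."""
    tokens = tokenize(text)
    frequencies = Counter(tokens)
    length = len(tokens)
    # Unique words (types) divided by total words (tokens); higher values
    # indicate a more varied vocabulary.
    ratio = len(frequencies) / length if length else 0.0
    return length, frequencies, ratio


text = "The cat sat on the mat. The mat was flat."
length, frequencies, ratio = word_statistics(text)
print(length)                       # 10
print(frequencies.most_common(2))   # [('the', 3), ('mat', 2)]
print(round(ratio, 2))              # 0.7
```

The type/token ratio tends to fall as documents get longer, so it is most meaningful when comparing texts of similar length.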

The positions of words in a document can also be calculated. Where in a document are particular words used? Towards the beginning? The middle? The end? Plotted as a histogram of word frequencies against relative positions, this information can highlight the introduction, discussion, and conclusion of themes. Plotting multiple words on the same histogram, whether they be synonyms or antonyms, may literally illustrate the ways they are used in conjunction, or not. If a single word holds some meaning, then do pairs of words hold twice as much meaning? The answer is, “Sometimes”. Phrases (n-grams) are easy to count and tabulate once the positions of words are determined, and since meaning is often carried not by single words alone but by multi-word phrases, n-grams are interesting to observe.
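Word positions and n-grams fall out of the token list directly. In this Python sketch, which repeats a naive tokenizer so that it stands alone, `dispersion` divides a document into equal slices and counts a word in each slice (the data behind a histogram), and `ngrams` tabulates runs of consecutive words. Both function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lower-case word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def dispersion(tokens, word, bins=10):
    """Count occurrences of `word` in each of `bins` equal slices of the document."""
    counts = [0] * bins
    for position, token in enumerate(tokens):
        if token == word:
            # Map the word's relative position onto a slice index in [0, bins).
            index = position * bins // len(tokens)
            counts[index] += 1
    return counts


def ngrams(tokens, n):
    """Tabulate every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


tokens = tokenize("war and peace and war and more war")
print(dispersion(tokens, "war", bins=4))   # [1, 0, 1, 1]
print(ngrams(tokens, 2).most_common(1))    # [(('war', 'and'), 2)]
```

Passing each slice count to a plotting library as a bar height gives the histogram; plotting two words side by side shows whether they cluster together.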

Each human language adheres to a set of rules and conventions. If it didn’t, nobody would be able to understand anybody else. A written language has syntax and semantics. Such rules in the English language include: all sentences start with a capital letter and end with one of a defined set of punctuation marks. Proper nouns begin with capital letters. Gerunds end in “ing”, and adverbs end in “ly”. Furthermore, we know certain words carry connotations of gender, or of singularity or plurality. Given these rules (which are not necessarily hard and fast), it is possible to write computer programs to do some analysis. This is called natural language processing. Is this book more or less male or female? Are there many people in the book? Where does it take place? Over what time periods? Is the text full of action verbs, or are things rather passive? What parts of speech predominate in the text or corpus?
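To make rule-based analysis concrete, here is a crude Python sketch that applies a few of the rules above. The word lists and the function name `crude_profile` are hypothetical illustrations. As noted, the rules are not hard and fast, so words such as “family” or “king” will be miscounted; real natural language processing relies on much larger lexicons or trained part-of-speech taggers.

```python
import re

# Hypothetical, deliberately tiny word lists for illustration only.
MALE = {"he", "him", "his", "himself", "man", "men"}
FEMALE = {"she", "her", "hers", "herself", "woman", "women"}


def crude_profile(text):
    """Apply a few rough English-language rules to a text."""
    words = re.findall(r"[A-Za-z']+", text)
    lower = [w.lower() for w in words]
    return {
        "male": sum(w in MALE for w in lower),
        "female": sum(w in FEMALE for w in lower),
        # Rule of thumb: adverbs end in "ly" (but so do "family" and "only").
        "ly_words": sum(w.endswith("ly") for w in lower),
        # Rule of thumb: gerunds end in "ing" (but so does "king").
        "ing_words": sum(w.endswith("ing") for w in lower),
        # A sentence is a run of text ending in ".", "!", or "?".
        "sentences": len(re.findall(r"[^.!?]+[.!?]", text)),
    }


profile = crude_profile("She walked quickly. He was singing loudly! Was she happy?")
print(profile)
```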

All of the examples from the preceding paragraphs describe the beginnings of text mining in the digital humanities. There are many Web-based applications allowing one to do some of this analysis, and there are many others that are not Web-based, but there are few, if any, doing everything the scholar will want to do. That is the definition of scholarship. Correct? Most digital humanities investigations will require team efforts — the combined skills of many different people: domain experts, computer programmers, graphic designers, etc.

The following links point to directories of digital humanities tools. Browse the content of the links to get an idea of what sorts of things can be done relatively quickly and easily:

The following is a link to a particular digital humanities tool, along with a few links making up a tiny corpus. Use the tool to do some evaluation of the texts. What sorts of observations are you able to discern using the tool? Based on those observations, what else might you want to discover? Are you able to make any valid judgments about the texts or about the corpus as a whole?

Use some of your own links (build your own corpus) to do some analysis in your own domain. What new things did you learn? What things did you already know that were brought to light quickly? Would a novice in your domain be able to see these things as quickly as you?

Text mining is a perfect blend between the humanities and the sciences. It epitomizes a bridge between the two cultures of C. P. Snow. [1] Science does not explain. Instead it merely observes, describes, and predicts. Moreover, it does this in a way that can be verified and repeated by others. Through the use of a computer, text mining offers the same observation processes to the humanist. In the end text mining — and other digital humanities endeavors — can provide an additional means for accomplishing the goals of the humanities scholar — to describe, predict, and ultimately understand the human condition.

The digital humanities simply apply computing techniques to the liberal arts. Their use is similar to Galileo’s use of the magnifying glass. Instead of turning it down to count the number of fibers in a cloth (or to write an email message), it is turned up to gaze at the stars (or to analyze the human condition). What Galileo found there was not so much truth as new ways to observe.

[1] Snow, C. P. (1963). The Two Cultures; and, A Second Look. New York: New American Library.

by Eric Lease Morgan at March 04, 2013 07:45 PM

February 27, 2013

Catholic Portal

CRRA Update Jan/Feb 2013

This month’s update includes:

  • From the Board: Janice Welburn, chair, on Becoming a nonprofit federally tax-exempt corporation
  • Member News:  Villanova wins ACRL Excellence in Libraries Award!; Congratulations to Carol Johnson, Bob Skinner, and Stephanie Clark; CRRA in the Spotlight
  • Feature Article: Learning more about our collection through selecting materials for CRRA, by Lisa Gonzalez, Catholic Theological Union
  • Committee Updates:  From The Collections Committee on Updating the collection policy; the Digital Access Committee on EAD records, the website, and more; and the Newspapers Task Force on the List of Catholic Newspapers Online
  • Grant opportunities: CLIR Hidden Collections proposals due March 22, 2013
  • Upcoming events: The CRRA Annual Meeting (July 2)


FROM THE BOARD
Janice Welburn, Chair, Dean of Libraries, Marquette University

The Board held its third meeting of the 2012/13 year on February 25, 2013. Discussion continued on becoming a nonprofit federally tax-exempt corporation and I am pleased to report that we are on track to accomplish that this year. We appreciated the words of encouragement we received from individual directors, deans and committee members last year. It is a great pleasure to serve with you in advancing our mission of providing global enduring access to Catholic research resources in the Americas.


MEMBER NEWS

Congratulations to Villanova University! – VU Wins ACRL Excellence in Libraries Award
Shout out to Villanova: Congratulations to Library Director Joe Lucia and the Falvey staff for having your outstanding efforts recognized by being named a 2013 ACRL Excellence in Academic Libraries Award winner! Read more.

Thank you and congratulations to Carol Johnson, Bob Skinner and Stephanie Clark
We thank and congratulate Carol Johnson, St. Catherine University (St. Kate’s), and  Bob Skinner, Xavier University of New Orleans, who recently retired from outstanding careers in library administration. We thank and congratulate Stephanie Clark who rekindled her love for public libraries with a move from Georgetown University to Arlington Public Library.  They were great friends and leaders in developing CRRA collections and programs, and we wish them all the best.

CRRA in the spotlight!
Thanks to Scott Walter, University Librarian, DePaul University, for showcasing the CRRA Symposium Nurturing the ‘Spirit of Perfect Charity’: Libraries and Archives at the Intersection of Service and Scholarship in Catholic Social Justice Studies in the January 2013 issue of College & Research Libraries News. DePaul University hosted the symposium, which brought together nearly 100 scholars, librarians and archivists for informative and lively discussions. Read more.


FEATURE ARTICLE
Learning more about our collection through selecting materials for CRRA
Lisa Gonzalez, Electronic Resources Librarian at Catholic Theological Union (CTU)

So what is rare, unique and uncommon about the collection at Paul Bechtold Library? Since I joined the library, I’ve written several blurbs about our collection to promote our library on our blog and elsewhere on the internet, so I drew on some of these statements to guide our selections for the Catholic Portal. One thing the statements on our website, our Internet Archive page, various blog postings, and even some library exhibits have in common is our focus on religious orders.

[To view CTU records in the Portal please see: http://www.catholicresearch.net/vufind/Search/Results?lookfor=&type=AllFields&filter[]=institution%3A%22Catholic+Theological+Union%22&view=list]

Thanks to a happy coincidence, we had a trial of WorldCat Collection Analysis in 2011, at about the same time as we joined CRRA. I used it to search for items on monasticism that were owned by five libraries or fewer; I was able to identify more than 3,000 items through this process. I added the records from the search to a spreadsheet, to which I continue to add records that fit our other selection criteria. This includes items from our largest collections about religious orders, including the Franciscans and the Passionists.

Digitized items are also included in our CRRA collection, in order to increase access to full text Catholic materials. Once an item is digitized through the state of Illinois’ CARLI book digitization program, a link is added to the local record for the item, and that record is included in our CRRA records. [For example, The Passionist: bulletin of Holy Cross providence]

The records selected for CRRA can reveal information about our collection when viewed as a group through the Catholic Portal. We already knew we had a strong collection on religious orders, but the browse function of the Portal highlights our collection of rules and constitutions from many of them. [http://www.catholicresearch.net/vufind/Search/Results?lookfor=&type=AllFields&filter[]=institution%3A%22Catholic+Theological+Union%22&filter[]=genre_facet%3A%22Rules%22&view=list]

I’ve also begun using the Portal to help select potential candidates for digitization; one criterion for digitization is that the item is not available in a mass digitization project, and links to Google and Open Library in the Portal make it easy to see items that are already digitized. Reviewing our records in the Portal can also help me identify themes for potential digitization proposals, since a theme is necessary for CARLI proposals. In the future, we plan to propose more digitization projects focused on men’s religious orders as well as on Catholic publishing in the United States, and we plan to add these records to our Catholic Portal collection.


COMMITTEE UPDATES

-FROM THE COLLECTIONS COMMITTEE
Diane Maher, Chair

Catholic Portal Collection Policy Statement: Under Construction
The Collection Committee is in the process of updating the Portal’s collection policy statement. This revision seeks to enhance the document’s usability for members. Collecting themes have been defined and the committee is now assigning subject heading examples for each theme. Guidelines for contributing published print materials will also be added to help clarify the decision-making process for members. This is the first revision since the document was created at CRRA’s inception. It is a tribute to the policy statement that its original recommendations remain vital to the Portal’s collection development.

-FROM THE DIGITAL ACCESS COMMITTEE
Demian Katz, Chair

The membership of the Digital Access Committee has changed over the past months, with the departure of Michael Bramah (St. Mike’s, University of Toronto) and Vani Murthy (Georgetown) and the addition of three new members: Tracy Jackson (Seton Hall), Shana McDanold (Georgetown) and Megan Bernal (DePaul).  We thank Michael and Vani for their good service and wise counsel. It has been a joy to work with you.  More information on the current members of the committee can be found here.

DAC has been focusing for the past few months on improving CRRA’s web presence by rebuilding the catholicresearch.net site using the open source Concrete5 content management system.  The new web design, which was recently unveiled, combines more complete information about CRRA with a more attractive visual presentation.  The Concrete5 system makes updating the site and collaborating on changes easy, so the site will be able to evolve to meet the organization’s changing needs over time, and interested members can collaborate with DAC to facilitate updates.

Another ongoing project has been improving the representation of EAD finding aids in the Catholic Portal.  An improved indexing routine was recently implemented which should make archival material more usable within the Portal.  Read more.

Over the coming year, DAC plans to continue improving the website, expand Portal content and links to digital content held by members in other repositories, and investigate new technologies that can support the mission of CRRA.

-FROM THE CATHOLIC NEWSPAPERS TASK FORCE
Noel McFerran, Chair

The Task Force is happy to announce the “List of Catholic Newspapers Online.”   Thanks to individuals and institutions across North America, the list now includes some 40 titles with links to digitized or born-digital Catholic newspapers in the US and Canada.  For example,

Arkansas Catholic, formerly The Southern Guardian/The Guardian (Little Rock, AR)
1911 – 1926, 1945 (1927 – 1931 soon to be added)
http://arc.stparchive.com/

The Catholic (Kingston, ON)
1830 – 1844 (missing some issues)

http://eco.canadiana.ca/view/oocihm.8_04110

The Catholic Commentator - Diocese of Baton Rouge (Louisiana)
The newspaper for the Diocese of Baton Rouge is The Catholic Commentator.  The first issue was published February 8, 1963.  It was published weekly until March of 1984; since then new issues have been published every other week.  Physical copies of the newspaper are held by both the Archives Department and The Catholic Commentator, which are both located in the administration offices of the diocese in Baton Rouge, LA.  Digital copies of the Commentator from 2007 forward are available on their website: http://thecatholiccommentator.org.

The CRRA continues its work to provide access to all extant Catholic newspapers in North America through building a directory of North American Catholic newspapers and encouraging digitization of Catholic newspapers; The Online List furthers this goal by providing immediate value and use.

Please help us to populate this list!  If you know of Catholic newspapers that are available in digital form that are not yet on our list, kindly pass it on. More on how you can help.


GRANT OPPORTUNITIES OF POSSIBLE INTEREST
Hidden Collections Application Period Opens: The application period for the 2013 Cataloging Hidden Special Collections and Archives program is now open. Applications are due March 22, 2013. For more information, visit: http://www.clir.org/fellowships/hiddencollections.  If interested but not able to apply this year, consider using the guidelines to start developing a proposal for next year. Please feel free to call on Pat or Jennifer if you would like help in finding other CRRA members who might be interested in developing a shared proposal with you. Marquette, Catholic and St. Catherine point to their shared project as one factor in their successful application.


SAVE THE DATE

We will hold our CRRA All Member Meeting on Tuesday morning, July 2, 2013 in Chicago.  Details will be posted here and to the CRRA News and Events page http://www.catholicresearch.net/cms/index.php/crra-news-and-events/ as they become available.


CRRA Update is an electronic newsletter distributed via email to provide members with an update of CRRA activities.  Please contact Pat at 574.631.1324 or email plawton@nd.edu with your questions, comments, or news to share.

by plawton at February 27, 2013 07:39 PM

January 29, 2013

DH @ Notre Dame

Genderizing names

I was wondering what percentage of subscribers to the Code4Lib mailing list were male and female, and consequently I wrote a hack. This posting describes it — the hack that is, genderizing names.

I own/moderate a mailing list called Code4Lib. The purpose of the list is to provide a forum for the discussion of computers in libraries. It started out as a place to discuss computer programming, but it has evolved into a community surrounding the use of computers in libraries in general. I am also interested in digital humanities computing techniques, and I got to wondering whether or not I could figure out the degree to which the list is populated by men and women. To answer this question, I:

  1. extracted a list of all subscribers
  2. removed everything from the list except the names
  3. changed the case of all the letters to lower case
  4. parsed out the first word of the name and assumed it was a given name
  5. tabulated (counted) the number of times that name occurred in the list
  6. queried a Web Service called Gendered Names to determine… gender
  7. tabulated the results
  8. output the tabulated genders
  9. output the tabulated names
  10. used the tabulated genders to create a pie chart
  11. used the tabulated names to create a word cloud
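The core of these steps can be sketched in Python. This is not the author’s names.pl script, which is written in Perl; it is a minimal illustration under stated assumptions. In particular, the Gendered Names web service is replaced by a hypothetical `lookup_gender` function backed by a tiny sample dictionary, the subscriber names are invented, and steps 1 and 8 to 11 (extraction, output, and charting) are left out. The `cloud_names` helper applies the word-cloud filtering the author describes (dropping unknown genders and names that occur only once).

```python
from collections import Counter

# Hypothetical stand-in for the Gendered Names web service; the real
# service was queried over the network and its interface is not shown here.
SAMPLE_GENDERS = {"mary": "female", "john": "male", "eric": "male"}


def lookup_gender(given_name):
    """Return 'male', 'female', or 'unknown' for a given name."""
    return SAMPLE_GENDERS.get(given_name, "unknown")


def genderize(subscribers):
    """Tabulate given names and genders from a list of subscriber names."""
    names = Counter()
    genders = Counter()
    for full_name in subscribers:
        words = full_name.lower().split()        # steps 2-3: name only, lower case
        if not words:                            # no name supplied
            genders["unknown"] += 1
            continue
        names[words[0]] += 1                     # steps 4-5: first word as given name
    for given, count in names.items():           # steps 6-7: query once per name
        genders[lookup_gender(given)] += count
    return names, genders


def percentages(genders):
    """Share of all subscribers in each gender category, rounded to whole percents."""
    total = sum(genders.values())
    return {gender: round(100 * count / total) for gender, count in genders.items()}


def cloud_names(names):
    """Names for the word cloud: known gender and more than one occurrence."""
    return {n: c for n, c in names.items() if c > 1 and lookup_gender(n) != "unknown"}


subscribers = ["Mary Smith", "John Doe", "mary jones", "Eric Morgan", "", "Xyz Abc"]
names, genders = genderize(subscribers)
print(percentages(genders))
print(cloud_names(names))   # {'mary': 2}
```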

In my opinion, the results were not conclusive. About a third of the names are “ungenderizable” because either no name was supplied by a mailing list subscriber or the Gendered Names service was not able to determine gender. That aside, most of the genderized names are male: 41% of all names are male, and just over a quarter (26%) are female. See the chart:

pie-chart

To illustrate how the names are represented in the subscriber base, I also created a word cloud. The cloud does not include the “no named” people, the unknown genders, or the names that occurred only once. (The latter have been removed to protect the innocent.) Here is the word cloud:

word-cloud

While I do not feel comfortable giving away the original raw data, I am able to make available the script used to do these calculations as well as the script’s output:

  • names.pl – Perl script that does the tabulations
  • data.txt – the aggregated results (output) of the script

What did I learn? My understanding of the power of counting was reinforced. I learned about a Web Service called Gendered Names. (“Thank you, Misty De Meo!”) And I learned a bit about the make-up of the Code4Lib mailing list, but not much.

by Eric Lease Morgan at January 29, 2013 09:24 PM

January 28, 2013

Catholic Portal

The Catholic Pamphlets Collection

The Catholic Pamphlets collection at the University of Notre Dame contains a wide variety of pamphlets, booklets, and other documents pertaining to Catholicism or the Church in some way. The collection is located at the Department of Rare Books and Special Collections at Hesburgh Library, and is currently undergoing digitization. These publications were intended to educate a particular audience regarding issues relevant to the Church. While the University of Notre Dame’s collection does not include any entries before the nineteenth century, there is evidence of similar pamphlets being published at least since the time of the Reformation in Europe. With the invention of the printing press, pamphlets became a convenient means to disseminate ideas to a wide audience. Such publications were produced by both Protestant and Catholic sources in an attempt to influence readers with respect to religious and social issues (Edwards).

Example Pamphlet

In the Catholic Pamphlets collection, the publication dates range from 1823 to 2008, but most of them were published in the twentieth century, particularly between 1930 and 1959. Of the 5126 entries in the catalog, 152 were published before 1900, 3719 were published between 1900 and 1999 inclusive, 26 were published in 2000 or later, and 1229 have an unknown or approximate year listed.

There are 1295 unique publisher entries, of which the one with the most pamphlets in the collection is the Paulist Press, with 355 entries, followed by the National Council of Catholic Men (220 entries), Queen’s Work (209 entries), s.n. [none listed] (193 entries), the Catholic Truth Society (156 entries), and Our Sunday Visitor (154 entries).
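Tabulations like these are simple counts. The minimal Python sketch below uses a handful of invented catalog records; the record layout and the function name `date_bucket` are assumptions. It groups years into the same ranges used above and tallies publishers. A useful sanity check is that the ranges account for every record, as the figures above do: 152 + 3719 + 26 + 1229 = 5126.

```python
from collections import Counter


def date_bucket(year):
    """Place a publication year into the ranges used above."""
    if year is None:
        return "unknown or approximate"
    if year < 1900:
        return "before 1900"
    if year <= 1999:
        return "1900-1999"
    return "2000 or later"


# Hypothetical catalog records: (publisher, year, or None if unknown/approximate).
records = [
    ("Paulist Press", 1938),
    ("Paulist Press", None),
    ("Queen's Work", 1952),
    ("Catholic Truth Society", 1888),
    ("Our Sunday Visitor", 2003),
]

by_date = Counter(date_bucket(year) for _, year in records)
by_publisher = Counter(publisher for publisher, _ in records)
print(by_date)
print(by_publisher.most_common(1))   # [('Paulist Press', 2)]
```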

Most common words in titles

Most common words in subject headings

Most common words in subject headings (minus “Catholic” and “Church”)

The pamphlets are mostly in English, but there are entries from a variety of languages in the collection, including Latin, French, Polish, Spanish, Chinese, and more. Despite the name “Catholic Pamphlets,” not all of the documents are “pamphlets” in a strict sense: the documents in the collection range in size from single pages to small books of one hundred to two hundred pages. The United Nations Educational, Scientific and Cultural Organization, for example, uses the following definition for a pamphlet in its publication, “International Standardization of Statistics Relating to Book Production and Periodicals:”

“A pamphlet is a non-periodical printed publication of at least 5 but not more than 48 pages, exclusive of the cover pages, published in a particular country and made available to the public.”

This definition should be helpful as a point of comparison. Using this definition, there are entries that might not be considered “pamphlets,” such as “Courtship and Marriage,” which is 136 pages, and “Suggested Constitution of the Confraternity of Christian Doctrine for Parish Units Affiliated with the Diocesan Confraternity,” which is only 4 pages.
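The UNESCO rule reduces to a simple range test. This Python sketch assumes page counts have already been extracted from the catalog records as integers that exclude the cover pages; real catalog extents (for example, unnumbered pages) would need parsing first. The function name is hypothetical.

```python
def is_unesco_pamphlet(pages):
    """True if a page count (covers excluded) is within UNESCO's 5 to 48 pages."""
    return 5 <= pages <= 48


# Page counts for the two examples above.
examples = {
    "Courtship and Marriage": 136,
    "Suggested Constitution of the Confraternity of Christian Doctrine "
    "for Parish Units Affiliated with the Diocesan Confraternity": 4,
}
for title, pages in examples.items():
    print(title, is_unesco_pamphlet(pages))
```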

Examples of entries not meeting the UNESCO definition of a “pamphlet”

There is a wide range of topics discussed in these pamphlets, which can be dependent on the time in which they were published. For example, during the early to mid-twentieth century there were a fair number of pamphlets published concerning communism and the views of the Church and individual Catholics concerning it, such as “Communism: Threat to Freedom” [1962], “Facts about Communism” [1937], and “Just What is Communism?” [1935]. Other pamphlets are instructional in nature, teaching about the sacraments or the mass. These materials include pamphlets targeted toward Catholics such as “Preparing Your Child for the Sacraments” [1965] as well as pamphlets targeted toward non-Catholics who may be unfamiliar with various aspects of Catholicism, such as “Catholic’s Ready Reply; Thirty-nine Answers to the Thirty-nine Most Frequent Questions Asked by Non-Catholics” [1954]. Some pamphlets concern social and moral issues, such as poverty, alcoholism, and war, while others are simply prayer books, catechisms, instructions for mass, and novenas. Finally, there are some entries which are only tangentially related to Catholicism such as a few collections of comics that were published in Catholic magazines, including “Speck, the Altar Boy” [1958], “Our Little Nuns: A Book of Cartoons Created Exclusively for Extension Magazine” [1954], and “Priests are like People: A Book of Cartoons Created Exclusively for Extension Magazine” [1954].

The pamphlets can be useful in answering a number of questions researchers may have about the Church or Catholicism. For example, one could use the pamphlets to study how the Church has evolved regarding ecumenism and relations with non-Catholics. A cursory search of the collection reveals titles such as “Is there Salvation Outside the Church?”, “An Interdiocesan Program for Ecumenism: That We May Be One” [1971], “Documents on Anglican/Roman Catholic relations” [1972], and “On Dialogue with Non-Believers: August 28, 1968.” [1968]. Another possible research question that could be answered with the Catholic Pamphlets collection is how views on papal and church authority have changed over the years. A search of titles in the collection for “authority” and “infallibility” returns the following entries, among others:

  • “Is the Pope Always Right? Of Papal Infallibility” [1947]
  • “Papal Infallibility” [1925]
  • “Is Papal Infallibility Reasonable? A Divine Safeguard Against Error”
  • “An Agreed Statement on Authority in the Church: Venice, 1976”
  • “The Principle of Authority: Churches and Pastors, the Church, its Authority”
  • “The Obedience of Authority” [1922]
  • “Freedom vs. authority” [1966]
  • “Reflections on conscience and authority” [1964]
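A title search of this kind amounts to case-insensitive keyword matching against titles. Here is a minimal Python sketch using a few titles from this post (the function name `search_titles` is hypothetical); as discussed below, it can only find pamphlets whose titles happen to use the chosen words.

```python
def search_titles(titles, *keywords):
    """Return titles containing any of the keywords, ignoring case."""
    wanted = [k.lower() for k in keywords]
    return [t for t in titles if any(k in t.lower() for k in wanted)]


titles = [
    "Papal Infallibility",
    "Freedom vs. authority",
    "Speck, the Altar Boy",
    "Preparing Your Child for the Sacraments",
]
print(search_titles(titles, "authority", "infallibility"))
# ['Papal Infallibility', 'Freedom vs. authority']
```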

Of particular interest to researchers is the concordance feature, currently located at http://concordance.library.nd.edu/app/. This can be used to find the frequency of words in a given document, and can be useful in getting an overview of the themes and topics of that publication. For example, the concordance feature can be used with “Is the Pope Always Right? Of Papal Infallibility” to find the 25 most frequently used words, which are as follows:

church (62); bill (49); €” (36); charlie (32); father (32); catholic (24); can (22); priest (22); faith (21); said (21); infallibility (19); one (18); even (14); papal (14); come (13); say (13); make (12); know (11); god (11); will (10); people (10); think (10); life (10); see (10); work (10);

The concordance feature is fairly flexible in how it can be used. Along with searching for the most frequent words in a document, the concordance can find the most common phrases of x number of words, or the most common words beginning with a certain letter of the alphabet. The concordance feature is not without a few shortcomings, however. As the example above shows, the concordance may include common words like “can” in its search, along with artifacts from the optical character recognition (OCR) process such as the euro currency symbol shown in the third result. Despite these shortcomings, the concordance should still be a rather useful and interesting feature to researchers of the pamphlet collection.
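Both shortcomings can be reduced by filtering the word list before counting. The following Python sketch does not describe how the concordance application works. It assumes a small, hypothetical stop-word list and keeps only tokens made of ASCII letters (optionally with one apostrophe), which drops OCR debris such as the euro-symbol token but also discards accented letters, so it suits English text only.

```python
import re
from collections import Counter

# A tiny, hypothetical stop list; published stop lists are much longer.
STOP_WORDS = {"the", "of", "and", "a", "to", "is", "can", "will", "one", "even"}


def top_words(text, n=25, stop_words=STOP_WORDS):
    """Most frequent words, skipping stop words and non-alphabetic OCR debris."""
    # Only letter runs count as words, so stray symbols never become tokens.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(t for t in tokens if t not in stop_words).most_common(n)


sample = "The Church and the church €” Bill said Bill can see the Church."
print(top_words(sample, 2))   # [('church', 3), ('bill', 2)]
```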

Finally, when using the Catholic Pamphlets collection for research, some things should be kept in mind. In particular, the views presented in a given pamphlet could be those of the Church itself, or it could be just those of a particular individual or group. Care should be taken to avoid giving undue weight to the views of one particular person or organization as being representative. For example, there are several pamphlets by the controversial 1930s radio host, Father Charles Coughlin, and it might not be reasonable to conclude that his views on, for example, labor and economic issues are representative of the Church as a whole, even in the 1930s. Additionally, there could be some greater context that needs to be kept in mind, such as the events of a particular time period. For example, something significant such as World War II or the Second Vatican Council may have influenced the sorts of pamphlets published in their respective time periods. Finally, an existing familiarity with Church history and issues related to Catholicism will help greatly in making use of the Catholic Pamphlets collection for research. The examples given here are based on title keywords that this author knows are associated with a given topic. A title may not always be indicative of a pamphlet’s subject material. There may be other pamphlets about, for example, church authority that do not use the terms “infallibility,” “authority,” or certain other words. A researcher with a better knowledge of church history may have a better ability to search for relevant documents. In spite of the limitations mentioned here, the Catholic Pamphlets collection should still be rather useful for those studying Church history or other aspects of Catholicism.

Bibliography

  • Edwards, M. U. (1994). Printing, Propaganda and Martin Luther. Berkeley, CA: University of California Press.
  • Holborn, L. W. (1942). Printing and the Growth of a Protestant Movement in Germany from 1517 to 1524. Church History, 11 (2), 123-137.

by apmcginn at January 28, 2013 08:51 PM

January 24, 2013

Catholic Portal

Google Analytics and the Catholic Portal

Through my experimentation, Google Analytics has proven to be a rather useful tool for tracking usage patterns for the Catholic Portal. However, it is only effective if one knows where to look for information. To that end, I have compiled a quick guide to finding the answers to all sorts of questions about website usage. While I use the Catholic Portal for these examples, the instructions here should be applicable to any site set up to use Google Analytics.

Overview of where to find user information:

Peak periods of use and hits per day/month/year:

This can be found in the Visitors Overview screen, under the Audience section. On this screen, there is a line graph of unique visits for a given time period (by default, it shows visits per day for a one-month period). The time unit for the x-axis (hour, day, week, or month) can be adjusted with the buttons in the top-right corner immediately above the graph, and the total amount of time shown in the graph can be adjusted using the date settings drop-down menu above the time unit buttons. Hits recorded by Google Analytics do not include bots or crawlers, as they are unlikely to trigger the JavaScript code that Google Analytics uses to record usage statistics (alternatively, Google may just automatically exclude counting bots through its own methods).

Visitors Overview screen

To find the peak periods of use for the Catholic Portal from August 1 to October 31, the first thing to do is to set the desired period of time by clicking the calendar settings in the top-right corner and selecting the desired date range.

Date range selection

After selecting the desired date range, the graph can be set up to use hours, days, weeks, or months as units. For this example, the default (days) will be used.

Highest usage (mid-week)

Lowest usage (weekends)

Overall, the periods of highest use tended to be during the middle of the week, with peaks tending to occur on Tuesdays or Wednesdays. Usage dropped to its lowest point during the weekends.
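If the daily visit counts are exported or copied out of the Visitors Overview graph, the weekday pattern described above can be checked with a short script. The following Python sketch assumes a dictionary mapping dates to visit counts; the figures are invented for illustration and are not Catholic Portal data, and the function name `average_by_weekday` is hypothetical.

```python
from collections import defaultdict
from datetime import date

DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")


def average_by_weekday(daily_visits):
    """Average the visit count for each day of the week."""
    totals = defaultdict(int)
    days = defaultdict(int)
    for day, visits in daily_visits.items():
        name = DAY_NAMES[day.weekday()]   # weekday() is 0 for Monday
        totals[name] += visits
        days[name] += 1
    return {name: totals[name] / days[name] for name in totals}


# Hypothetical counts, not actual Catholic Portal figures.
daily_visits = {
    date(2012, 8, 1): 140,
    date(2012, 8, 4): 60,
    date(2012, 8, 5): 50,
    date(2012, 8, 7): 150,
    date(2012, 8, 8): 130,
}
averages = average_by_weekday(daily_visits)
busiest = max(averages, key=averages.get)
quietest = min(averages, key=averages.get)
print(busiest, quietest)   # Tuesday Sunday
```

Averaging per weekday, rather than summing, keeps weekdays that appear more often in the chosen date range from looking artificially busy.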

Search strings entered:

There are two sources on Google Analytics for strings entered by users for searches. The first is in the Traffic Sources section overview, which shows keywords that are entered in the search engines (Google, Bing, etc.) which referred those users to the Catholic Portal. The other source is in the Site Search section of the Content section. Instead of giving keywords from external search engines, this gives the keywords entered in a search box on the website itself. However, Site Search tracking must be configured to track a particular search box. In the case of the Catholic Portal, Site Search is set up to track entries in the search box on the Catholic Portal’s main page.

Both types of sources for strings entered are shown in the pictures below. The top picture displays the list of top keywords in the Traffic Sources section from August 1, 2012 to October 31, 2012. The bottom picture, on the other hand, displays the top searches entered using the Catholic Portal’s search feature from August 1, 2012 to October 31, 2012.

 

Traffic Sources (strings entered on search engines)

Site Search (strings entered using a web site’s search features)
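Site Search works by reading the search terms out of the URLs of search result pages. The sketch below does the same thing offline for a list of page paths, normalizing case and whitespace before tallying. It assumes the terms travel in a query parameter named `lookfor`; that name is an assumption here, so substitute whatever parameter the site's search box actually submits. The sample paths are made up.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def top_search_terms(page_paths, param="lookfor", n=10):
    """Count search terms found in the `param` query parameter of page paths.

    `param` defaults to "lookfor", an ASSUMED parameter name; use whatever
    parameter the site's search box really submits.
    """
    counts = Counter()
    for path in page_paths:
        query = parse_qs(urlsplit(path).query)  # decodes "+" and "%20"
        for term in query.get(param, []):
            normalized = " ".join(term.lower().split())
            if normalized:
                counts[normalized] += 1
    return counts.most_common(n)


# Hypothetical page paths.
paths = [
    "/Search/Results?lookfor=Dorothy+Day&type=AllFields",
    "/Search/Results?lookfor=dorothy%20day&type=AllFields",
    "/Search/Results?lookfor=Vatican+II&type=Subject",
    "/Record/12345",
]
print(top_search_terms(paths))  # [('dorothy day', 2), ('vatican ii', 1)]
```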

Use of Web 2.0 features:

The use of Web 2.0 features (e-mail, cite, etc.) can be tracked by checking for hits to their corresponding links. For the "cite" option, for example, the links follow the pattern "http://www.catholicresearch.net/Record/[record ID]/Cite". The e-mail, export, and text (SMS) options use the same pattern, with "Cite" replaced by "Email", "Export", and "SMS", respectively. To find the number of hits for, say, e-mail links, first go to Site Content (under the Content section) and then All Pages. In the search box above the list of links, enter "/Email" (without quotes), and a list of hits to the e-mail links will be produced. Be sure to include the slash, as false positives can be returned without it (in particular, "SMS" without the slash will return results with "Catechisms" in the string).

Use of the "Email" feature on the Catholic Portal from Aug. 1, 2012 – Oct. 31, 2012

Use of the "Cite" feature on the Catholic Portal from Aug. 1, 2012 – Oct. 31, 2012

Use of the "SMS" feature on the Catholic Portal from Aug. 1, 2012 – Oct. 31, 2012

Use of the "Export" feature on the Catholic Portal from Aug. 1, 2012 – Oct. 31, 2012

For the period between August 1, 2012 and October 31, 2012, there was very little usage of the Web 2.0 features.
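The same counts can be pulled from a list of page paths and pageviews (for instance, the All Pages table) with a pattern anchored on the "/Record/[record ID]/" prefix. That anchoring does the job of the leading slash in the search box, so paths that merely contain "sms" (such as "catechisms") are not counted. This is a sketch; the (path, pageviews) input format and the sample rows are hypothetical.

```python
import re
from collections import Counter

# Feature links look like /Record/[record ID]/Cite; Email, Export, and SMS
# use the same pattern. The trailing group stops "/Cite" matching "/CiteX".
FEATURE_PATTERN = re.compile(r"/Record/[^/]+/(Cite|Email|Export|SMS)(?:[/?#]|$)")


def count_feature_hits(page_rows):
    """Sum pageviews per feature from (page_path, pageviews) rows."""
    totals = Counter()
    for path, views in page_rows:
        match = FEATURE_PATTERN.search(path)
        if match:
            totals[match.group(1)] += views
    return totals


# Hypothetical rows.
rows = [
    ("/Record/abc123/Cite", 3),
    ("/Record/abc123/Email", 1),
    ("/Record/xyz789/SMS", 2),
    ("/Search/Results?lookfor=catechisms", 40),  # not a false positive here
]
print(dict(count_feature_hits(rows)))  # {'Cite': 3, 'Email': 1, 'SMS': 2}
```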

Field searches vs. general searches:

Finding statistics on the use of field or general searches depends on how searching is set up for the site. For the Catholic Portal, the type of search is indicated in the URL with "type=[Search Type]". For example, general searches are indicated with "type=AllFields", while field searches have "tag," "ISN," "CallNumber," "Subject," "Author," or "Title" in place of "AllFields." To find the number of searches of a given type, go to Site Content (under the Content section) and then All Pages. In the search box above the list of links, enter "type=[Search Type]" (without quotes, where [Search Type] is the search type you want statistics for), and a list of searches of that type will be produced. This works for the Catholic Portal only because each search result page has its own URL, due to the way search is set up on the site. For web sites that do not give each search result page its own URL, or that do not indicate the type of search in the URL, tracking such statistics may not be possible without another form of tracking (e.g., server-side scripts).

“General” (All fields) search results

“Subject” search results

“Author” search results

“Title” search results

“Call Number” search results

“ISN” search results

“Tag” search results

For the period between August 1, 2012 and October 31, 2012, the use of general and subject searches (2,361 and 2,329 hits respectively) greatly outnumbered the use of author (84 hits), title (286 hits), ISN (6 hits), call number (13 hits), and tag searches (5 hits).
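Because the search type sits in the `type` query parameter, the per-type totals can also be computed from a list of (path, pageviews) rows, and the totals can be turned into shares. This is a sketch with a hypothetical input format; the `reported` figures are the totals given above.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def count_search_types(page_rows):
    """Sum pageviews per value of the `type` query parameter."""
    totals = Counter()
    for path, views in page_rows:
        for search_type in parse_qs(urlsplit(path).query).get("type", []):
            totals[search_type] += views
    return totals


def shares(totals):
    """Percentage of all searches for each search type, rounded to 0.1."""
    grand_total = sum(totals.values())
    return {k: round(100 * v / grand_total, 1) for k, v in totals.items()}


# Totals reported above for Aug. 1 to Oct. 31, 2012 (5,084 searches in all).
reported = Counter({"AllFields": 2361, "Subject": 2329, "Title": 286,
                    "Author": 84, "CallNumber": 13, "ISN": 6, "tag": 5})
pct = shares(reported)
print(pct["AllFields"], pct["Subject"])  # 46.4 45.8
```

Together, general and subject searches account for about 92% of all searches in that period.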

Use of the tabs at the top:

For the Catholic Portal, the pages behind the tabs at the top are under the About directory (e.g., "http://www.catholicresearch.net/About/Council"). To find the number of hits for these pages, first go to Site Content (under the Content section) and then All Pages. In the search box above the list of links, enter "/About/" (without quotes), and a list of hits to the tabs at the top will be produced. For any site that is set up to use Google Analytics, the usage statistics of pages under a particular sub-directory can be found with this search box.

Use of the tabs at the top from August 1, 2012 to October 31, 2012
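The sub-directory filter can be reproduced on a list of (path, pageviews) rows with a simple prefix test, which works the same way for "/About/" and for the "/MyResearch/" account pages. Note that the Google Analytics search box matches the text anywhere in the path, while `startswith` is stricter. This is a sketch; the input format and every page except "/About/Council" are hypothetical.

```python
from collections import Counter


def hits_under(page_rows, prefix):
    """Per-page and total pageviews for paths that start with `prefix`."""
    per_page = Counter()
    for path, views in page_rows:
        if path.startswith(prefix):
            per_page[path] += views
    return per_page, sum(per_page.values())


# Hypothetical rows.
rows = [
    ("/About/Council", 12),
    ("/About/Members", 7),
    ("/MyResearch/Home", 4),
    ("/Record/abc123", 30),
]
per_page, total = hits_under(rows, "/About/")
print(total)  # 19
```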

User account usage information:

While this cannot be determined directly via Google Analytics, a reasonable guess can be made about account usage based on how often the pages associated with user account management are accessed. For the Catholic Portal, these pages are under the MyResearch directory (e.g., "http://www.catholicresearch.net/MyResearch/Home" is the account login page). To find the number of hits for these pages, first go to Site Content (under the Content section) and then All Pages. In the search box above the list of links, enter "/MyResearch" (without quotes), and a list of hits to the account-related pages will be produced.

Use of pages related to user accounts on the Catholic Portal from August 1, 2012 to October 31, 2012

 

Based on the low usage of pages under the “MyResearch” directory, it is likely that user account usage on the Catholic Portal was low for the period between August 1, 2012 and October 31, 2012.

Language and Country/territory:

The language and location of users can be found in Google Analytics under Audience / Overview or Audience / Demographics, and are based on host and IP data.
For the Catholic Portal, most visitors are from the United States, with English as the most commonly reported language. For the period between August 1, 2012 and October 31, 2012, Italy comes in second place, and the top ten also includes a few other countries with significant Catholic populations, such as the Philippines, Poland, and Spain.

Country of origin for visitors from August 1, 2012 to October 31, 2012

Reported language/locale for visitors from August 1, 2012 to October 31, 2012

 

Browser, Operating System, and ISP:

The browser, OS, and ISP of the users can be found through different options under the Audience tab. The users’ browser and OS can be found under Browser & OS, which is in the Technology subsection of Audience, and the ISP can be found under Network, which is also in the Technology subsection.
In the case of the Catholic Portal, about 74% of visitors use Windows, 15.8% use Macintosh, and the remainder is divided between Linux and a variety of mobile operating systems (e.g., iOS, Android, BlackBerry). Browser usage is divided more evenly, with Internet Explorer at 34.3%, Firefox at 25.9%, Chrome at 19.1%, and Safari at 16.5%.

Operating Systems used by visitors to the Catholic Portal from August 1, 2012 to October 31, 2012

 

Browsers used by visitors to the Catholic Portal from August 1, 2012 to October 31, 2012

 

Reported ISPs of visitors to the Catholic Portal from August 1, 2012 to October 31, 2012
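The shares not itemized in the text follow directly from the listed percentages. A small sketch of that arithmetic, using the figures above (74% is approximate in the text, so the operating-system remainder is approximate too):

```python
def remainder(percentages):
    """Share (in %) not covered by the listed categories, rounded to 0.1."""
    return round(100 - sum(percentages.values()), 1)


os_share = {"Windows": 74.0, "Macintosh": 15.8}
browser_share = {"Internet Explorer": 34.3, "Firefox": 25.9,
                 "Chrome": 19.1, "Safari": 16.5}
print(remainder(os_share))       # 10.2 (Linux plus mobile systems)
print(remainder(browser_share))  # 4.2 (all other browsers)
```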

 

Search engine referral/where users come from:

The referrer can give an indication of how, and possibly why, users come to the site. This information can be found in the Overview section of the Traffic Sources tab in the left sidebar. In the Overview screen, the sources are broken down into Search Traffic (users who arrived from a search engine such as Google or Yahoo), Direct Traffic (users who arrived by clicking a bookmark or typing in a URL), and Referral Traffic (users who arrived by clicking a link on a site other than a search engine). The Sources section provides further detail, dividing possible sources into All Traffic, Direct, and Referrals. The Search section provides information on the search engines that brought users to the site. Note: This is not to be confused with the Site Search section (under Content), which provides information on searches performed on the site itself (e.g., in a search box on the main page) rather than in search engines. Site Search tracking must also be configured for each search box whose use you want to measure.

Overview of traffic sources for the Catholic Portal from August 1, 2012 to October 31, 2012

Listing of all traffic sources for the Catholic Portal from August 1, 2012 to October 31, 2012

Listing of direct traffic sources for the Catholic Portal from August 1, 2012 to October 31, 2012

 

Listing of all non-search engine referrers for the Catholic Portal from August 1, 2012 to October 31, 2012

 

Most visitors seem to reach the Catholic Portal through search engines, and those that do not tend to reach the site through links from Catholic universities (e.g., University of Notre Dame, Georgetown, SHU) or from library-related websites such as vufind.org or cathla.org.
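The three-way breakdown can be approximated from raw referrer values: an empty referrer counts as direct, a search-engine host counts as search, and anything else counts as a referral. Google Analytics applies its own, more detailed rules, so this sketch is only an approximation, and the short list of search-engine host fragments is a hypothetical stand-in.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical, deliberately short list of search-engine host fragments.
SEARCH_ENGINE_HOSTS = ("google.", "bing.", "yahoo.")


def classify_visit(referrer):
    """Roughly bucket a visit as "Direct", "Search", or "Referral"."""
    if not referrer:
        return "Direct"  # bookmark or typed-in URL: no referrer sent
    host = urlsplit(referrer).hostname or ""
    if any(marker in host for marker in SEARCH_ENGINE_HOSTS):
        return "Search"
    return "Referral"


# Hypothetical referrers.
referrers = [
    "",
    "https://www.google.com/search?q=catholic+archives",
    "http://vufind.org/",
    "http://www.bing.com/search?q=x",
]
print(Counter(classify_visit(r) for r in referrers))
```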

User activity:

The activity pattern of users can be determined in the Behavior sub-section of the Audience tab. This section contains a number of statistics regarding the duration and frequency of user activity. Among these statistics are:

Bounce rate – The bounce rate is the percentage of visits in which the visitor views only one page on the website before leaving. It can be viewed within the Overview section under the Audience tab, or alternatively under the Engagement subsection of Behavior (under Audience). The Engagement section has two measured dimensions that can be selected: "Visit Duration" and "Page Depth." The bounce rate cited in the Overview section is equal to the percentage of visits with a Page Depth of one.

Page depth of visitors to the Catholic Portal between August 1, 2012 and October 31, 2012

Duration of visits to the Catholic Portal between August 1, 2012 and October 31, 2012
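Since the bounce rate equals the share of visits with a Page Depth of one, it can be recomputed from a page-depth histogram. A minimal sketch with made-up visit counts:

```python
def bounce_rate(page_depth_counts):
    """Bounce rate (%) from a {page_depth: visits} histogram.

    The bounce rate is the share of visits with a page depth of one.
    """
    total = sum(page_depth_counts.values())
    if total == 0:
        return 0.0  # no visits, so nothing bounced
    return 100 * page_depth_counts.get(1, 0) / total


depths = {1: 600, 2: 200, 3: 100, 4: 100}  # made-up visit counts
print(bounce_rate(depths))  # 60.0
```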

 

Frequency and recency – Statistics on frequency (the number of times the same visitor has visited) and recency (the amount of time between visits by a repeat visitor) can be viewed under the Frequency & Recency subsection of Behavior (under Audience). The graph on this screen can be set to show the number of visitors who visited a particular number of times (frequency) or the number of visitors who waited a certain number of days before a subsequent visit (recency).

Number of visits to the Catholic Portal from individual IP addresses between August 1, 2012 and October 31, 2012
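Both measures can be derived from a list of visits keyed by some visitor identifier. Frequency counts visitors by how many visits they made, and recency counts repeat visits by the number of days since that visitor's previous visit. This is a sketch; the (visitor_id, date) input and the sample data are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date


def frequency_and_recency(visits):
    """visits: iterable of (visitor_id, datetime.date) pairs (hypothetical).

    Returns (frequency, recency):
      frequency: Counter of number-of-visits -> number of visitors
      recency:   Counter of days-since-previous-visit -> number of repeat visits
    """
    by_visitor = defaultdict(list)
    for visitor, day in visits:
        by_visitor[visitor].append(day)
    frequency = Counter(len(days) for days in by_visitor.values())
    recency = Counter()
    for days in by_visitor.values():
        days.sort()
        for earlier, later in zip(days, days[1:]):
            recency[(later - earlier).days] += 1
    return frequency, recency


visits = [
    ("a", date(2012, 8, 1)),
    ("b", date(2012, 8, 2)),
    ("a", date(2012, 8, 8)),
    ("c", date(2012, 8, 3)),
]
frequency, recency = frequency_and_recency(visits)
print(frequency[1], frequency[2], dict(recency))  # 2 1 {7: 1}
```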

Overall, visitors to the Catholic Portal tend not to be repeat visitors, and they do not stay at the site for very long.

Error messages:

Google Analytics does not seem to record any information about client-side or server-side errors. The only error-related information I found was in Analog (a set of server-side scripts for tracking Catholic Portal statistics), which records how often each HTTP status code is returned, including errors such as 404 (Not Found), 403 (Forbidden), and 500 (Internal Server Error). Unfortunately, more detailed information (e.g., which particular user requests produce errors, or whether certain errors correlate with a particular time) would require logs of individual requests, which neither Analog nor Google Analytics provides. One way to track errors with Google Analytics is to program the site so that an error redirects the user to a custom error page (for example, some websites redirect the user to a customized 404 or 500 error page when the server encounters an error). If a site is set up this way, the number of hits to the custom error pages could indicate how often various errors occur. This might not, however, be possible for all types of errors.
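If raw per-request server logs were kept, the time correlation mentioned above could be checked directly. This is a sketch that assumes access logs in the Common Log Format, which is an assumption since the portal's log setup is not described. It tallies 4xx/5xx responses by status code and by (status, hour of day). The sample lines are made up.

```python
import re
from collections import Counter
from datetime import datetime

# Common Log Format: host ident user [time] "request" status bytes
CLF = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)')


def error_summary(log_lines):
    """Tally 4xx/5xx responses by status and by (status, hour of day)."""
    by_status = Counter()
    by_status_hour = Counter()
    for line in log_lines:
        match = CLF.match(line)
        if not match:
            continue  # skip lines in other formats
        timestamp, status = match.group(2), int(match.group(4))
        if status < 400:
            continue  # not an error response
        hour = datetime.strptime(timestamp, "%d/%b/%Y:%H:%M:%S %z").hour
        by_status[status] += 1
        by_status_hour[(status, hour)] += 1
    return by_status, by_status_hour


# Hypothetical log lines.
lines = [
    '192.0.2.1 - - [02/Aug/2012:14:05:10 -0500] "GET /Record/abc123 HTTP/1.1" 200 5120',
    '192.0.2.2 - - [02/Aug/2012:14:20:44 -0500] "GET /Record/missing HTTP/1.1" 404 310',
    '192.0.2.3 - - [02/Aug/2012:23:01:02 -0500] "GET /Search/Results HTTP/1.1" 500 0',
    '192.0.2.4 - - [03/Aug/2012:14:59:59 -0500] "GET /About/Council HTTP/1.1" 404 310',
]
by_status, by_status_hour = error_summary(lines)
print(dict(by_status))  # {404: 2, 500: 1}
```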

Dashboards

Google Analytics allows setting up widgets on the Dashboard, so that you can have a convenient, customized set of data on one page. To do so, first click on the Home tab at the top of the Google Analytics page. Then, click on Dashboards, on the left side. Under the Dashboards link, click + New Dashboards to create a new dashboard with whatever title and style ("Blank" or "Starter") you like. "Starter" will automatically fill the dashboard with a set of widgets. Widgets can be deleted, added, or edited with either option. To edit a widget, click on the gear icon in the top-right corner of the widget. Then, you can customize its presentation style (graph, table, etc.), metric (Visits, Page views, Visit Duration, etc.), and other options. You can save these options with the Save button at the bottom-left. To add a widget, click + Add Widget, which brings up the same configuration screen as when you edit a widget. To delete a widget, click the gear icon on that widget to bring up the edit screen, then click "Delete Widget" in the bottom-right corner of this screen.

Example Dashboard for the Catholic Portal

Note: Dashboards are unique to whichever user creates them, so only that user may view or edit them. However, they can be shared with other Google Analytics users who have an account associated with the same site (in this case, the Catholic Portal) using the Share Dashboard link in the Dashboards screen. This gives a link that can be sent to another user, so that he or she may import a dashboard with the same set of widgets. The new dashboard is a copy, so changes the recipient makes to it do not appear on the sender's dashboard, or vice versa. Additionally, you can create a PDF file of the widgets you have created using the Export tab.

Conclusions

While it is not an all-encompassing solution for every type of usage analysis you may want to do, Google Analytics provides plenty of options in an easy-to-use fashion, so you can find a lot of useful information about how people use your site. It is important, however, to keep in mind the limitations of Google Analytics and to be familiar with how your site is set up and organized in order to make the best use of the service.

by apmcginn at January 24, 2013 08:53 PM

Date created: 2000-05-19
Date updated: 2011-05-03
URL: http://infomotions.com/