What is SoDakLIVE?
What is the benefit to agencies that participate in SoDakLIVE?
Using SoDakLIVE to search for information
How SoDakLIVE Works
SoDakLIVE Process
What Tools and Assistance are Available to Agencies?
Description of Metadata
Metadata and Dublin Core
Getting Started with Indexing
Placing SoDakLIVE Attributes in HTML coding
SoDakLIVE Spidering
Keeping your pages out of SoDakLIVE!
Additional Services and Reports available to South Dakota State Agencies
South Dakota LIVE is:
What is the benefit to agencies that participate in SoDakLIVE?
Using SoDakLIVE to search for information
The State Library provides the SoDakLIVE database and search engine 24 hours each day. The public accesses this tool using an Internet connection and a browser (e.g., Netscape or Internet Explorer). In addition to personal or business-owned computer access, many public libraries have computers with Internet connections for public use.
Once connected to the SoDakLIVE site, a searcher can locate both Web information and state documents that have been scanned and added to the database as PDF files. Searches allow general term queries ("keyword" searching) or more specific, fielded attribute searching.
Search results are displayed as lists in order of relevance. Access to the source document is through a link to an agency's Web site or directly to the PDF document.
How SoDakLIVE Works
All government agencies should:
What Tools and Assistance are Available to Agencies?
State and local agencies can turn to the State Library for assistance in the indexing process.
A Description of Metadata
Metadata is data that describes other information, or data associated with an object that describes that object. Because metadata is stored within the HEAD area of a Web page, it is not displayed by Web browsers. This embedding in the page also allows the metadata to remain current when the document is created, moved or updated.
The ability to search and find information is enhanced by controlled vocabularies linked to the metadata elements. In addition, when metadata is combined with controlled subject indexes, it allows more precise searching and document management.
Why use metadata in the first place? Metadata gives focus to searching. Imagine a database where every record has only one field. You can find records, but records can't be grouped by unique attributes, because there aren't any.
You can use the HTML meta tag to specify the summary text that will appear in a search results list and to control if and how your page is indexed by the search engine. The meta tags must be placed within the HEAD portion of your web page. Do not use any HTML tags within the meta tag itself. In the absence of metadata, the search engine will index all the words in a document except for comments, and will use the first few words as a summary to describe your page in the search results.
The SoDakLIVE online document index is like a fielded database, and the fields consist of the SoDakLIVE attribute set elements. Any given document has a number of attributes that make it unique and could potentially help others to find it. We have attempted to capture those unique elements.
Suppose your page contains:
<meta name="author" content="Secretary of State ">
The search engine will index this page so that it can be found with any of the queries:
author:Secretary
author:"Secretary of State"
author:"secretary of state"
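To make the fielded lookup concrete, here is a minimal Python sketch of how an indexer might pull name/content pairs out of a page's meta tags and answer queries of the form field:term. It is not the SoDakLIVE search engine's actual code; the names MetaTagCollector, extract_fields and matches are hypothetical, and the case-insensitive whole-word matching is an assumption drawn from the three example queries.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect name/content pairs from <meta> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.fields[name.lower()] = content.strip()


def extract_fields(html):
    """Return a dict mapping each meta name to its content value."""
    collector = MetaTagCollector()
    collector.feed(html)
    return collector.fields


def matches(fields, query):
    """Check a fielded query such as author:"secretary of state".

    Assumption: matching ignores case and compares whole words in order.
    """
    field, _, term = query.partition(":")
    wanted = term.strip('"').lower().split()
    words = fields.get(field.lower(), "").lower().split()
    if not wanted:
        return False
    last_start = len(words) - len(wanted)
    return any(words[i:i + len(wanted)] == wanted for i in range(last_start + 1))


page = '<html><head><meta name="author" content="Secretary of State "></head></html>'
fields = extract_fields(page)
for query in ('author:Secretary', 'author:"Secretary of State"', 'author:"secretary of state"'):
    print(query, matches(fields, query))
```

Under these assumptions, all three example queries match the page, while a query on a different field or term would not.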
Metadata and the Dublin Core Elements
The SoDakLIVE Project has adopted a 13-element metadata set, a modified Dublin Core, for describing network-accessible materials. This core set of metadata elements is defined by the Dublin Core Working Group. Endorsed by the W3 Consortium in 1998, it has recently been approved as National Information Standards Organization (NISO) standard number Z39.85, and sent on for American National Standards Institute (ANSI) approval.
As the collection of electronic documents on South Dakota agency Internet sites (and intranets) grows, metadata is emerging as a powerful tool to find useful information. Placed in Web pages, metadata will allow information to be found more easily and accurately. It also can be used for records management and archiving. Metadata helps search accuracy on commercial search engines, too. Finally, SoDakLIVE - DC allows standards to be developed that relate to electronic document cataloging, retrieval or archiving using Dublin Core Elements.
There are several reasons for adopting Dublin Core for SoDakLIVE:
Getting Started with Indexing
Introduction: The SoDakLIVE metadata attribute set lists the "core elements" of content indexing adopted as an indexing standard by the South Dakota State Library. These thirteen elements are adopted as the minimum number of indexed metadata fields for the purpose of describing information content in government-produced information sources. Six attributes are mandatory and seven are optional. The thirteen elements are listed with definitions and examples. The field definitions may be used as a reference when creating meta-tagged fields in the header of a web page. When these meta tags are present in the header of a web page, the SoDakLIVE server will index the site and include links to the information in the SoDakLIVE database.
Placing SoDakLIVE Attributes in HTML coding
It is important to use the following conventions when placing your index information in the Web page or document.
Using a text editor, type (or paste from the template) the SoDakLIVE attributes (now called METATAGS) in the <HEAD> section of the Web page or document:
<META NAME="attribute term" CONTENT=" value ">
Using several SoDakLIVE Attributes as examples:
<META NAME="originatorJurisdiction" CONTENT="State of South Dakota">
<META NAME="keywords" CONTENT="small pox chicken pox mumps">
<META NAME="subjects" CONTENT="Health and Medicine; Public health; Immunizations">
(Note: subject values are from the controlled terms of the SoDakLIVE Subject Tree)
<META NAME="Agency Program" CONTENT="Digitization Project">
SoDakLIVE Spidering
Keeping Your Pages Out of SoDakLIVE
There may be some pages that you don't want to be part of our system. For instance, if your files are in one of the major state servers like state.sd.us, chances are your files have already been spidered and indexed into our system. How do you keep your pages from being spidered?
First of all, to see which pages are spidered and which aren't, it's useful to understand how the SoDakLIVE robot collects pages. It begins by fetching a starting point page, typically the server's root page. It collects all the links on that page, then visits each of those pages. If the page meets our criteria, then that page is scanned for links, and those links are visited, and so on. This means that, for a page to be collected into our system, it must ultimately be navigable from the root, starting point page. If nothing links to a page, it won't be found by our spider -- nor by any other spider.
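The link-following behavior described above can be sketched in a few lines of Python. This is an illustration only, not the SoDakLIVE robot: the crawl function, the LinkCollector class and the in-memory SITE dictionary (standing in for real page fetches) are all hypothetical, and the "meets our criteria" check is left out because its details are not given here.

```python
from collections import deque
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Gather the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(site, start):
    """Visit pages breadth-first from the starting point page.

    `site` maps a path to its HTML; it is a hypothetical stand-in for
    fetching pages over the network.
    """
    collected = []
    seen = {start}
    queue = deque([start])
    while queue:
        path = queue.popleft()
        html = site.get(path)
        if html is None:
            continue  # a link to a page that does not exist
        collected.append(path)
        parser = LinkCollector()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return collected


SITE = {
    "/": '<a href="/about.html">About</a> <a href="/forms.html">Forms</a>',
    "/about.html": '<a href="/">Home</a>',
    "/forms.html": '<a href="/forms/permit.html">Permit</a>',
    "/forms/permit.html": "<p>Permit form</p>",
    "/orphan.html": "<p>Nothing links here</p>",
}
found = crawl(SITE, "/")
print(found)
```

In this toy site, /orphan.html is never collected because no page reachable from the root links to it, which is the point made in the paragraph above.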
One way to keep spiders from your pages is by adding and configuring a robots.txt document in your server's root. However, you can also add meta tags that will block robot access. To quote from the Netscape Compass Server documentation, here's an explanation.
The META tag that controls robot behavior uses the name ROBOTS. Its content tells a visiting robot whether it should include the document itself in its index and whether to follow hyperlinks found in the document to index the linked documents.
The general format for the ROBOTS tag is as follows:
<META NAME="ROBOTS" CONTENT="terms">
The terms in the CONTENT portion can be any of the following, separated by commas:
| Content String | Meaning |
|---|---|
| ALL | The robot is welcome to include this document in its index and to follow any links found in it. This is the default value. You can get the same result by leaving the CONTENT portion empty, by omitting the ROBOTS tag entirely, or by using the contents "INDEX, FOLLOW". |
| NONE | The robot should ignore the page. This is the equivalent of "NOINDEX, NOFOLLOW". |
| INDEX | The robot is welcome to include the document in its index for searching. |
| NOINDEX | The robot should not include the document in its index. The robot can still follow links, unless you also include the NOFOLLOW string. |
| FOLLOW | The robot is welcome to follow any hyperlinks in the document to locate other documents for its index. |
| NOFOLLOW | The robot should not follow any hyperlinks in the document to locate other documents. This enables you to index just the entry point of a complex document, for example, or to index the open access point to an otherwise restricted site. |
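As a rough illustration of the table above, the following Python sketch turns a ROBOTS CONTENT string into two yes/no decisions: whether to index the page and whether to follow its links. The function name robot_permissions is hypothetical, and two behaviors are assumptions not stated in the table: terms are matched without regard to case, and when terms conflict the later one wins.

```python
def robot_permissions(content=""):
    """Return (index, follow) for a ROBOTS CONTENT string.

    Assumptions: terms are case-insensitive, unknown terms are ignored,
    and a later term overrides an earlier conflicting one.
    """
    index, follow = True, True  # ALL is the default, also for empty content
    for term in content.split(","):
        term = term.strip().upper()
        if term == "ALL":
            index, follow = True, True
        elif term == "NONE":
            index, follow = False, False
        elif term == "INDEX":
            index = True
        elif term == "NOINDEX":
            index = False
        elif term == "FOLLOW":
            follow = True
        elif term == "NOFOLLOW":
            follow = False
    return index, follow


for content in ("", "ALL", "NONE", "NOINDEX", "INDEX, NOFOLLOW"):
    print(repr(content), robot_permissions(content))
```

For example, a page that should be searchable but whose links should not be explored would carry CONTENT="INDEX, NOFOLLOW", which this sketch reads as index allowed and follow refused.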
Additional Services and Reports available to South Dakota State Agencies
All government agencies that participate in SoDakLIVE are entitled to additional helpful services and reports. The following benefits are available: