OSU Search is powered by a Google Search Appliance. One of the issues we’ve had to overcome from day one is the relevance of search results. One of the main criteria search engines use to judge relevance is how many other pages link to a page. This is one of the areas where OSU Search can’t keep up with external search engines like Bing, Yahoo or Google, because OSU Search crawls, and is only aware of, OSU-related websites.
In other words, if a site is linked to by many external websites or groups, OSU Search does not use this information to improve results.
The good news is that the Google Search Appliance has a feature called Self Scorer. With this functionality turned on, the search appliance can improve the relevance of search results by observing which links users click on after they do a search. We had this feature turned on, but since we don’t use the search appliance directly, we weren’t taking advantage of it. In the latest version of OSU Search, we ported this feature over. Now, whenever you do a search on search.oregonstate.edu, the search appliance makes a note of which search result you clicked on, and if enough people click that result, it moves up the list. This should make a difference in the relevance of the search results end users see.
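To make the idea concrete, here is a minimal sketch of click-based re-ranking. It is not the Search Appliance’s actual algorithm, which we don’t have visibility into; the class name, the `min_clicks` threshold and the `boost_per_click` weight are all made up for illustration.

```python
from collections import Counter


class ClickBoostRanker:
    """Toy re-ranker: results clicked often enough for a query move up.

    Hypothetical illustration only; thresholds and weights are assumptions.
    """

    def __init__(self, min_clicks=3, boost_per_click=0.1):
        self.min_clicks = min_clicks
        self.boost_per_click = boost_per_click
        self.clicks = Counter()  # (normalized query, url) -> click count

    def record_click(self, query, url):
        """Remember that a user clicked `url` after searching for `query`."""
        self.clicks[(query.lower(), url)] += 1

    def rerank(self, query, results):
        """`results` is a list of (url, base_score); returns URLs, best first."""
        normalized = query.lower()

        def score(item):
            url, base_score = item
            count = self.clicks[(normalized, url)]
            # Ignore clicks until enough people have chosen the same result.
            boost = self.boost_per_click * count if count >= self.min_clicks else 0.0
            return base_score + boost

        return [url for url, _ in sorted(results, key=score, reverse=True)]


if __name__ == "__main__":
    ranker = ClickBoostRanker()
    results = [("https://example.edu/a", 1.0), ("https://example.edu/b", 0.8)]
    for _ in range(3):
        ranker.record_click("parking", "https://example.edu/b")
    print(ranker.rerank("parking", results))
```

The threshold is what keeps a single stray click from reshuffling the list: a result only moves once several people have picked it for the same query.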
Another advantage of having the Self Scorer enabled is that we can run advanced search reports. What this means is that we’ll now be able to get reports that tell us things such as:
- The ranking of the search results that people are clicking on, or
- How often people use the next/prev links to find what they’re looking for instead of finding it on the first page
This extra data will allow us to learn how useful the information that OSU Search returns is for different types of search queries, so that we can improve the results.
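As a rough illustration of the two reports above, the sketch below computes them from a list of logged events. The `SearchEvent` record and its `action` values are hypothetical; the Search Appliance’s real report format may look quite different.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class SearchEvent:
    """One logged user action (hypothetical format, not the appliance's)."""
    query: str
    action: str    # "click", "next" or "prev"
    rank: int = 0  # 1-based position of the clicked result; 0 for paging


def clicked_rank_histogram(events):
    """Count how often each result position was clicked."""
    return Counter(e.rank for e in events if e.action == "click")


def paging_rate(events):
    """Fraction of events that were next/prev paging rather than clicks."""
    if not events:
        return 0.0
    paging = sum(1 for e in events if e.action in ("next", "prev"))
    return paging / len(events)
```

If most clicks land on ranks 1 to 3 and the paging rate is low, people are finding what they need on the first page; a high paging rate for a given query suggests its results need attention.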
One issue that had plagued EvalS (a performance evaluation application/portlet) from the beginning was its own performance. EvalS was the first JSR 286 portlet that we wrote for the Luminis portal. During the first several releases we worked hard to improve performance by reducing the number of queries and caching whenever possible. In the past, whenever a person first loaded the portal page containing EvalS, the page would take about 5-6 seconds to finish loading.
This EvalS performance defect affected all users, but only after their initial login. This was not something we were proud of, so over time we worked on improving the code base and the performance of the backend code. A few months ago, we dedicated some resources to fixing the problem once and for all.
Our initial assumption was that the EvalS-specific code was slow because it had not been optimized for the number of employees and jobs at OSU. This assumption proved to be incorrect once our development environment included enough random data to match the number of records in production. After a careful analysis of EvalS and the differences between production and development, a small piece of code external to EvalS, but which EvalS relied on, was identified as the culprit.
When a person first accesses EvalS, the application needs to figure out the person’s ONID username. It was this piece of code that caused the problem, slowing down the application for each person when they first logged in. We never expected this piece of code to be a problem, which is why we didn’t look into it at first.
The Luminis portal doesn’t store the ONID username in the User_ table of the portal. Instead, it generates a random number and stores it in the “screenName” column, which is where the ONID username would usually be stored. We use a SQL query to translate between the random Luminis number assigned to each user and their ONID username. One of the joins in this query lacked the necessary indexes, which made the query slow.
The fix was rather simple once the culprit was identified. The owner of the external query created a new table for us to query instead. This table contains the necessary data along with the needed indexes. EvalS now queries this table and the speed has improved drastically.
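The effect of a missing index on this kind of lookup can be sketched with SQLite, which ships with Python. This is a simplified illustration, not our actual schema or database: the `onid_lookup` table, its columns and the index name are invented, the data is synthetic, and the real query involved a join rather than a single-table lookup.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable steps SQLite plans to run for `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description


conn = sqlite3.connect(":memory:")

# Hypothetical lookup table: random Luminis number -> ONID username.
conn.execute("CREATE TABLE onid_lookup (luminis_id TEXT, onid_username TEXT)")
conn.executemany(
    "INSERT INTO onid_lookup VALUES (?, ?)",
    ((str(100000 + i), f"user{i}") for i in range(10000)),  # synthetic rows
)

LOOKUP = "SELECT onid_username FROM onid_lookup WHERE luminis_id = ?"

# Without an index, every lookup has to read the whole table.
plan_before = query_plan(conn, LOOKUP, ("100042",))

# With an index on the lookup column, SQLite can jump straight to the row.
conn.execute("CREATE INDEX idx_onid_lookup_luminis_id ON onid_lookup (luminis_id)")
plan_after = query_plan(conn, LOOKUP, ("100042",))

username = conn.execute(LOOKUP, ("100042",)).fetchone()[0]

if __name__ == "__main__":
    print("before:", plan_before)
    print("after: ", plan_after)
    print("username:", username)
```

The plan before the index describes a full table scan, and the plan after it describes a search using the new index. Checking the query plan like this, against a realistic number of rows, is the kind of step that would have pointed us at the real culprit much sooner.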
We should have challenged our assumptions sooner while troubleshooting this performance issue, but we have learned some valuable lessons from our mistakes, which will be helpful in the future. In current and future projects, we now test and analyze the performance of the application early in development. Our development environment now includes enough random data to match the amount of data in production and allow for growth. Working this way helps us demystify application behavior.
I’m glad to be writing about some exciting updates to OSU Search. In version 0.4.2, you will find an updated look and feel and some usability features. Among the new things you will find are:
- The links to different types of search (collections) are located on a left sidebar instead of above the search box.
- Filter by URL – Users can now filter results by URL by clicking one of the domains we currently crawl, listed in the left sidebar.
- Header and footer updates – they now include the same content as the homepage
- The search box has moved down closer to the results area
- Faster results!
We think these additions to OSU Search will make finding what you’re looking for easier. We will keep bringing new features to OSU Search to help users explore more advanced search features they may not be aware of. Some of the future improvements will include: people search, location search and speed improvements. Our goal is to let people just type what they are looking for without having to worry about what filters to use or what options they need to select. OSU Search should be doing all the heavy lifting for users.
If you have any questions or comments, feel free to post below.