Previously on "New version of SKA - v0.1.4"

  • threaded
    replied
    AtW: I'm impressed, and I ain't easily impressed.


  • Ardesco
    replied
    Sort it out, AtW, not all my websites are on it yet


  • AtW
    replied
    Originally posted by The Doctor
    The search returned is for a single letter 'c'; the link breaks on the hash for all recent searches.
    Yep, it's a bug: no URL encoding is used when showing recent searches, so the '#' bit makes the browser think it's an anchor target rather than a symbol to search for. Try typing c# directly into the search box; it works pretty well. I am quite chuffed with it, because the main issue was to support these non-alphanumerics without slowing down indexing much, which was achieved.
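
A minimal sketch of the kind of fix this implies, written in Python rather than the engine's own code: percent-encode each stored query before it goes into a recent-searches link, so that '#' is sent as %23 instead of starting a fragment. The `/search` path and the `q` parameter name are assumptions made for illustration, not the site's real URL scheme.

```python
from urllib.parse import parse_qs, quote_plus, urlsplit


def recent_search_link(query: str, base: str = "/search") -> str:
    """Build a recent-search link with the query text percent-encoded."""
    # quote_plus turns '#' into %23 and spaces into '+', so the browser
    # no longer treats the text after '#' as an anchor target.
    return f"{base}?q={quote_plus(query)}"


broken = "/search?q=c# for dummies"            # unencoded: '#' starts a fragment
fixed = recent_search_link("c# for dummies")   # hypothetical path and parameter

print(parse_qs(urlsplit(broken).query))  # the server only sees q=c
print(parse_qs(urlsplit(fixed).query))   # the server sees the full query
```

Typing c# into the search box works because the browser encodes form input itself; only links that are built by hand need this step.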


  • The Doctor
    replied
    Click on the 'recent searches' link
    Click on 'c# for dummies' link
    The search returned is for a single letter 'c'; the link breaks on the hash for all recent searches.


  • AtW
    replied
    Originally posted by xoggoth
    I typed in xoggoth and STILL ONLY GOT ONE ENTRY, you useless slacker!
    Nobody else takes this nick due to the risk of being confused with you

    There, now there are two pages in it -- use the submission tool, Luke


  • xoggoth
    replied
    I typed in xoggoth and STILL ONLY GOT ONE ENTRY, you useless slacker!


  • AtW
    replied
    Originally posted by Funky
    I see what you mean: there are only slight and subtle differences between all the main sites. But you have to stand out from the crowd visually, not just for the user searching but so it can be recognised on a screen across an office.
    This is a good idea -- I am certainly going to put some time into visual changes, probably make it skinnable.


  • Funky
    replied
    Originally posted by AtW
    It's a good point, even though the layout is pretty much the same as that used by all major search engines.
    I see what you mean: there are only slight and subtle differences between all the main sites. But you have to stand out from the crowd visually, not just for the user searching but so it can be recognised on a screen across an office. For example, change the colours used from green to red, the same as the logo (well, a darker red, so it can be seen clearly with smaller fonts).


  • AtW
    replied
    It's a good point, even though the layout is pretty much the same as that used by all major search engines.

    Originally posted by Funky
    The first search I did returned quickly but the first link in the results was a dead link.
    It is possible, though fairly rare. The search engine is currently in its early stages and does not yet recrawl existing pages frequently: we are in data acquisition mode.

    Your idea of a link that checks the page when clicked and flags it for removal is a good one. I was going to do it anyway, but thanks for the reminder about its importance.

    Dead links are in many respects shown due to poor relevance: a good relevance algo will show pages that are linked to a lot, and normally these pages are on well-maintained servers, so the problem of dead links goes away somewhat.


  • Funky
    replied
    Just had a quick play. When I saw the results page, all I could think of was Google's results, from the grey bar with the results and time taken right-aligned to the blue link and green text for the URL.

    Google's bar is actually a light blue, but as the layout is the same, users will instinctively make that connection and assume your results are actually from Google. I know that as a developer you would rather be coding, but it would be a good idea to play with the look and feel of the results to make it distinctive and stand out in the crowd.

    The first search I did returned quickly, but the first link in the results was a dead link. As the face of the internet is continuously changing and evolving, sites and pages come and go. It is inevitable you will serve up a dead link, and it’s not efficient to check these when returning the results. One of many possible solutions I can think of: instead of having a direct link to the indexed page, send the selected link back to your own site as a URL param. You are then able to check that the page exists and redirect the response as normal. If the page does not exist but the site does, redirect them to the main page. If the site is gone, let the user know and allow them to select another URL.

    These dead links can then be stored and used to filter down the results of subsequent searches. You can also schedule a bot to interrogate the site over a period to work out whether the site was just down at the time. If it still has not come back after a time, remove it from your index. One user will experience the dead link but everyone after will benefit from it. This will also help maintain your index database.
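
The click-through check described above could be sketched roughly as follows, in Python. Everything named here is hypothetical: `probe` stands in for whatever HTTP check the site would really use (it returns a status code, or None if the host cannot be reached), and `dead_links` is a plain set standing in for a stored list that a re-check bot would later work through.

```python
from typing import Callable, Optional, Set, Tuple
from urllib.parse import urlsplit

# Hypothetical probe: returns an HTTP status code, or None if unreachable.
Probe = Callable[[str], Optional[int]]


def is_alive(status: Optional[int]) -> bool:
    """Treat any reachable, non-error response as a live page."""
    return status is not None and status < 400


def resolve_click(url: str, probe: Probe,
                  dead_links: Set[str]) -> Tuple[str, Optional[str]]:
    """Decide what to do with a result link routed through our own site."""
    if is_alive(probe(url)):
        return ("redirect", url)       # page exists: redirect as normal
    dead_links.add(url)                # remember it for filtering and re-checks
    parts = urlsplit(url)
    home = f"{parts.scheme}://{parts.netloc}/"
    if is_alive(probe(home)):
        return ("redirect", home)      # page gone, site alive: go to main page
    return ("notify", None)            # site gone: ask the user to pick another


# Usage with a fake probe, so the sketch runs without any network access.
statuses = {"http://a.example/": 200, "http://a.example/ok": 200,
            "http://a.example/old": 404}
dead: Set[str] = set()
for link in ("http://a.example/ok", "http://a.example/old",
             "http://gone.example/x"):
    print(link, resolve_click(link, statuses.get, dead))
```

Only the first user to hit a dead link pays for the extra probe of the main page; once the URL is in `dead_links` it can be filtered out of later result pages until a re-check clears it.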


  • AtW
    replied
    Originally posted by MrsGoof
    Just searched on "sun c#"
    Hmmm, works fine right now


  • MrsGoof
    replied
    OOOppps

    "Error occured:

    System.Exception: WordID from lexicon is not the same as in the inverted file! Details: LexWordID=14332,InvIdxWordID=3068
    at Majestic12.Search.Find(String sKeywords, Int32 iFrom, Boolean bNoInfo) in h:\alex\projects\mj12searchlib\search.cs:line 548
    at Majestic12.Search.Find(String sKeywords, Int32 iFrom) in h:\alex\projects\mj12searchlib\search.cs:line 471
    at Majestic12.WebSearchInterface.ExecSearch(StringBuilder oSB, WebRequest oWebReq, String[] asParams) in h:\alex\projects\mj12search\websearchinterface.cs:line 286"

    Just searched on "sun c#"


  • AtW
    replied
    Thanks dude. The current search engine is 6 weeks' effort, and most of the time went into building the crawler and other bits. Ask Jeeves are crap; I certainly hope to surpass their quality this year.

    1) Agreed. Geo-targeting is coming in about 6 weeks' time; it's just not top priority at the moment.

    2) Current weights are just quick guesses. I am going to expose editable formulae for people to play with, and the major push for relevancy will start once enough data is indexed: it needs a well-sized collection to have a decent chance of getting relevant results.

    4) Meta tag text is sometimes more descriptive: it's a well-written sentence, which Google uses pretty often. I have only just implemented showing the description in the first place, and need a better strategy for when to show it and when not.

    A number of results have no text snippet because they are just discovered links: if you search for ".net framework download" you will see links to the download page even though they were not indexed. It's a pretty powerful concept that seems to work; it's not fine-tuned just yet, but it will be.

    5) site: clustering is not supported yet. This will come next month, or maybe before the end of this month. I agree that it is necessary, because its implementation will allow limiting the number of matches per domain on a single page; at the moment the lack of it causes a serious problem with Bristol City.

    Would have done a bit more but it seems to have fallen over - I guess you must be working on it
    Yeah I was restarting it - thanks for giving it a whizz!
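
Limiting matches per domain, as mentioned in point 5, could look something like this rough Python sketch. It is not the engine's implementation: the function name, the default cap of 2 and the example URLs are all illustrative assumptions, and it groups by exact host name rather than by registered domain.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def cluster_by_domain(ranked_urls, max_per_domain=2):
    """Keep at most max_per_domain results per host, preserving rank order."""
    seen = defaultdict(int)   # host -> number of results already kept
    kept = []
    for url in ranked_urls:
        host = urlsplit(url).netloc.lower()
        if seen[host] < max_per_domain:
            seen[host] += 1
            kept.append(url)
    return kept


# Hypothetical ranked results dominated by a single host.
ranked = [
    "http://council.example/a",
    "http://council.example/b",
    "http://council.example/c",
    "http://council.example/d",
    "http://football.example/",
]
print(cluster_by_domain(ranked))
```

With the cap applied, the football site moves from fifth place to third, which is the effect the Bristol City search needs.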


  • ferret
    replied
    Nice

    I am impressed! One day you might be as good as Ask Jeeves. Not a cuss, as they are worth a few quid even though they are tulip.

    1 - Look forward to seeing the next indexing! Would it not be more sensible to do as Google does and try to classify sites into a country, either by domain or by geographic location? That way different stop lists could be run on different collections, bypassing this problem.

    2 - Maybe you are placing too much weighting on link text then! Then again, your collection is relatively small, but even with this number there must be more relevant results. Search engines need to have what the end user is searching for. I would have expected to get the results Google gives me: the city council and then other Bristol sites, most of them football related! Yours just has lots of council links when you start digging. This is quite a problem with your search engine: for any search, only one or two results from any domain should be displayed. If the result is not what the user is looking for, they need to expand their query.

    3 - Giving away free advice here, very unlike me

    4 - I understand what you mean, and I know Google shows the meta tag sometimes, but why bother replacing it if you have text on the page anyway? If you are simply using the meta description to serve some sort of description when there is no text, then that is cool.

    Quite a few of the results show no text snippet though, and this is a bad thing as it will not tempt searchers to click on the link.

    5 - Tried doing a site:www.bbc.co.uk search and it brought back loads from the bbc; did site:www.bbc.co.uk buiscuits and it brought back some pages from the bbc and then a load of pages not from the bbc.

    If you could build this operator in I would be impressed, allowing users to search a particular domain for a word or phrase is very handy. All the major search engines have this feature.

    Would have done a bit more but it seems to have fallen over - I guess you must be working on it
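
For the site: operator in point 5, one possible shape is to pull the operator out of the query before searching and then keep only results from that host. This Python sketch is an assumption about how such an operator might be handled, not a description of any engine's real behaviour; both function names and the example URLs are made up.

```python
from urllib.parse import urlsplit


def parse_site_query(query):
    """Split a query into (site host or None, remaining keywords)."""
    site = None
    words = []
    for token in query.split():
        if token.lower().startswith("site:") and len(token) > len("site:"):
            site = token[len("site:"):].lower()
        else:
            words.append(token)
    return site, words


def restrict_to_site(urls, site):
    """Drop every result whose host is not exactly the requested site."""
    if site is None:
        return list(urls)
    return [u for u in urls if urlsplit(u).netloc.lower() == site]


site, words = parse_site_query("site:www.bbc.co.uk buiscuits")
print(site, words)
hits = ["http://www.bbc.co.uk/food", "http://other.example/biscuits"]
print(restrict_to_site(hits, site))
```

Applying the host filter after keyword matching is the simplest way to guarantee that pages from other sites never leak into a site: search.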


  • AtW
    replied
    Right, answering one after another:

    1) IR35 - it did not work well because "IR" was judged to be a stop word, so the search was actually run for "35". "IR" was incorrectly judged a stop word because it is a stop word in one of the foreign languages, and it slipped through my fingers that it has a useful meaning in English. Solution: the list of stop words has been amended and will be in force after the next indexing phase.

    2) Bristol City - the nearest match related to BC FC is on page 7. The reason it's so far down is that at the moment there is no domain clustering, and as a result Bristol City council has loads of matches. Clustering will limit that to 2 per domain, giving BC FC a reasonable chance to appear. Bear in mind it only has 30 mln URLs indexed, so it's entirely possible the official page of BC FC is not in.

    Millwall FC is the #1 match because it appears that the words "Bristol City" were used in reference to that URL.

    3) QA - you are doing a great job, don't spoil it by asking for money

    4) META tags: don't confuse INDEXING of meta tags with showing the META description in search listings. I don't index meta tags because there is too much spam; however, it was brought to my attention in the last release that Google and others use the META description for sites that have it. Chunks of text with relevant keywords are also shown, just like Google.
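
The IR35 problem in point 1 can be reproduced with a small Python sketch. The stop lists below are invented for illustration (the real lists are not shown in this thread, and "xx" stands in for the unnamed foreign language), and the tokenizer assumes that letters and digits are split into separate tokens, which is what a search for "35" suggests.

```python
import re

# Hypothetical stop lists; "xx" is a placeholder for the foreign language.
STOP_WORDS = {
    "en": {"the", "a", "of", "and"},
    "xx": {"ir", "un"},
}


def tokenize(query):
    """Lower-case the query and split it into runs of letters or digits."""
    return re.findall(r"[a-z]+|\d+", query.lower())


def keywords(query, languages):
    """Drop tokens found in the stop lists of the given languages."""
    stops = set().union(*(STOP_WORDS[lang] for lang in languages))
    return [t for t in tokenize(query) if t not in stops]


print(keywords("IR35", ["en", "xx"]))  # merged lists: "ir" is dropped
print(keywords("IR35", ["en"]))        # English-only list: "ir" survives
```

Keeping a separate stop list per language or per collection is one way to stop a word that is noise in one language from being stripped out of queries in another.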
