Selecting every 100th of 1,000,000 rows

Selecting every 100th of 1,000,000 rows

How can I select every 100th record for a table of 1,000,000 rows? For example, the table consists of a distinct memberid, sex, and age, and I only want every 100th member from this table.

    Requires Free Membership to View

    By submitting your registration information to SearchOracle.com you agree to receive email communications from TechTarget and TechTarget partners. We encourage you to read our Privacy Policy which contains important disclosures about how we collect and use your registration and other information. If you reside outside of the United States, by submitting this registration information you consent to having your personal data transferred to and processed in the United States. Your use of SearchOracle.com is governed by our Terms of Use. You may contact us at webmaster@TechTarget.com.

There's really no perfect way to do this with just SQL. The best solution is to use an extension to SQL like Transact-SQL (Sybase and Microsoft SQL/Server) or PL/SQL (Oracle). These language extensions allow you to write processing logic to cursor through the table, fetching every 100th row using a loop counter. This is not only the most accurate method but the most efficient as well.

If your table's memberid is an auto_increment or identity or sequence column, and if it has relatively few gaps in the numeric sequence, then you can come close to what you want with the following query --

 select memberid, sex, age from yourtable where mod(memberid+27,100) = 0

Here mod() is the modulus function, which returns a remainder upon division. Check your database for the specific function name. I added 27 to the memberid simply to introduce some randomness. The query will select memberids 73, 173, 273, and so on, if they exist. Out of a million rows, this should yield ten thousand, fewer if there are gaps.

You see, a lot depends on what you want these rows for. If you need to do a strict statistical calculation, and need exactly every 100th row, you would want the first solution. If you're just interested in getting approximately one out of every hundred rows, more or less, for testing purposes, then the modulus approach will usually suffice.

Finally, a recommendation I made in a previous answer, Selecting random rows (29 May 2001):

The best solution, for situations where you only need a small number of rows, such as in testing, is to have a sample table which contains not random rows of live data, but test cases -- in other words, a place where you can collect those rows which are of interest or which represent conditions you want to test for.


This was first published in September 2002

Join the conversationComment

Share
Comments

    Results

    Contribute to the conversation

    All fields are required. Comments will appear at the bottom of the article.